The modern researcher’s toolbox

We all know that Moore’s law has pushed technology a long way since ENIAC. The way we do research has unquestionably changed, but so has the way we write up our findings. I have the feeling, though, that the latter change mostly boils down to library-roaming having been replaced by google-scholaring, especially if you think about the last 15-20 years. The way we compile articles hasn’t really changed… Create a Word doc – if you’re techy, a LaTeX file – and start right off. You reach a point when you think it is reasonable to share it with the rest of the co-authors, and off goes the email: attachment, track changes, and then it’s back to you. Sure, there are some collaborative initiatives, some with online editing and some of which have been around for quite a while, e.g. Google Docs or Microsoft OneDrive, and there have been advances in the way we share files, such as Dropbox. But it’s not only about the writing. Figures are not just Excel or Matlab anymore. They are Python. Or R. And research requires an ever larger amount of code. Then there is the formatting… You submit, get rejected, reformat and submit again. And repeat, forever.

This got me thinking that there must be some way of streamlining this process. And there is. I just had to put together the right tools. Meet my integrated research environment (much like an integrated development environment, used by coders):


Tools for the integrated research environment of the modern researcher

I stumbled upon a marvellous tool called Authorea and found that it connects to everything I was already doing, and that it ticks the box that annoyed me the most: automatic reformatting for journals. It also made possible a move I had been wanting to make for a long time: migrating from Word to LaTeX for writing research. And to smooth that transition, you can start with Markdown.

So here are the steps to take towards writing a research paper, the modern way:

  1. Come up with your amazing new idea
  2. Search for data online, then fire up a Jupyter notebook (formerly IPython) running Python (or R)
  3. Get and massage your data with pandas
  4. Create some nice plots with matplotlib
  5. Save the data into JSON format, then fire up your favorite text editor
  6. Pull the data from the JSON files and create an awesome interactive visualization with D3.js
  7. Create a project website in HTML5 and CSS3, and host it on GitHub
  8. Create a new Authorea article and set it up to push automatically to a GitHub repository
  9. Invite your collaborators to the article and work simultaneously on the text, with Git version control
  10. Find your inner muse and write up that article in HTML, Markdown or LaTeX
  11. Put in some fancy equations in LaTeX
  12. Paste those graphs you created with matplotlib, then put a link to your source code – this way your readers can actually fire up a live Jupyter notebook on the Authorea server and even play with your code
  13. If you want to step up your game, go ahead and paste that interactive visualization you created in D3.js from the website hosted on GitHub
  14. Paste those references directly from CrossRef, without the need for a citation manager
  15. Chat with your co-authors, review and finalize your article – in the browser
  16. Go online again and search for the most awesome journal of your choice
  17. Export your article from Authorea in the formatting requirements of your chosen journal – just a click
  18. Get some sleep, man!
  19. Repeat steps 15-18 until accepted🙂
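Steps 3 to 5 can be sketched in a few lines of Python. This is a minimal sketch, not my actual notebook: the small inline table stands in for the data you would download in step 2 (its numbers are the 2014 refugee flows quoted later on this blog), and the output file names `hosts.png` and `hosts.json` are hypothetical.

```python
import json  # handy for checking the exported file

import matplotlib
matplotlib.use("Agg")  # render straight to files, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Step 2-3 stand-in: normally something like pd.read_csv(...) on downloaded data.
raw = pd.DataFrame({
    "origin": ["Syria", "Palestine", "Syria", "Afghanistan"],
    "host": ["Turkey", "Jordan", "Jordan", "Pakistan"],
    "refugees": ["1,550,000", "2,100,000", "600,000", "1,510,000"],
})

# Step 3: massage the data - turn "1,550,000"-style text into integers,
# then total the refugees per host country, largest first.
raw["refugees"] = raw["refugees"].str.replace(",", "").astype(int)
by_host = raw.groupby("host", as_index=False)["refugees"].sum()
by_host = by_host.sort_values("refugees", ascending=False)

# Step 4: a simple bar chart, saved as a PNG to paste into the article.
fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(by_host["host"], by_host["refugees"] / 1e6)
ax.set_ylabel("Refugees (millions)")
fig.tight_layout()
fig.savefig("hosts.png", dpi=150)
plt.close(fig)

# Step 5: save the tidy table as a JSON list of records,
# which D3.js can load for step 6 (e.g. with d3.json).
by_host.to_json("hosts.json", orient="records")
```

The `orient="records"` layout (one object per row) is a common choice because it maps directly onto the arrays of objects that D3 data joins work with.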

And the best part? All of the above are free tools, most of them open source.

Good luck!

A visual exploratorium of refugee flows over the world using dynamic chord diagrams

Click here to look at the data visualization only.
A localized Hungarian version of this post also exists.
PREVIOUS: Refugee dynamics - what does the data say?

This post is an update to the Refugee dynamics – what does the data say? post, a data visualization that I developed for World Refugee Day 2015. I have now written a new data parser that accesses the UNHCR database directly, rather than relying on the data as re-reported by UNDATA. This allowed me to extend the time horizon to 1951-2014, including the latest available numbers, and the web app now also includes internally displaced persons (IDPs).

I think it has now become quite a powerful tool for analyzing the refugee flows of the past half century! I have made several other changes: the total number of displayed refugees now appears in the center of the chord diagrams; a new filter lets you choose whether or not to display IDPs; and, on top of setting a floor for the displayed flows, you can now also set a ceiling, making it possible to visualize, let’s say, only the refugee flows that fall between 1000 and 2000 people.
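The floor/ceiling and IDP filters boil down to a simple predicate applied to every flow. Below is a minimal Python sketch of that logic, not the app’s actual d3.js code. It assumes each flow is an (origin, host, people) record, that IDPs are stored as flows whose origin equals the host, and that the number in the center is the sum of the displayed flows; `Flow`, `filter_flows` and `center_total` are hypothetical names. The sample flows are the 2014 figures quoted in this post.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Flow:
    origin: str  # country people fled from
    host: str    # country where they are now
    people: int


def is_idp(flow: Flow) -> bool:
    # Assumption: IDPs are stored as a flow whose origin and host coincide.
    return flow.origin == flow.host


def filter_flows(flows: Iterable[Flow], floor: int = 0,
                 ceiling: Optional[int] = None,
                 include_idps: bool = True) -> List[Flow]:
    """Keep flows with floor <= people <= ceiling, optionally dropping IDPs."""
    kept = []
    for flow in flows:
        if not include_idps and is_idp(flow):
            continue
        if flow.people < floor:
            continue
        if ceiling is not None and flow.people > ceiling:
            continue
        kept.append(flow)
    return kept


def center_total(flows: Iterable[Flow]) -> int:
    # Assumption: the central figure is the sum of the flows currently shown.
    return sum(flow.people for flow in flows)


# 2014 figures from this post; Syria's IDPs are modelled as a Syria -> Syria flow.
flows = [
    Flow("Syria", "Syria", 7_800_000),
    Flow("Syria", "Turkey", 1_550_000),
    Flow("Syria", "Jordan", 600_000),
    Flow("Palestine", "Jordan", 2_100_000),
    Flow("Afghanistan", "Pakistan", 1_510_000),
]

shown = filter_flows(flows, floor=1_000_000, ceiling=2_000_000, include_idps=False)
print([(f.origin, f.host) for f in shown])  # [('Syria', 'Turkey'), ('Afghanistan', 'Pakistan')]
print(center_total(shown))                  # 3060000
```

A ceiling of `None` means "no upper limit", so the default call keeps every flow, IDPs included.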

So here you go, play & enjoy!


The visual exploratorium of refugee flows over the world using dynamic chord diagrams 1.1

Briefly, the new insights are:

  • The world’s refugees total 51.6 million when including IDPs, and 12.7 million without.
  • In 2014, Syria was by far the largest source of refugees: 11.5 million people in total, of whom 7.8 million were IDPs.
  • Colombia, Iraq and DRC have huge numbers of IDPs (6, 3.8 and 3.3 million, respectively), which were not visible in the previous version!
  • Without IDPs, Jordan is the largest host country, with 2.7 million refugees: 600 thousand Syrians and 2.1 million Palestinians.
  • Without the UNRWA numbers (which make up the bulk of the Palestinian refugees, so this amounts to filtering Palestine out of the visualization), the largest host is, for the first time in history, Turkey, with 1.55 million Syrian refugees, barely surpassing Pakistan with 1.51 million Afghans.
  • Taking IDPs into account dwarfs the flows towards developed countries even further. Lowering the filter thresholds by a factor of 100 to 1000 and displaying only the flows between 2000 and 10000 refugees brings the European and American flows into view.

Refugee flows between 2000 and 10000 people in the year 2014

  • We can see that the developed world welcomes refugees almost equally from a number of source countries. This pattern differs from that of the developing countries (many of them next to the conflict-affected areas), which mainly host refugees from one or two countries. The flows to the developed world, however, are about 10-100 times smaller in scale.
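The host-country comparisons above come down to summing the flows per host, optionally dropping one source country (as when Palestine is filtered out of the visualization). Here is a minimal Python sketch of that aggregation. It uses only the four 2014 flows quoted in the bullets, so it reproduces the Jordan-versus-Turkey switch rather than the full ranking; `host_ranking` is a hypothetical helper, not the web app’s code.

```python
from collections import defaultdict

# (origin, host, refugees) in 2014, restricted to the flows quoted above;
# the real UNHCR data contain many more flows.
FLOWS_2014 = [
    ("Syria", "Jordan", 600_000),
    ("Palestine", "Jordan", 2_100_000),
    ("Syria", "Turkey", 1_550_000),
    ("Afghanistan", "Pakistan", 1_510_000),
]


def host_ranking(flows, exclude_origins=()):
    """Total refugees per host country, largest first, skipping excluded origins."""
    totals = defaultdict(int)
    for origin, host, people in flows:
        if origin in exclude_origins:
            continue
        totals[host] += people
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(host_ranking(FLOWS_2014)[0])                                 # ('Jordan', 2700000)
print(host_ranking(FLOWS_2014, exclude_origins={"Palestine"})[0])  # ('Turkey', 1550000)
```

With Palestine excluded, Jordan drops to 600 thousand and Turkey’s 1.55 million edges out Pakistan’s 1.51 million, matching the bullet above.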

Keep exploring the web-app and let me know in the comments below if you discover something interesting! Remember, you can now go back all the way to 1951 – but data gets a bit patchier before the 70s.

This post describes an update to a web app I created that visualizes refugee flows over the world using dynamic chord diagrams. You can find the data sources and methodology in the first post. This is the new IPython parser and this is the new outcome: the visual exploratorium of refugee flows over the world using dynamic chord diagrams 1.1. As always, made with d3.js. If you liked this post or have any questions or thoughts, Like, Share, Comment, and Subscribe! If you think my work is cool and you would like to support it, please consider a small Donation.
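The diagrams are drawn with d3.js, whose chord layout takes a square matrix in which row i, column j holds the flow from country i to country j. So the last step of any parser feeding it has to turn flow records into such a matrix. Below is a minimal Python sketch of that step, not the actual IPython parser; the function name `chord_matrix`, the sample flows (2014 figures from this post) and the JSON layout with `names` and `matrix` keys are my assumptions.

```python
import json


def chord_matrix(flows):
    """Turn (origin, host, people) flows into the square matrix a chord layout expects.

    matrix[i][j] is the number of people moving from countries[i] to countries[j].
    """
    countries = sorted({c for origin, host, _ in flows for c in (origin, host)})
    index = {name: i for i, name in enumerate(countries)}
    n = len(countries)
    matrix = [[0] * n for _ in range(n)]
    for origin, host, people in flows:
        matrix[index[origin]][index[host]] += people
    return countries, matrix


flows = [
    ("Syria", "Turkey", 1_550_000),
    ("Syria", "Jordan", 600_000),
    ("Afghanistan", "Pakistan", 1_510_000),
]
countries, matrix = chord_matrix(flows)

# Save the country names alongside the matrix so the front end can label the arcs.
with open("chord_2014.json", "w") as fh:
    json.dump({"names": countries, "matrix": matrix}, fh)
```

Every country that appears as either a source or a host gets a row and a column, so the matrix stays square even when a country only sends or only receives refugees.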

Refugee dynamics – what does the data say?

Click here if you prefer to read this post on (11 minute read).
Click here to look at the data visualization only.
UPDATE: A visual exploratorium of refugee flows over the world using dynamic chord diagrams

Let me present the visual exploratorium of refugee flows over the world using dynamic chord diagrams! Perhaps one of the hottest topics today is that of refugees and immigrants. There are conflicting views about state attitudes towards unwanted immigration, and the debates often turn nasty. From Europe through the Middle East and Central Africa all the way to Eastern Asia, refugees are a topic of constant uproar, and unavoidably a handy political (populist) tool. But what is the reality behind the movement of the world’s refugees? Without taking any political stance, let us see what the data say about refugee dynamics!


The visual exploratorium of refugee flows over the world using dynamic chord diagrams

While it is known that natural disasters and wars (especially civil wars – read my previous post about insurgent dynamics) are the main forces pushing people across borders, these flows of people can also act as a catalyst for igniting other conflicts in the receiving regions. The linkages between the Second Congo War and the Rwandan and Burundian civil wars are prime examples of cross-border conflict spillover. Therefore, most countries are reluctant to receive refugees in large numbers; nonetheless, in culturally integrated regions the process is unavoidable. When talking about the receiving countries, you might think that the wealthy, developed nations take in the most refugees. You couldn’t be more wrong… It is the developing nations, oftentimes those with the smallest GDPs in the world, that take in the largest numbers of people. In fact, in 2013 there were no developed countries among the top 25 refugee-receiving countries. Based on the recent crises in the Andaman Sea, and the repeated ones in the Mediterranean, Western refugee intake is expected to soar… a bit. But we will investigate how far off the mark the mainstream media is when talking about a refugee and illegal immigrant invasion of Europe, for example.


Try something new. Everyday. + SZÉKELYDATA: Data Blog Updates

SZÉKELYDATA – my data blog in Hungarian about Székelyland, Transylvania and their surroundings

Recently I have published several posts on my Hungarian language data blog, SZÉKELYDATA. We have been experiencing some performance issues with the host, so please be patient and try again in a while if the page does not load at first.

  • About a month ago, I posted about the Gender Gap and Women’s Employment Inequality in Székelyland, Transylvania and Romania, using a GapMinder-style data visualization. I examined data from the last quarter century and found that while the Gender Gap has significantly narrowed and even reversed over the last 25 years, Employment Inequality has mostly stayed the same. Max Ghalka provided the inspiration for this post.
  • Last week, my post about the historical pilgrimage to Csíksomlyó (HU, EN) went viral, receiving more than 20 000 views and 2200 shares. This is not a data visualization but a crowd size estimation using various techniques; I used this gigapan and this timelapse for the analysis. It turns out that the historical pilgrimage peacefully gathers 250 000 Hungarians and Székelys from all corners of the world every year.
  • And very recently I wrote about the dynamics of the GPAs and the results in high-school academic competitions of the students at the Márton Áron High School in Csíkszereda, Székelyland, Transylvania. Using interactive data analytics and ideas from a modern teaching theory (Constructive Alignment), I investigate possible reasons for the discrepancy in rankings between the students with the best GPAs and the best competitors. My graduating brother was the main inspiration for this post : )

Try something new. Everyday

The big news is that I have recently set up a donation mechanism via PayPal. You may know that both this blog, Try something new. Everyday., and SZÉKELYDATA are edited by me in my free time on a voluntary basis. If you like what you see, please like and share my posts to increase their visibility, and subscribe to my blog (Follow in WordPress). You can also express your support with a small one-time donation, or even set it up on a recurring monthly/yearly basis. I wholeheartedly thank all of you in advance, and I pledge to keep posting data visualizations, analyses, mining and manipulation methods, and infographics to your delight!

Here is the donation button for Try something new. Everyday

It is not necessary for you to have a PayPal account, you can use your credit card directly.

Donate for more datawizardry!

And here is the one for SZÉKELYDATA

Both are linked to the same wallet actually, they just use different currencies (USD vs EUR).

Donate for more datawizardry!