The modern researcher’s toolbox

We all know that Moore’s law has pushed technology a long way since ENIAC. The way we do research has unquestionably changed, but so has the way we write up our findings. I have the feeling, though, that this latter change mostly comes down to library-roaming having been replaced by Google-Scholaring, especially over the last 15-20 years. The way we compile articles hasn’t really changed… Create a Word doc (or, if you’re techy, a LaTeX file) and start right off. You reach a point when you think it is reasonable to share it with your co-authors, and off goes the email, attachment, track changes, and then it’s back to you. Sure, there are some collaborative initiatives, some with online editing and some of which have been around for quite a while, e.g. Google Docs and Microsoft OneDrive, and some advancements in the way we share files, such as Dropbox. But it’s not only about the writing. The figures are not just Excel or Matlab anymore. They are Python. Or R. And research requires an increasingly large amount of code. Then there is the formatting… You submit, get rejected, reformat and submit again. And repeat forever.

This got me thinking that there must be some way of streamlining this process. And there is. I just had to put together the right tools. Meet my integrated research environment (much like an integrated development environment, used by coders):


Tools for the integrated research environment of the modern researcher

I stumbled upon this marvellous tool called Authorea and found that it connects to everything I was doing before. It also ticks the box that annoyed me the most: automatic reformatting for journals. On top of that, it made possible a move that I had been wanting to make for a long time – migrating from writing research in Word to writing it in LaTeX. And, in order to smooth that transition, you can first try Markdown.

So here are the steps to take towards writing a research paper, the modern way:

  1. Come up with your amazing new idea
  2. Search for data online, then fire up a Jupyter notebook (formerly IPython) running Python (or R)
  3. Get and massage your data with pandas
  4. Create some nice plots with matplotlib
  5. Save the data into JSON format, then fire up your favorite text editor
  6. Pull the data from the JSON files and create an awesome interactive visualization with D3.js
  7. Create project website in HTML5 and CSS3, host it on GitHub
  8. Create a new Authorea article, set it up to push automatically to a GitHub repository
  9. Invite your collaborators to the article – and work simultaneously on the text with Git version control
  10. Find your inner muse and write up that article in HTML, Markdown or LaTeX
  11. Put in some fancy equations in LaTeX
  12. Paste those graphs you created with matplotlib, then put a link to your source code – this way your readers can actually fire up a live Jupyter notebook on the Authorea server and even play with your code
  13. If you want to step up your game, go ahead and paste that interactive visualization you created in D3.js from the website hosted on GitHub
  14. Paste those references directly from CrossRef, without the need of a citation manager
  15. Chat with your co-authors, review and finalize your article – in the browser
  16. Go online again and search for the most awesome journal of your choice
  17. Export your article from Authorea in the formatting requirements of your chosen journal – just a click
  18. Get some sleep man!
  19. Repeat steps 15-18 until accepted 🙂
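
Steps 3 to 5 above (get the data, massage it, save it as JSON for D3.js) can be sketched roughly as follows. This is only a minimal illustration: it uses Python's standard `csv` and `json` modules in place of pandas so that it runs anywhere, and the column names and numbers in `RAW_CSV` are made up for the example.

```python
import csv
import io
import json

# Hypothetical raw data standing in for a downloaded CSV file.
RAW_CSV = """country,year,value
A,2013,10
A,2014,12
B,2013,7
B,2014,
"""


def load_rows(text):
    """Parse CSV text into typed records, dropping rows with a missing value."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["value"] == "":
            continue  # skip incomplete records
        rows.append({
            "country": row["country"],
            "year": int(row["year"]),
            "value": float(row["value"]),
        })
    return rows


def to_series(rows):
    """Group records by country: {country: [{"year": ..., "value": ...}, ...]}."""
    series = {}
    for row in rows:
        series.setdefault(row["country"], []).append(
            {"year": row["year"], "value": row["value"]}
        )
    return series


series = to_series(load_rows(RAW_CSV))

# A JSON string like this is what a D3.js page would later load.
payload = json.dumps(series, indent=2, sort_keys=True)
print(payload)
```

With pandas the same cleaning and grouping would likely take fewer lines; the point here is only the shape of the pipeline, from raw table to a JSON structure that a visualization can read.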

And the best part? All of the above are open source, free tools.

Good luck!

A visual exploratorium of refugee flows over the world using dynamic chord diagrams

Click here to look at the data visualization only.
A localized Hungarian version of this post also exists.
PREVIOUS: Refugee dynamics - what does the data say?

This post is an update on the Refugee dynamics – what does the data say? post – a data visualization that I developed for World Refugee Day 2015. I have now written a new data parser that accesses the UNHCR database directly rather than taking the re-reported data from UNdata. This allowed me to extend the time horizon to 1951-2014 and include the latest available numbers, and the web app now also includes internally displaced persons (IDPs).

I think it has now become quite a powerful tool for analyzing the refugee flows of the past half century! I have made a few other modifications as well. The total number of displayed refugees is now shown in the center of the chord diagrams. There is a filter to select whether or not to display the IDPs. And, on top of being able to set a floor for the displayed flows, you can now also set a ceiling – making it possible to visualize, let’s say, only refugee flows that fall between 1000 and 2000 people.
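
The floor, ceiling and IDP filters described above boil down to a simple selection over the flow records. The sketch below shows the idea in Python; it is not the code of the web app itself. The records are made-up numbers, and treating a record whose source equals its host as an IDP entry is an assumption made for this illustration.

```python
# Hypothetical flow records: (source, host, people). All numbers are made up.
FLOWS = [
    ("A", "B", 1500),
    ("A", "C", 250),
    ("B", "C", 1800),
    ("C", "C", 5000),   # assumption: source == host marks an IDP entry
    ("C", "A", 12000),
]


def filter_flows(flows, floor=0, ceiling=None, include_idps=True):
    """Keep flows with floor <= people <= ceiling, optionally dropping IDPs."""
    kept = []
    for source, host, people in flows:
        if not include_idps and source == host:
            continue
        if people < floor:
            continue
        if ceiling is not None and people > ceiling:
            continue
        kept.append((source, host, people))
    return kept


def displayed_total(flows):
    """Sum of people over the kept flows (the number shown in the center)."""
    return sum(people for _, _, people in flows)


# Only flows between 1000 and 2000 people, as in the example above.
visible = filter_flows(FLOWS, floor=1000, ceiling=2000)
print(visible, displayed_total(visible))
```

Keeping the ceiling optional (`None` means no upper limit) mirrors the way the floor-only filter worked before the ceiling was added.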

So here you go, play & enjoy!


The visual exploratorium of refugee flows over the world using dynamic chord diagrams 1.1

Briefly, the new insights are:

  • The world’s refugees total 51.6 million when including IDPs, and 12.7 million without.
  • In 2014, Syria is by far the largest source of refugees: 11.5 million people in total, of which 7.8 million are IDPs.
  • Colombia, Iraq and DRC have a huge number of IDPs (6, 3.8 and 3.3 million, respectively) – not seen in the previous version!
  • Without IDPs, Jordan is the largest host country, hosting 2.7 million refugees: 600 thousand Syrians and 2.1 million Palestinians.
  • Without UNRWA numbers (which make up the bulk of the Palestinian refugees, so this means filtering out Palestine from the visualization), the largest host is, for the first time in history, Turkey with 1.55 million Syrian refugees, barely surpassing Pakistan with 1.51 million Afghans.
  • Taking IDPs into account further dwarfs the flows directed towards developed countries. Lowering the filters by a factor of 100 to 1000 and displaying only the flows between 2000 and 10000 refugees highlights the European and American flows.

Refugee flows between 2000 and 10000 people in the year 2014

  • We can see that the developed world welcomes refugees almost equally from a number of source countries. This is a different pattern from that of the developing countries (many of them next to the conflict-affected areas), which mainly host refugees from one or two countries, although the scale of the flows into the developed world is about 10-100 times smaller.

Keep exploring the web-app and let me know in the comments below if you discover something interesting! Remember, you can now go back all the way to 1951 – but data gets a bit patchier before the 70s.

This post describes an update to a web-app that I have created that visualizes refugee flows over the world using dynamic chord diagrams. You can find the data sources and methodology in the first post. This is the new IPython parser and this is the new outcome: the visual exploratorium of refugee flows over the world using dynamic chord diagrams 1.1. As always, made with d3.js. If you liked this post or have any questions or thoughts, Like, Share, Comment, and Subscribe! If you think my work is cool and you would like to support it, please consider a small Donation.

Refugee dynamics – what does the data say?

Click here if you prefer to read this post on (11 minute read).
Click here to look at the data visualization only.
UPDATE: A visual exploratorium of refugee flows over the world using dynamic chord diagrams

Let me present to you the visual exploratorium of refugee flows over the world using dynamic chord diagrams! Perhaps one of the hottest topics today is that of refugees and immigrants. There are conflicting views about state attitudes towards unwanted immigration, and the debates turn nasty. Spanning from Europe through the Middle East, Central Africa and all the way to Eastern Asia, refugees are a topic of constant uproar – and unavoidably a handy political (populist) tool. But what is the reality behind the movement of the world’s refugees? Without taking any political stance, let us see what the data says about refugee dynamics!


The visual exploratorium of refugee flows over the world using dynamic chord diagrams

While it is known that natural disasters and wars (especially civil wars – read my previous post about insurgent dynamics) are the main causes that push somebody into jumping the border, these people-flows can also act as a catalyst for igniting other conflicts in the receiving regions. The linkages between the Second Congo War and the Rwandan and Burundian civil wars are prime examples of cross-border conflict spillover. Therefore, most countries are reluctant to receive refugees in large numbers; nonetheless, in culturally integrated regions the process is unavoidable. However, when talking about the receiving countries, you might think that the wealthy, developed nations take in the most refugees. You couldn’t be more wrong… It is the developing nations, oftentimes those with the smallest GDPs in the world, that take in the largest number of people. In fact, in 2013, there were no developed countries among the top 25 refugee-receiving countries. However, based on the recent crises in the Andaman Sea, and repeatedly in the Mediterranean, Western refugee intake is expected to soar… a bit. But we will investigate just how far off the scale the mainstream media is when talking about a refugee and illegal immigrant invasion of Europe, for example.


How is a D3.js visualization made? – the road from CSV to SVG

People tell me that they would like to make visualizations in D3.js, but that it is too complicated. The learning curve is too steep. Even crafting simplified D3.js code with C3.js or Vega seems too complicated. In this post, we will examine the road the data takes from the database or website to the drawing canvas – that is, your computer screen.

In my previous post I explained how to load data with D3.js from the Quandl database aggregator directly into NVD3, an easy-to-use graphing library for D3.js. If you want to visualize just one set of data and don’t worry too much about customization, this is a valid option. However, if you want to add your own touch, combine or extend the data with additional fields, or add your own comments, you would usually have to do some additional data processing.
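
To make the road from CSV to SVG a bit more concrete, here is a minimal sketch of the core idea: each data row is mapped through a linear scale onto one SVG shape. It is written in plain Python rather than D3.js, so it only illustrates the concept and not the D3.js API. The data in `CSV_TEXT` and the function name `csv_to_svg` are made up for this example.

```python
import csv
import io
from xml.sax.saxutils import escape

# Hypothetical input data standing in for a real CSV file.
CSV_TEXT = """label,value
alpha,30
beta,80
gamma,55
"""


def csv_to_svg(text, width=300, bar_height=20, gap=5):
    """Turn CSV rows into a horizontal bar chart, one <rect> per row."""
    rows = [(r["label"], float(r["value"]))
            for r in csv.DictReader(io.StringIO(text))]
    max_value = max(value for _, value in rows)
    height = len(rows) * (bar_height + gap)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for i, (label, value) in enumerate(rows):
        y = i * (bar_height + gap)
        # Linear scale: the largest value spans the full width.
        bar_width = value / max_value * width
        parts.append(
            f'<rect x="0" y="{y}" width="{bar_width:.1f}" '
            f'height="{bar_height}"><title>{escape(label)}</title></rect>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = csv_to_svg(CSV_TEXT)
print(svg)
```

Saving the printed text to a file with an `.svg` extension and opening it in a browser should show three bars. In D3.js the same steps (load, scale, bind to elements) happen in the browser, with the added ability to update the shapes when the data changes.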


Colorful Development: Cartagena DataFest 2015

This post has been prepared as part of a submission to the Cartagena DataFest 2015.
Click here to look at the data visualization only on
PREVIOUS: Colorful Development: RGB-coded Multidimensional HDI 
PREVIOUS: Colorful Development: Dynamic Graphs


This post is an update on the Colorful Development: Dynamic Graphs post – a data visualization that has evolved into a full-fledged web-app: an updated, prettier version with a lot of user interface enhancements that now includes the inequality and gender-adjusted human development indices as well.


Colorful Development: Cartagena DataFest 2015 – click for interactive

In this post we will look at the evolution of the inequality between the three components (Health, Education, Income) of the Human Development Index (HDI), and at the Inequality Adjusted Human Development Index (IHDI), using a web-app I have constructed. To make for comfortable reading, but also to lay a strong basis, we will first concentrate on the user interface. I have prepared this as part of a submission to the Cartagena DataFest 2015 data visualization competition. As before, we will need to use tripolar RGBHDI plots (I also call them colorwheels), a rather peculiar three-dimensional coordinate system defined in this previous post, so make sure you take a look at it first, as well as at the update that discusses converting the RGB plot into a dynamic graph with an adjacent world map.
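
For readers who have not seen the earlier posts, the general idea of coding three indices as one color can be sketched as below. This is only an illustration under stated assumptions: the exact RGBHDI definition is in the previous post, and the channel assignment used here (health to red, education to green, income to blue) as well as the function name `hdi_to_hex` are hypothetical.

```python
def hdi_to_hex(health, education, income):
    """Map three indices in [0, 1] to an RGB hex color string.

    Assumption for this sketch: health -> red, education -> green,
    income -> blue. Values outside [0, 1] are clamped.
    """
    def channel(index):
        clamped = min(1.0, max(0.0, index))
        return int(clamped * 255 + 0.5)  # round half up to 0..255

    return "#{:02x}{:02x}{:02x}".format(
        channel(health), channel(education), channel(income)
    )


# A country strong in one component only shows up in that channel's color.
print(hdi_to_hex(1.0, 0.0, 0.0))
# Balanced components give a shade of grey, whatever their level.
print(hdi_to_hex(0.6, 0.6, 0.6))
```

The useful property of such a coding is that imbalance between the components becomes visible as hue, while the overall level shows up as brightness.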