People tell me that they would like to make visualizations in D3.js, but that it is too complicated and the learning curve is too steep. Even writing code for C3.js or Vega, which simplify D3.js, seems too complicated. In this post, we will examine the road the data takes from the database or website to the drawing canvas, that is, your computer screen.
In my previous post I explained how to load data with D3.js from the Quandl database aggregator directly into NVD3, an easy-to-use graphing library for D3.js. If you want to visualize just one set of data and are not too worried about customization, this is a valid option. However, if you want to add your own touch, combine or extend the data with additional fields, or add your own comments, you usually have to do some additional data processing.
First we need to settle some terms, concerning data formats:
- CSV – Comma Separated Values: Think of CSV as a simplified table. This is usually the most compact format in which the data exists after we compile it, whether by downloading it from a database or by making a direct call to it. Every bit of data today can be converted to and from CSV.
- JSON – JavaScript Object Notation: The data format that JavaScript, the language your browser runs, understands natively and can process fast.
- SVG – Scalable Vector Graphics: An XML-based vector image format that can be embedded in HTML and displayed in web browsers. The HTML canvas element is the raster (non-vector) alternative to SVG.
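To make the relationship between CSV and JSON concrete, here is a minimal sketch that uses only Python's standard library. The sample table, its column names and the helper `csv_to_records` are made up for illustration; real data will need its own type conversions.

```python
import csv
import io
import json

# Hypothetical sample data: a tiny CSV table as it might come from a download.
CSV_TEXT = """country,year,value
Romania,2012,3.5
Romania,2013,4.1
Hungary,2012,2.9
"""


def csv_to_records(csv_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # CSV values are always strings, so convert numeric columns explicitly.
        records.append({
            "country": row["country"],
            "year": int(row["year"]),
            "value": float(row["value"]),
        })
    return records


records = csv_to_records(CSV_TEXT)

# A list of dictionaries maps directly onto a JSON array of objects.
json_text = json.dumps(records, indent=2)
print(json_text)
```

The same table is more compact as CSV, because the column names appear only once, while the JSON version repeats the field names in every record but can be used by JavaScript without further parsing logic.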
Then, we need to cover some programming tools:
- Python is a high-level and relatively easy to learn programming language.
- pandas is a data manipulation library for Python.
- IPython is an interactive environment for writing Python code and playing around with pandas data; its notebook interface needs only a web browser.
- If you are new to all three of the above, try installing the Anaconda distribution, which contains all of them.
- HTML and CSS are standard web languages, responsible for how websites look.
- JavaScript is a dynamic programming language, responsible for what websites do.
- D3.js is a powerful JavaScript library by Mike Bostock, mostly used to create interactive data visualizations.
Knowing all this, we can define the general process I usually go through when creating a data visualization:
- Look up the data you want to visualize on the internet. Try to use established databases, credible sources or data aggregators. (Check my earlier post about visualizing data with Quandl and NVD3)
- Download and save the data you are interested in, in CSV or JSON format.
- If you are an advanced user, this step is not necessary: you can look for an API on the target website that offers the data in CSV or JSON format, then get the download URL.
- If neither option works, check whether the data displayed on the page is simply embedded in an HTML table (such as on Wikipedia).
- Fire up IPython, load pandas.
- Load data into a pandas dataframe.
- Examine your data and decide on the visualization type you would like to construct. Try to make your data visualization beautiful. To decide on the format, go to the D3.js Gallery and open your chosen format. Then go to the source code and check whether that particular visualization uses a load function (d3.json or d3.csv) or a locally defined JavaScript variable as its data source. Then check and record the data input format required by your visualization. For more complex visualizations, this is usually a JSON file with a number of fields.
- Massage the data in pandas into the required format. Then, using Python dictionaries, save the data into a JSON file.
- Download your chosen D3.js example and fire up a local web server using SimpleHTTPServer to be able to edit the visualization locally.
- For security reasons, browsers do not allow XMLHttpRequests to files on your local disk, and loading a JSON file from your local disk falls into this category. Therefore you need to start up a local web server. Python already has this functionality built in.
- Open a terminal or command prompt, navigate to the folder where your running IPython notebook is located and run the following (for Python 2):
C:\Anaconda\python.exe -m SimpleHTTPServer 8898
or, if you have Python 3:
C:\Anaconda\python.exe -m http.server 8898
- Then, instead of opening the HTML file that you were editing (e.g. myfile.html) directly in the browser, rename it to index.html, place it in the folder where your local server is running, and navigate your browser to http://localhost:8898
- Now you can modify the D3.js example to work with your data.
- Add any other modifications in a text editor by editing the HTML, CSS and D3.js code.
- Boast about your new visualization skills : )
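If you want to convince yourself that the local web server step works before touching any D3.js code, the whole round trip can be sketched in Python 3 alone. This is only an illustration under a few assumptions: the file name data.json and its contents are made up, a temporary folder stands in for your project folder, and port 0 (any free port) is used instead of 8898 so the sketch cannot collide with a server you already have running.

```python
import http.server
import json
import os
import tempfile
import threading
import urllib.request
from functools import partial

# Hypothetical data that a D3.js page would load with d3.json("data.json").
data = [{"name": "A", "value": 1}, {"name": "B", "value": 2}]

# Write the JSON file into a temporary folder standing in for your project folder.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "data.json"), "w") as f:
    json.dump(data, f)

# Serve that folder over HTTP; port 0 asks the operating system for any free port.
handler = partial(http.server.SimpleHTTPRequestHandler, directory=folder)
server = http.server.HTTPServer(("127.0.0.1", 0), handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the file over HTTP, as the browser would, ignoring any configured proxy.
url = "http://127.0.0.1:%d/data.json" % port
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url, timeout=5) as response:
    fetched = json.loads(response.read().decode("utf-8"))

# Stop the server again once the check is done.
server.shutdown()
server.server_close()

print(fetched == data)
```

For everyday work the one-line command from the terminal is all you need; the script form is mainly useful when you want to start and stop the server from inside a notebook.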
UPDATE: Top comment on this post on reddit : )