June the 28th, 2017 - Thomas Roca, PhD - Agence Française de Développement
United Nations System Staff College, Big data & SDGs training, Nairobi

Introduction to data visualization

"A data scientist is someone who is better at statistics than any software engineer and better
at software engineering than any statistician."


Stay in touch! Twitter: @Thomas Roca || LinkedIn || Lecture's Github Folder

Objectives of the Workshop

  • Become familiar with the data ecosystem;

  • Get hands-on with tools (software, programming, code/web) in order to:

    • no longer be afraid of data & code;
    • be autonomous in visual communication.

At the end of the day you will:

  • have a basic knowledge of HTML, JavaScript and APIs;
  • be able to produce maps and dashboards.

I. Basic Introduction to data representation

What is data visualization?

  • It must display data...
  • Dataviz is more about making complex information accessible and easy to understand
  • Infographics are more about storytelling (marketing)

Why visualize data?


Numbers alone are not informative enough; they can be a bit sneaky...


A 21st-century variation on Anscombe's quartet.
source: The Datasaurus Dozen
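A minimal Python sketch of why numbers alone can be sneaky. It uses the classic values of Anscombe's quartet, typed in by hand; the helper name `pearson` is only illustrative. The four datasets share nearly the same means and correlation, yet they look completely different once plotted.

```python
from statistics import mean

# Anscombe's quartet: four (x, y) datasets with near-identical summary statistics
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# (mean of x, mean of y, correlation), each rounded to 2 decimals
summary = {
    name: (round(mean(xs), 2), round(mean(ys), 2), round(pearson(xs, ys), 2))
    for name, (xs, ys) in quartet.items()
}
for name, stats in summary.items():
    print(name, stats)
```

The printed rows are the same for all four datasets, which is exactly why a chart is needed to tell them apart.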

... A bit of dataviz history

1869, Charles Joseph Minard: Carte figurative of Napoleon's 1812 Russian campaign

1850s, Florence Nightingale collects health statistics

Circa 1900, W. E. B. Du Bois, African American activist

A bit later, with the internet...

... interactivity and real time data come in


In [43]:
HTML('''<iframe src="http://www.oecdbetterlifeindex.org/#/11111111111" scrolling="no" frameborder="0" width="100%" height="675"></iframe>''')
Out[43]:

II. Dataviz for all: packages and software..

Funky stuff... like XKCD

In [33]:
from IPython.display import Image
Image('http://jakevdp.github.com/figures/xkcd_version.png')
Out[33]:

Using a JavaScript library within Python

... The example of d3.js

In [44]:
from IPython.display import HTML
from nvd3 import pieChart

chart_type = 'pieChart'  # renamed from "type" to avoid shadowing the built-in
chart = pieChart(name=chart_type, color_category='category20c', height=450, width=450)
xdata = ["Orange", "Banana", "Pear", "Kiwi", "Apple", "Strawberry", "Pineapple"]
ydata = [3, 4, 0, 1, 5, 7, 3]
extra_serie = {"tooltip": {"y_start": "", "y_end": " calories"}}
chart.add_serie(y=ydata, x=xdata, extra=extra_serie)
chart.buildcontent()
# d3.js and nvd3 are loaded from a CDN so that the generated page is standalone
d3script = '<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/nvd3/1.7.1/nv.d3.min.js"></script>'
# The "with" block closes the file even if writing fails
with open("chart_d3.html", 'w') as f:
    f.write(d3script + chart.htmlcontent)
# Display the hosted copy of the generated page
HTML('<iframe src="http://stats4dev.com/prez/chart_d3.html" scrolling="no" frameborder="0" width="100%" height="500px"></iframe>')
Out[44]:

GIS with Microsoft Office 365's Power BI & Carto

Example n°1: Mapping crisis events with the ACLED dataset using Power BI

Example n°2: Mapping subnational population in Mali with data from the web (Regions of Mali) using Carto

2. Using Carto

Mapping subnational population in Mali with data from the web (Regions of Mali) and Carto

  • Steps: 1. Download the data; 2. Clean it a bit
  • Rename the column holding the regions, in this case "Admin. Regions"

Mapping conflict in the Sahel region with the ACLED dataset

In [45]:
HTML('<iframe width="100%" height="520" frameborder="0" src="https://rocathomas.carto.com/builder/a0c9d70c-5767-11e7-948d-0ef7f98ade21/embed" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen></iframe>')
Out[45]:

Dealing with errors: sometimes they happen

  • Search for the shapefile of the admin level you want, in this case admin1
  • Import the shapefile (.zip file) and look for the region missing from your dataset
  • Copy the missing geometry information to correct the bug

Practical: Create a heatmap with Carto [20min]

  • Map Kenya health sites using the CSV file from the HDX platform;
  • Map Cameroon's population in 2015 using the data in the GitHub folder (CMR sub)

II. Basic introduction to the web languages: HTML, CSS

Let's code a bit now.. HTML, CSS, Javascript

A. What is HTML?

  • The language of the web
  • It stands for HyperText Markup Language (a markup language, like LaTeX or Markdown)
  • It looks a lot like XML (both are tag-based markup languages), but:
    • XML was designed to carry data - with focus on what data is
    • HTML was designed to display data - with focus on how data looks
    • XML tags are not predefined like HTML tags are
<h1> Basic introduction to HTML, CSS</h1>             
<br>
<h2>Hello world ! </h2>
<br>
<span style="color:red; font-size:36px; font-family:Magneto ">
Hello world! with style</span>


HTML & tags

- Open a tag: <tag>, close it: </tag>
- Titles: <h1>Big title</h1>, <h2>Smaller title</h2>
- Bold: <b>Bold text</b>
- Italic: <i>italic text</i>
- Line break: <br>
- Horizontal rule: <hr>
- Link: <a href="http://www.link.com">Link</a>
- Image: <img src="imageaddress.png">
- iframe: <iframe src="http://www.afd.fr"></iframe>

etc.
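A forgotten slash, as in `<i>italic text<i>`, is the most common slip with tags. As a rough sketch, Python's standard `html.parser` module can list tags that were opened but never closed. The class name `TagChecker` and the helper `check` are made up for this illustration, and the list of void tags is deliberately short rather than complete.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag in HTML (partial list, enough for this sketch)
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class TagChecker(HTMLParser):
    """Tracks opening and closing tags to spot the ones that do not match."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of tags currently open
        self.problems = []    # closing tags that matched nothing

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check(html_text):
    """Return a list of human-readable tag problems (empty if none were found)."""
    checker = TagChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems + [f"<{t}> never closed" for t in checker.open_tags]


print(check("<h1>Big title</h1><br><b>Bold text</b>"))
print(check("<i>italic text<i>"))
```

Browsers are forgiving and will often display such a page anyway, which is why this kind of mistake can go unnoticed.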

HTML & tables

<table>
<tr>
<th>Title 1</th><th>Title 2</th>
</tr><tr>
<td> Dataviz 1</td><td>Dataviz 2</td>
</tr>
</table>
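Writing table tags by hand gets tedious once there is real data. A small sketch of generating the same structure from Python lists (the function name `html_table` is hypothetical, not part of any library):

```python
from html import escape


def html_table(header, rows):
    """Return an HTML table string: one <th> per header cell, one <td> per data cell."""
    head = "<tr>" + "".join(f"<th>{escape(str(h))}</th>" for h in header) + "</tr>"
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{head}{body}</table>"


table = html_table(["Title 1", "Title 2"], [["Dataviz 1", "Dataviz 2"]])
print(table)
```

`escape` converts characters such as `<` and `&` so that data values cannot be mistaken for tags.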

HTML + CSS

Cascading Style Sheets: instructions that apply a style to the HTML.

<h1 style="color: blue; text-align: center; font-family: Calibri">Big title</h1>

The style can be stored in a separate file, e.g. style.css

h2 {
    color: white;
    text-align: center;
    font-family: Calibri;
}

When the style instructions are kept separate from the HTML pages, they can apply to many files, and we can change the look of a complete website just by changing the style instructions.
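To make the idea concrete, here is a hedged sketch that writes one shared style.css and two pages that both point to it through a `<link>` tag. The function name `build_site`, the page titles and the temporary folder are assumptions made for the example.

```python
import tempfile
from pathlib import Path

CSS = "h2 {\n    color: white;\n    text-align: center;\n    font-family: Calibri;\n}\n"
# Every page links to the same stylesheet instead of carrying its own style
PAGE = '<html><head><link rel="stylesheet" href="style.css"></head><body><h2>{title}</h2></body></html>'


def build_site(folder, titles):
    """Write one shared style.css plus one HTML page per title; return the page paths."""
    folder = Path(folder)
    (folder / "style.css").write_text(CSS, encoding="utf-8")
    pages = []
    for i, title in enumerate(titles, start=1):
        page = folder / f"page{i}.html"
        page.write_text(PAGE.format(title=title), encoding="utf-8")
        pages.append(page)
    return pages


site_dir = tempfile.mkdtemp()  # throwaway folder, so nothing is written next to the notebook
pages = build_site(site_dir, ["Maps", "Dashboards"])
print([p.name for p in pages])
```

Editing the single style.css file then restyles both pages at once.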

The console and the inspector

Let's see what is behind the page: https://www.whitehouse.gov/
Let's read passwords... https://extranet.afd.fr/dana-na/auth/url_default/welcome.cgi

B. Introduction to Scalable Vector Graphics

<h1>Hello SVG</h1>
<svg width="100" height="100">
   <circle cx="50" cy="50" r="40" stroke="rgb(0, 119, 181)" stroke-width="4" fill="blue" />
</svg> 
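Since SVG is just text, a chart can be produced by writing shapes from data. A minimal sketch of a bar chart, reusing the fruit values from the pie chart earlier in this notebook (the function name `svg_bar_chart` and its default sizes are arbitrary choices for the example):

```python
def svg_bar_chart(values, bar_width=30, gap=10, height=100, color="rgb(0, 119, 181)"):
    """Return an SVG string with one <rect> per value; the tallest bar fills the height."""
    top = max(values) or 1  # avoid dividing by zero when every value is 0
    width = len(values) * (bar_width + gap)
    bars = []
    for i, value in enumerate(values):
        bar_height = height * value / top
        x = i * (bar_width + gap)
        y = height - bar_height  # the SVG y axis points downwards
        bars.append(
            f'<rect x="{x}" y="{y:.1f}" width="{bar_width}" '
            f'height="{bar_height:.1f}" fill="{color}" />'
        )
    return f'<svg width="{width}" height="{height}">' + "".join(bars) + "</svg>"


svg = svg_bar_chart([3, 4, 0, 1, 5, 7, 3])
print(svg)
```

In a notebook the string can be displayed with `IPython.display.SVG(svg)` or pasted into any HTML page.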

SVG : Vector versus pixel

In [46]:
HTML('''<iframe width="100%" height="300" src="http://jsfiddle.net/fxcjzufw/1/embedded/html,result/" allowfullscreen="allowfullscreen" frameborder="0"></iframe> ''')
Out[46]:

C. Dataviz: leveraging javascript libraries

Using Highcharts

Highcharts is a dataviz library based on JavaScript. The actual code is stored in an external JavaScript file (.js).
All you have to do is feed it with data and choose the parameters and display options (title, subtitle, colors, legend, tooltips, etc.).
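As a rough illustration of "feeding it with data and parameters", the options object can be assembled in Python and serialized to JSON. The figures are the Afghanistan values from the JSON example later in this document; the function name `highcharts_options`, the color and the `'container'` element id are assumptions.

```python
import json


def highcharts_options(title, subtitle, categories, series_name, data, color="#0077b5"):
    """Build a Highcharts options object (as a Python dict) for a simple line chart."""
    return {
        "chart": {"type": "line"},
        "title": {"text": title},
        "subtitle": {"text": subtitle},
        "xAxis": {"categories": categories},
        "legend": {"enabled": True},
        "tooltip": {"valueDecimals": 3},
        "series": [{"name": series_name, "data": data, "color": color}],
    }


options = highcharts_options(
    title="HDI: Income index",
    subtitle="Afghanistan",
    categories=["2009", "2010", "2011", "2012", "2013"],
    series_name="AFG",
    data=[0.404, 0.426, 0.428, 0.443, 0.445],
)

# JavaScript line to paste into a page that already loads highcharts.js
# and contains <div id="container"></div>
js_snippet = "Highcharts.chart('container', " + json.dumps(options, indent=2) + ");"
print(js_snippet)
```

Everything specific to the chart lives in the options; the drawing itself is done by the library.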

In [47]:
HTML('''
<iframe width="100%" height="470" src="http://jsfiddle.net/ThomasRoca/fps87ooa/embedded/result,js,html/" allowfullscreen="allowfullscreen" frameborder="0"></iframe>
''')
Out[47]:

You can also feed the visualization with data from a Google Doc, or from an API...

In [48]:
HTML('''<iframe width="100%" height="550" src="http://data.afd.fr/DataTools/carte_intervention/Carte_AFD.html" allowfullscreen="allowfullscreen" frameborder="0"></iframe>''')
Out[48]:

Cartography using Leaflet

  • Leaflet is another JavaScript library, used to create maps
  • Same idea: a .js file contains most of the code, and you give it the parameters

http://leafletjs.com/examples/quick-start/
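The "library file plus parameters" idea can be sketched from Python by filling a page template with a centre, a zoom level and a popup text. The Leaflet version in the CDN links, the approximate Nairobi coordinates and the function name `leaflet_page` are assumptions made for this example.

```python
import json

# Doubled braces are literal braces for str.format; single braces are placeholders
LEAFLET_PAGE = """<!DOCTYPE html>
<html>
<head>
  <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css" />
  <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
</head>
<body>
  <div id="map" style="height: 400px;"></div>
  <script>
    var map = L.map('map').setView([{lat}, {lon}], {zoom});
    L.tileLayer('https://tile.openstreetmap.org/{{z}}/{{x}}/{{y}}.png', {{
      attribution: '&copy; OpenStreetMap contributors'
    }}).addTo(map);
    L.marker([{lat}, {lon}]).addTo(map).bindPopup({popup});
  </script>
</body>
</html>
"""


def leaflet_page(lat, lon, zoom, popup_text):
    """Return a standalone HTML page showing a map with a single marker."""
    # json.dumps turns the popup text into a safely quoted JavaScript string
    return LEAFLET_PAGE.format(lat=lat, lon=lon, zoom=zoom, popup=json.dumps(popup_text))


page = leaflet_page(-1.29, 36.82, 12, "Nairobi")  # approximate coordinates
print(page)
```

Saving `page` to an .html file and opening it in a browser should display the map.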

In [49]:
HTML('''<iframe width="100%" height="420" src="http://leafletjs.com/examples/quick-start/example-popups.html" allowfullscreen="allowfullscreen" frameborder="0"></iframe>''')
Out[49]:

III. Data and programming for the web

A tiny bit of code..

Basic introduction to Javascript

In [50]:
HTML('''
<iframe width="100%" height="300" src="http://jsfiddle.net/ThomasRoca/50snpv6r/embedded/" allowfullscreen="allowfullscreen" frameborder="0"></iframe>
''')
Out[50]:

B. When data comes in: JSON, XML, YAML, etc.

1. The text format

JSON datasets are text data. JSON uses keys to organize the data and can be read directly by JavaScript. Here is an example of the data structure.

{
  "indicator_value": {
    "HDI": {
      "AFG": {
        "2009": "0.404",
        "2010": "0.426",
        "2011": "0.428",
        "2012": "0.443",
        "2013": "0.445"
      }
    }
  },
  "country_name": {
    "AFG": "Afghanistan"
  },
  "indicator_name": {
    "HDI": "HDI: Income index"
  }
}

How to retrieve information stored in JSON?

  • to read it easily you can use an online tool such as a JSON editor
  • but the real point is to read it with code:
  • in JavaScript, Python, etc., asking for json["indicator_value"]["HDI"]["AFG"]["2013"] will give you "0.445" (a string, since the values are quoted)
In [51]:
HTML('''
<iframe width="100%" height="300" src="http://jsfiddle.net/ThomasRoca/5f4jh80c/embedded/js,result/" allowfullscreen="allowfullscreen" frameborder="0"></iframe>
''')
Out[51]:
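The same lookup in Python, as a minimal sketch using the standard `json` module on the example above. The variable names are arbitrary. Note that the values are stored as text, so they are converted to numbers before any charting.

```python
import json

raw = """
{
  "indicator_value": {
    "HDI": {
      "AFG": {"2009": "0.404", "2010": "0.426", "2011": "0.428", "2012": "0.443", "2013": "0.445"}
    }
  },
  "country_name": {"AFG": "Afghanistan"},
  "indicator_name": {"HDI": "HDI: Income index"}
}
"""

data = json.loads(raw)  # text -> nested Python dictionaries

# Follow the keys one level at a time, exactly like in JavaScript
value = data["indicator_value"]["HDI"]["AFG"]["2013"]

# Years and values are strings in the file: convert them to numbers
series = {int(year): float(v) for year, v in data["indicator_value"]["HDI"]["AFG"].items()}

print(data["country_name"]["AFG"], value, series)
```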

XML:

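    <dataset>
      <indicator_value>
        <HDI>
          <AFG>
            <!-- XML element names cannot start with a digit, so the year is an attribute -->
            <value year="2009">0.404</value>
            <value year="2010">0.426</value>
            <value year="2011">0.428</value>
            <value year="2012">0.443</value>
            <value year="2013">0.445</value>
          </AFG>
        </HDI>
      </indicator_value>
      <country_name>
        <AFG>Afghanistan</AFG>
      </country_name>
      <indicator_name>
        <HDI>HDI: Human Development Index</HDI>
      </indicator_name>
    </dataset>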
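Reading XML with code follows the same logic as JSON, with paths instead of keys. A sketch using Python's standard `xml.etree.ElementTree`; because XML element names cannot start with a digit, this example assumes the year is stored in a `year` attribute inside a single `<dataset>` root element.

```python
import xml.etree.ElementTree as ET

xml_text = """<dataset>
  <indicator_value>
    <HDI>
      <AFG>
        <value year="2009">0.404</value>
        <value year="2010">0.426</value>
        <value year="2011">0.428</value>
        <value year="2012">0.443</value>
        <value year="2013">0.445</value>
      </AFG>
    </HDI>
  </indicator_value>
  <country_name><AFG>Afghanistan</AFG></country_name>
</dataset>"""

root = ET.fromstring(xml_text)

# Walk down the tree with a path, then read the attribute and the text of each element
values = {
    int(node.get("year")): float(node.text)
    for node in root.findall("./indicator_value/HDI/AFG/value")
}
country = root.findtext("./country_name/AFG")

print(country, values)
```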

YAML:

indicator_value:
    'HDI': {AFG: {'2009': '0.404', '2010': '0.426', '2011': '0.428', '2012': '0.443', '2013': '0.445'}}
country_name:
    AFG: Afghanistan
indicator_name:
    'HDI': 'HDI: Human Development Index'

With a viewer: https://codebeautify.org/xmlviewer http://convertjson.com/

2. A programmatic way to access data: the API

What is an API (application programming interface)?

Why use an API?

  • Programmatic access = automation
  • Real time, and always up to date
  • Provides differentiated access to data (accreditation)

Basically, an API reads the URL and parses it to query a database

ex: https://api.dataspace.fr/?dataset=WDI&indicator=NY.GDP.MKTP.KD.ZG&year=2001&country=FRA
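A small sketch of both sides of that exchange, using Python's standard `urllib.parse`: the client assembles the URL from parameters, and the server splits it back into parameters before querying its database. The two function names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(base, **params):
    """Client side: turn keyword parameters into a query string appended to the base URL."""
    return base + "?" + urlencode(params)


def parse_query_url(url):
    """Server side: recover the parameters (one value per key) from the URL."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


url = build_query_url(
    "https://api.dataspace.fr/",
    dataset="WDI",
    indicator="NY.GDP.MKTP.KD.ZG",
    year=2001,
    country="FRA",
)
params = parse_query_url(url)

print(url)
print(params)
```

Changing one parameter, for instance the country code, is all it takes to request a different slice of the data, which is what makes automation easy.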

... Then it is easy to parse the data to create a dataviz

A bit more complex..

Visualizing all the WDI Variables from the World Bank!

In [52]:
HTML('''<iframe width="100%" height="485" src="http://jsfiddle.net/ThomasRoca/1vpypyc9/embedded/result,js,html,css" allowfullscreen="allowfullscreen" scrolling="no" frameborder="0"></iframe>''')
Out[52]:

A bit more complex.. Visualizing all the DHS Variables

http://data.afd.fr/DataTools/DHS/DHS+browser.html

In [53]:
HTML('<iframe width="100%" height="485" src="http://data.afd.fr/DataTools/DHS/DHS_app.html" allowfullscreen="allowfullscreen" scrolling="no" frameborder="0"></iframe>')
Out[53]:

Hands-on Session: the day you become a *Data Jedi*



Practical: Option A. Create a DataStory with Highcharts and Carto

  • Use the Data Story template in the GitHub repository. It can be about:
    • health;
    • education;
    • agriculture;
    • socio-economic conditions, etc.

You can use data sources such as the World Bank indicators, UN OCHA HDX platform, etc.



Practical: Option B. Use the World Bank API & the DHS application

  • World Bank API
    • GDP growth (annual %) of Kenya (WDI)
    • last 20 observations
      • using Highcharts
  • DHS API
    • Create a heatmap using Leaflet
    • display the age-specific literacy rate, 15-19
    • subnational level in Kenya
    • for the latest DHS available

Exercise 1: World Bank API [20min]

  • Create a timeline chart with Highcharts
  • Display the last 20 years of GDP growth (annual %) in Kenya (KEN)
  • using the World Bank WDI API
In [54]:
HTML(''' <iframe src="http://jsfiddle.net/ThomasRoca/0eata2p0/embedded/result,js,html/" allowfullscreen="allowfullscreen" frameborder="0" width="100%" height="465"></iframe>''')
Out[54]:
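For those who prefer Python to JavaScript, here is a hedged sketch of the data side of this exercise. It builds the request URL for the World Bank v2 indicators API and reshapes a response of the form [metadata, records] into (year, value) pairs. The function names are hypothetical, and the sample payload uses made-up placeholder values purely to show the shape; it is not real GDP data.

```python
from urllib.parse import urlencode


def worldbank_url(country, indicator, last_n):
    """URL asking for the last_n most recent values (mrv) of an indicator, as JSON."""
    query = urlencode({"format": "json", "mrv": last_n})
    return f"https://api.worldbank.org/v2/country/{country}/indicator/{indicator}?{query}"


def to_series(payload):
    """Turn the [metadata, records] payload into (year, value) pairs, oldest first."""
    records = payload[1] or []
    points = [(int(r["date"]), r["value"]) for r in records if r["value"] is not None]
    return sorted(points)


url = worldbank_url("KEN", "NY.GDP.MKTP.KD.ZG", 20)

# Placeholder payload with made-up values, only to illustrate the expected structure
sample = [
    {"page": 1},
    [
        {"date": "2002", "value": 2.0},
        {"date": "2001", "value": None},  # missing observations come back as null
        {"date": "2000", "value": 1.0},
    ],
]

print(url)
print(to_series(sample))
```

The resulting pairs can be passed to Highcharts as the series data once the real response has been downloaded.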

Exercise 2: DHS API [30min]

  • Create a heatmap using Leaflet
  • display the age-specific literacy rate, 15-19
  • subnational level in Kenya
  • for the latest DHS available
In [55]:
HTML(''' <iframe src="http://jsfiddle.net/ThomasRoca/069Lqfkz/embedded/result,js,html/" allowfullscreen="allowfullscreen" frameborder="0" width="100%" height="465"></iframe>''')
Out[55]: