Recently I’ve been playing around with Mapbox. I previously used it to make the visualisation in my post on the UK gender pay gap and while I was doing that I started wondering about using choropleths.
The layers you need to make choropleths of the USA are available within Mapbox, but I couldn’t find any for the UK, so I started to look at how to go about creating a custom choropleth map. It’s not as easy as you might expect… or at least I didn’t find it to be.
Mapbox works directly with geoJSON files in principle, but in practice it’s so slow that I can’t tell whether it’s actually working or not. To make it faster you need to read the geoJSON in Python and pass it to Mapbox as a dictionary, which is not so problematic. The thing that had me stumped for ages is that Mapbox seems to have a problem with the MultiPolygon
type of feature in geoJSON – it only works properly with the Polygon
type.
In the example I was following, the choropleths are for different counties of Texas. There are 254 counties and I could get the example code to run if I only plotted the first 115 of them. For the NUTS Level 1 regions of the UK I couldn’t get Mapbox to run at all.
The problem? MultiPolygon.
At first I couldn’t work out why regions of the UK were defined as MultiPolygon in the geoJSON and not just as Polygon. That is until I noticed that the West Midlands was defined as Polygon. Why? Because it has no coast line.
The Multipolygon
type is used when the area has a disjointed coverage … like when it includes islands. Interesting fact: the West Midlands is the only NUTS Level 1 region of the UK that has no islands.
Island Life
So here’s my solution. There are probably better solutions, but this is what I went with.
Let’s take the North-East of England. The North East is defined as a MultiPolygon
made up of 27 Polygon
s.
The geoJSON looks like this:
"type": "FeatureCollection", "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" }},"features": [ { "type": "Feature", "properties": { "NUTS112CD": "UKC", "NUTS112NM": "North East (England)" }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ -1.223182516435012, 54.625765249404282 ], [ -1.22502460042248, 54.625656621805653 ], [ -1.224678154166887, 54.62611902935307 ], ... [-2.034927441128955, 55.810787212358193 ], [ -2.034820638794874, 55.810862614136525 ], [ -2.034426188127988, 55.811058238635127 ], [ -2.031935777454575, 55.809676287068342 ], [ -2.02961246822781, 55.807277007688704 ], [ -2.031640300838046, 55.805457042762193 ] ] ] ] } } ] }
and it represents a region like this (see the islands?):
Mapbox wants every region described as a different feature, so I wrote a routine that turns a single MultiPolygon
into a list of Polygon
s that can all be separate features:
import copy def multipoly_to_polylist(mpoly): # find out how many components are in the Multipoly: npoly = len(mpoly['features'][0]['geometry']['coordinates']) # create a deep copy of the Multipoly dict: temp = copy.deepcopy(mpoly) # extract some bits and pieces out of the deep copy: plist = dict(type="FeatureCollection") plist["crs"] = temp["crs"] plist["features"] = [] # loop through the components and make a list of Polygons: for n in range(0,npoly): geo_feature = dict(type= "Feature") geo_feature['properties'] = temp['features'][0]['properties'] geo_feature['properties']['part'] = str(n) geo_feature['geometry'] = dict(type='Polygon') # check whether there's more than one polygon - make sure the shape of the coordinate array # is correct: if (npoly>1): geo_feature['geometry']['coordinates'] = temp['features'][0]['geometry']['coordinates'][n] else: geo_feature['geometry']['coordinates'] = [temp['features'][0]['geometry']['coordinates'][n]] # keep appending each Polygon into the list: plist['features'].append(copy.deepcopy(geo_feature)) return plist
I’m using parts of dictionaries as input to new dictionaries here, so it’s really important to make sure that we make a deep copy to work from because otherwise we’ll end up over-writing information. Note that I’ve used it twice in the above routine.
UK Choropleth
Overall, my code looks like this.
First, there’s the libraries. These are the standard ones:
import numpy as np import pandas as pd import json import copy
…and these are the ones for plotting with Plotly:
import plotly.plotly as py from plotly.graph_objs import *
Then we need to set up our Mapbox token. I’m using the default public token for my (free) Mapbox account, which I have associated to my (free) Plotly account.
mapbox_access_token = 'mapbox_access_token'
I got the geoJSON above from the data.gov.uk website, which provides the NUTS Level 1 (and other boundary data) in a range of formats.
To read it:
with open('ons_nuts1.json') as f: data = json.load(f)
Since I’m going to use all of the regions in the geoJSON, I’ve adapted the function from above in order to make a combined list of features to feed to Mapbox as layers:
def multipoly_to_polylist2(mpoly): # find out how many components are in the Multipoly: npoly = len(mpoly['geometry']['coordinates']) # create a deep copy of the Multipoly dict: temp = copy.deepcopy(mpoly) # extract some bits and pieces out of the deep copy: plist = [] # loop through the components and make a list of Polygons: for n in range(0,npoly): geo_feature = dict(type= "Feature") geo_feature['properties'] = temp['properties'] geo_feature['properties']['part'] = str(n) geo_feature['geometry'] = dict(type='Polygon') geo_feature['geometry']['coordinates'] = temp['geometry']['coordinates'][n] # check whether there's more than one polygon - make sure the shape of the coordinate array # is correct: if (npoly>1): geo_feature['geometry']['coordinates'] = temp['geometry']['coordinates'][n] else: geo_feature['geometry']['coordinates'] = [temp['geometry']['coordinates'][n]] # keep appending each Polygon into the list: plist.append(copy.deepcopy(geo_feature)) return plist
Here’s how I call that function:
flist = data['features'] sources=[] for feature in flist: plist = multipoly_to_polylist2(feature) sources.append({"type": "FeatureCollection", 'features': copy.deepcopy(plist) })
As well as the layers I’m also going to make a list of central longitudes and latitudes for each feature, again following the method in this tutorial.
lons=[] lats=[] for k in range(len(sources)): # combine (lon,lat) data for all polygons in region: npoly = len(sources[k]['features']) region_coords = np.array(sources[k]['features'][0]['geometry']['coordinates'][0]) for j in range(1,npoly): tmp = np.array(sources[k]['features'][j]['geometry']['coordinates'][0]) region_coords = np.vstack((region_coords,tmp)) if len(region_coords.shape)>1: m, M =region_coords[:,0].min(), region_coords[:,0].max() lons.append(0.5*(m+M)) m, M =region_coords[:,1].min(), region_coords[:,1].max() lats.append(0.5*(m+M)) else: m, M =region_coords[0].min(), region_coords[0].max() lons.append(0.5*(m+M)) m, M =region_coords[1].min(), region_coords[1].max() lats.append(0.5*(m+M))
I’m going to colour each region according to the level of its gender pay gap. The input data file for this code can be found here and the code to produce it can be found here.
# read colour for each region from CSV file df = pd.read_csv('paygap_region_data_with_colours.csv') df.head()
I’m extracting the colours for the university sector pay gap:
# extract data from pandas dataframe as numpy arrays: unis=df['Uni'].values regions=df['Region'].values colours=df['Colours'].values # just use university sector data: regions = regions[np.where(unis==1)] colours = colours[np.where(unis==1)]
Once we’ve got a list of colours for each region we can then link those to the list of features for each region (i.e. so all the polygons for a particular region have the same colour):
facecolour=[] for p in range(len(sources)): feature_region = copy.deepcopy(sources[p]['features'][0]['properties']['NUTS112NM']) feature_region = feature_region.replace(' (England)','') # remove part of the label for easier matching feature_region = feature_region.replace(' (England','') # remove part of the label for easier matching colour = colours[np.where(regions==feature_region)][0] facecolour.append(colour)
Now we’ve got all of these data we can start feeding it to Mapbox and Plotly. First the data for the central longitudes and latitudes. With a size of 1 these are not actually visible:
NUTS1 = dict(type='scattermapbox', lat=lats, lon=lons, mode='markers', text="", marker=dict(size=1, color='r'), showlegend=False, hoverinfo='text' )
Finally let’s make the layers for the choropleth:
layers=[dict(sourcetype = 'geojson', source = sources[k], below="water", type = 'fill', color = facecolour[k], opacity=0.7 ) for k in range(len(sources))]
We can then feed the layers into the layout for Plotly:
layout = dict(title='UK Regions Choropleth', font=dict(family='Balto'), autosize=False, width=800, height=800, hovermode='closest', mapbox=dict(accesstoken=mapbox_access_token, layers=layers, bearing=0, center=dict( lat=52.4638, lon=-1.00), pitch=0, zoom=5, ) ) fig = dict(data=[NUTS1], layout=layout)
…and plot it!
py.iplot(fig, validate=False, filename='UK-Regions')
Then for the blog this.