UK Regions MapBox Choropleth

Recently I’ve been playing around with Mapbox. I previously used it to make the visualisation in my post on the UK gender pay gap and while I was doing that I started wondering about using choropleths.

The layers you need to make choropleths of the USA are available within Mapbox, but I couldn’t find any for the UK, so I started to look at how to go about creating a custom choropleth map. It’s not as easy as you might expect… or at least I didn’t find it to be.

Mapbox works directly with geoJSON files in principle, but in practice it’s so slow that I can’t tell whether it’s actually working or not. To make it faster you need to read the geoJSON in Python and pass it to Mapbox as a dictionary, which is not so problematic. The thing that had me stumped for ages is that Mapbox seems to have a problem with the MultiPolygon type of feature in geoJSON – it only works properly with the Polygon type.

In the example I was following, the choropleths are for different counties of Texas. There are 254 counties and I could get the example code to run if I only plotted the first 115 of them. For the NUTS Level 1 regions of the UK I couldn’t get Mapbox to run at all.

The problem? MultiPolygon.

At first I couldn’t work out why regions of the UK were defined as MultiPolygon in the geoJSON and not just as Polygon. That is until I noticed that the West Midlands was defined as Polygon. Why? Because it has no coast line.

The Multipolygon type is used when the area has a disjointed coverage … like when it includes islands. Interesting fact: the West Midlands is the only NUTS Level 1 region of the UK that has no islands.

Island Life

So here’s my solution. There are probably better solutions, but this is what I went with.

Let’s take the North-East of England. The North East is defined as a MultiPolygon made up of 27 Polygons.

The geoJSON looks like this:

"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" }},"features": [
{ "type": "Feature", "properties": { "NUTS112CD": "UKC", "NUTS112NM": "North East (England)" },
 "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ -1.223182516435012, 54.625765249404282 ], [ -1.22502460042248, 54.625656621805653 ], [ -1.224678154166887, 54.62611902935307 ],  ... [-2.034927441128955, 55.810787212358193 ], [ -2.034820638794874, 55.810862614136525 ], [ -2.034426188127988, 55.811058238635127 ], [ -2.031935777454575, 55.809676287068342 ], [ -2.02961246822781, 55.807277007688704 ], [ -2.031640300838046, 55.805457042762193 ] ] ] ] } }
]
}

and it represents a region like this (see the islands?):

inset_ne_map.png

Mapbox wants every region described as a different feature, so I wrote a routine that turns a single MultiPolygon into a list of Polygons that can all be separate features:

import copy

def multipoly_to_polylist(mpoly):

    # find out how many components are in the Multipoly:
    npoly = len(mpoly['features'][0]['geometry']['coordinates'])

    # create a deep copy of the Multipoly dict:
    temp = copy.deepcopy(mpoly)

    # extract some bits and pieces out of the deep copy:
    plist = dict(type="FeatureCollection")
    plist["crs"] = temp["crs"]
    plist["features"] = []

    # loop through the components and make a list of Polygons:
    for n in range(0,npoly):
        geo_feature = dict(type= "Feature")
        geo_feature['properties'] = temp['features'][0]['properties']
        geo_feature['properties']['part'] = str(n)
        geo_feature['geometry'] = dict(type='Polygon')

        # check whether there's more than one polygon - make sure the shape of the coordinate array
        # is correct:
        if (npoly>1):
            geo_feature['geometry']['coordinates'] = temp['features'][0]['geometry']['coordinates'][n]
        else:
            geo_feature['geometry']['coordinates'] = [temp['features'][0]['geometry']['coordinates'][n]]

        # keep appending each Polygon into the list:
        plist['features'].append(copy.deepcopy(geo_feature))

    return plist

I’m using parts of dictionaries as input to new dictionaries here, so it’s really important to make sure that we make a deep copy to work from because otherwise we’ll end up over-writing information. Note that I’ve used it twice in the above routine.

UK Choropleth

Overall, my code looks like this.

First, there’s the libraries. These are the standard ones:

import numpy as np
import pandas as pd
import json
import copy

…and these are the ones for plotting with Plotly:

import plotly.plotly as py
from plotly.graph_objs import *

Then we need to set up our Mapbox token. I’m using the default public token for my (free) Mapbox account, which I have associated to my (free) Plotly account.

To add a Mapbox access token to your Plotly account just go to “Settings” in the dropdown menu that appears when you click your user name in the top right corner of the webpage then select “Mapbox Access Tokens” from the left hand menu.
mapbox_access_token = 'mapbox_access_token'

I got the geoJSON above from the data.gov.uk website, which provides the NUTS Level 1 (and other boundary data) in a range of formats.

webdata

To read it:

with open('ons_nuts1.json') as f:
    data = json.load(f)

Since I’m going to use all of the regions in the geoJSON, I’ve adapted the function from above in order to make a combined list of features to feed to Mapbox as layers:

def multipoly_to_polylist2(mpoly):

    # find out how many components are in the Multipoly:
    npoly = len(mpoly['geometry']['coordinates'])

    # create a deep copy of the Multipoly dict:
    temp = copy.deepcopy(mpoly)

    # extract some bits and pieces out of the deep copy:
    plist = []

    # loop through the components and make a list of Polygons:
    for n in range(0,npoly):
        geo_feature = dict(type= "Feature")
        geo_feature['properties'] = temp['properties']
        geo_feature['properties']['part'] = str(n)
        geo_feature['geometry'] = dict(type='Polygon')
        geo_feature['geometry']['coordinates'] = temp['geometry']['coordinates'][n]

        # check whether there's more than one polygon - make sure the shape of the coordinate array
        # is correct:
        if (npoly>1):
            geo_feature['geometry']['coordinates'] = temp['geometry']['coordinates'][n]
        else:
            geo_feature['geometry']['coordinates'] = [temp['geometry']['coordinates'][n]]

        # keep appending each Polygon into the list:
        plist.append(copy.deepcopy(geo_feature))

    return plist

Here’s how I call that function:

flist = data['features']

sources=[]
for feature in flist:
    plist = multipoly_to_polylist2(feature)
    sources.append({"type": "FeatureCollection", 'features': copy.deepcopy(plist) })

As well as the layers I’m also going to make a list of central longitudes and latitudes for each feature, again following the method in this tutorial.

lons=[]
lats=[]
for k in range(len(sources)):

    # combine (lon,lat) data for all polygons in region:
    npoly = len(sources[k]['features'])
    region_coords = np.array(sources[k]['features'][0]['geometry']['coordinates'][0])
    for j in range(1,npoly):
        tmp = np.array(sources[k]['features'][j]['geometry']['coordinates'][0])
        region_coords = np.vstack((region_coords,tmp))

    if len(region_coords.shape)>1:
        m, M =region_coords[:,0].min(), region_coords[:,0].max()
        lons.append(0.5*(m+M))
        m, M =region_coords[:,1].min(), region_coords[:,1].max()
        lats.append(0.5*(m+M))
    else:
        m, M =region_coords[0].min(), region_coords[0].max()
        lons.append(0.5*(m+M))
        m, M =region_coords[1].min(), region_coords[1].max()
        lats.append(0.5*(m+M))

I’m going to colour each region according to the level of its gender pay gap. The input data file for this code can be found here and the code to produce it can be found here.

# read colour for each region from CSV file
df = pd.read_csv('paygap_region_data_with_colours.csv')
df.head()

colours_header
I’m extracting the colours for the university sector pay gap:

# extract data from pandas dataframe as numpy arrays:
unis=df['Uni'].values
regions=df['Region'].values
colours=df['Colours'].values

# just use university sector data:
regions = regions[np.where(unis==1)]
colours = colours[np.where(unis==1)]

Once we’ve got a list of colours for each region we can then link those to the list of features for each region (i.e. so all the polygons for a particular region have the same colour):

facecolour=[]
for p in range(len(sources)):
    feature_region = copy.deepcopy(sources[p]['features'][0]['properties']['NUTS112NM'])
    feature_region = feature_region.replace(' (England)','') # remove part of the label for easier matching
    feature_region = feature_region.replace(' (England','') # remove part of the label for easier matching
    colour = colours[np.where(regions==feature_region)][0]
    facecolour.append(colour)

Now we’ve got all of these data we can start feeding it to Mapbox and Plotly. First the data for the central longitudes and latitudes. With a size of 1 these are not actually visible:

NUTS1 = dict(type='scattermapbox',
             lat=lats,
             lon=lons,
             mode='markers',
             text="",
             marker=dict(size=1, color='r'),
             showlegend=False,
             hoverinfo='text'
            )

Finally let’s make the layers for the choropleth:

layers=[dict(sourcetype = 'geojson',
             source = sources[k],
             below="water",
             type = 'fill',
             color = facecolour[k],
             opacity=0.7
            ) for k in range(len(sources))]

We can then feed the layers into the layout for Plotly:

layout = dict(title='UK Regions Choropleth',
              font=dict(family='Balto'),
              autosize=False,
              width=800,
              height=800,
              hovermode='closest',

              mapbox=dict(accesstoken=mapbox_access_token,
                          layers=layers,
                          bearing=0,
                          center=dict(
                          lat=52.4638,
                          lon=-1.00),
                          pitch=0,
                          zoom=5,
                    )
              )

fig = dict(data=[NUTS1], layout=layout)

…and plot it!

py.iplot(fig, validate=False, filename='UK-Regions')

choropleth

Then for the blog this.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: