On the Buses

I’ve started a little project to look at availability of public transport across Manchester. To keep things easy to follow I’m going to split up the different elements of the project between blog posts.

First off, I want to know all the bus routes in Greater Manchester and I want to know which ones go where.

There are two elements to where: (1) where the bus stops are for each route and (2) which administrative ward of Manchester they are in. That’s what I’ll work out in this post.

The end product looks like this:

[Figure: Administrative wards of Manchester colour-coded by number of bus routes]

Pulling Out All the Stops

Transport for Greater Manchester (TfGM) show all the bus routes in Greater Manchester on this webpage. It looks like this:

[Screenshot: the TfGM bus routes page]

I’m going to pull the information from it using the Requests library and interpret that html information using the BeautifulSoup library. Both of these are pip installable.

First I’ll import the two libraries and set the url I want to query:

import requests
from bs4 import BeautifulSoup as bs

# set base url:
base = "https://www.tfgm.com"
query = "/public-transport/bus/routes"

Then I’m going to extract the html:

 

# get html from webpage:
r = requests.get(base+query)
page = r.text
soup=bs(page,'lxml')

If you take a look at the html (I just right click and “inspect element” in the browser for speed) you can see that each of the rectangles that links to a bus route has the html class “result-button”.

We can use that to extract the list of bus route buttons from the html:

 

bus_routes = soup.findAll('a',attrs={"class":"result-button"})

Each of the buttons has two other attributes: “id” and “href”. The first of these is the bus name and the second is a link to another webpage that includes the details of all the stops for that bus and the timetable.

To extract that info we can write something like this:

 

buses = []
for each in bus_routes:

	bus = {}

	bus['route'] = each['id']
	bus['url'] = each['href']

	# --------------------------------------
	# I'll add other functions here later...
	# --------------------------------------

	buses.append(bus)

which makes a little dictionary containing the name and page link for each bus route, and then appends it into a list of bus routes.
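If you want to see the shape of that list without hitting the TfGM site, here is a small offline sketch. It swaps BeautifulSoup for Python's built-in html.parser so there is nothing to install, and it runs on a made-up snippet of html: the ids and hrefs in SAMPLE_HTML are hypothetical and the real page's markup may well differ.

```python
from html.parser import HTMLParser

# Hypothetical snippet shaped like the buttons described above;
# the ids and hrefs are made up for illustration.
SAMPLE_HTML = """
<a class="result-button" id="42" href="/public-transport/bus/routes/42">42</a>
<a class="other-link" id="ignore-me" href="/somewhere-else">skip</a>
<a class="result-button" id="X50" href="/public-transport/bus/routes/x50">X50</a>
"""

class RouteButtonParser(HTMLParser):
    """Collect the id and href of every <a> tag with class result-button."""

    def __init__(self):
        super().__init__()
        self.buses = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # the class attribute can hold several space-separated names:
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'result-button' in classes:
            self.buses.append({'route': attrs.get('id'),
                               'url': attrs.get('href')})

parser = RouteButtonParser()
parser.feed(SAMPLE_HTML)
buses_demo = parser.buses
print(buses_demo)
```

The result is the same kind of list of little dictionaries as in the loop above, one per route button, with the non-matching link skipped.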

If we want to get a list of bus-stops for each route we need to visit the page specified in the button ‘href’. All these pages contain an html list of bus stops, each with a name and a (longitude, latitude).

Each list item (‘li’) has the html class “bus-stop-link”, so we can use BeautifulSoup to find all of those and extract the “data-longitude” and “data-latitude” attributes as well as the name of the stop:

 

def get_stops(url):

	# set base url:
	base = "https://www.tfgm.com"
	query = url

	# get page html:
	r = requests.get(base+query)
	page = r.text
	soup=bs(page,'lxml')

	# find all bus-stops:
	stops = soup.findAll('li',attrs={"class":"bus-stop-link"})

	# extract info for each bus stop:
	names=[];longs=[];lats=[]
	for stop in stops:

		# get the format of the stop name right:
		temp = stop.text.strip('\n').replace("/ ","")
		if temp.find(',')>-1:
			# split on the first comma only, in case
			# the name contains more than one:
			loc,spot = temp.split(',',1)
			name = spot.strip()+', '+loc.strip()
		else:
			name = temp

		names.append(name)
		longs.append(stop['data-longitude'])
		lats.append(stop['data-latitude'])

		# ---------------------------------
		# insert postcode function here...
		# ---------------------------------

	return names,longs,lats
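The name juggling in the middle of that function is easier to follow in isolation. This is a minimal sketch of just that step, pulled out into a helper I've called format_stop_name (a hypothetical name, it isn't in the script above). The stop names are invented examples, not taken from the TfGM pages.

```python
def format_stop_name(raw):
    """Swap 'location, spot' round to 'spot, location'."""
    temp = raw.strip().replace("/ ", "")
    if ',' in temp:
        # split on the first comma only, so extra commas don't break things:
        loc, spot = temp.split(',', 1)
        return spot.strip() + ', ' + loc.strip()
    # no comma, so there is nothing to swap:
    return temp

first = format_stop_name("\nPiccadilly Gardens, Stop A\n")
second = format_stop_name("Shudehill Interchange")
print(first)
print(second)
```

A name with a comma gets its two halves swapped round, and a name without one passes straight through.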

With a list of longitudes and latitudes I can quickly plot a heat map showing the density of bus stops across Greater Manchester, just like I did in this previous post.

[Figure: heat map of bus stop density across Greater Manchester. The great Mancunian flying spaghetti monster]

On Wards

Once we have all the longitude and latitude data it’s possible to get the postcode of each bus-stop. I was going to use the postcode to identify the administrative ward but it turns out that we can get both of these simultaneously from Postcodes.io!

Postcodes.io allows a standard url GET request and has no rate/call limit. It’s brilliant.

Sometimes though it just returns a response of None. Looking at the co-ordinates that correspond to these responses, they seem to happen when you submit a (longitude, latitude) that sits on the border between two postcodes.

 

import json

def get_postcode(lon,lat):

	# set requests url (lon and lat are strings here,
	# as read from the html attributes):
	base = "http://api.postcodes.io/postcodes?lon="
	query = lon+"&lat="+lat

	# get response as a dictionary:
	r = requests.get(base+query)
	page = json.loads(r.text)

	# check error status of response:
	if page['status']==200:
		# 200 means all is well, but it doesn't mean
		# that you'll actually get a postcode...
		if page['result'] is None:
			postcode='None'
			ward='None'
		else:
			postcode = page['result'][0]['postcode']
			ward = page['result'][0]['admin_ward']
	else:
		postcode='None'
		ward='None'

	return postcode,ward
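To check the logic without calling the API, the url building and the response handling can be split out and run on hand-written response bodies. This is only a sketch: build_lookup_url and parse_lookup are hypothetical helper names, and the two bodies below are made up to match the shape described above (a status plus a result that is either a list or null), not copied from real Postcodes.io output.

```python
import json
from urllib.parse import urlencode

def build_lookup_url(lon, lat):
    """Build the same kind of GET url as get_postcode does."""
    return "http://api.postcodes.io/postcodes?" + urlencode({'lon': lon, 'lat': lat})

def parse_lookup(text):
    """Return (postcode, ward) from a response body, or ('None', 'None')."""
    page = json.loads(text)
    result = page.get('result')
    # a status of 200 with an empty result still means "no postcode":
    if page.get('status') != 200 or not result:
        return 'None', 'None'
    return result[0]['postcode'], result[0]['admin_ward']

# hand-written bodies with invented values:
hit = '{"status": 200, "result": [{"postcode": "ZZ9 9ZZ", "admin_ward": "Example Ward"}]}'
miss = '{"status": 200, "result": null}'

print(build_lookup_url("-2.2363", "53.445"))
print(parse_lookup(hit))
print(parse_lookup(miss))
```

Using urlencode rather than gluing strings together means the function copes with floats as well as strings, which is handy if the co-ordinates ever come from somewhere other than the html attributes.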

We can then write all of the info out to file if we want to:

 

import io

jsonfile = "manchester_bus_info.json"
# write the resulting list of bus route dictionaries to a JSON file with UTF8 encoding:
with io.open(jsonfile, 'w', encoding='utf-8') as f:
	output = json.dumps(buses, ensure_ascii=False)
	f.write(output)

Running the Numbers

Once we’ve got all of this info we can then build up a picture of the availability of buses in the different wards of Manchester.

Ultimately I want to define availability as the average number of buses per hour in a day, i.e. a bus stop with one bus every 20 minutes would have the same availability as a bus stop with three buses once an hour. For now though I’m just going to work out the absolute number of bus routes that pass through each ward.
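As a quick sanity check on that definition, here is a minimal sketch with two made-up timetables. The function name buses_per_hour and the idea of storing departures as minutes after midnight are my assumptions for illustration, since I haven't pulled any timetable data yet.

```python
def buses_per_hour(departure_minutes, hours=24.0):
    """Average number of departures per hour over the given period."""
    return len(departure_minutes) / hours

# one bus every 20 minutes, all day:
every_20_min = [h * 60 + m for h in range(24) for m in (0, 20, 40)]

# three buses turning up together on the hour, all day:
three_on_the_hour = [h * 60 for h in range(24) for _ in range(3)]

print(buses_per_hour(every_20_min))
print(buses_per_hour(three_on_the_hour))
```

Both stops see 72 departures in a day, so they come out with the same availability even though the second one would be far more annoying to wait at.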

First up, I need a list of all the administrative wards in Manchester. I’ve taken this from the Manchester City Council Intelligence Hub. You can download their data as an Excel file. I exported the sheet for population data as a CSV file called ward_data.csv.

 

import csv
import io
import json

# open the CSV file:
csvfile = open('ward_data.csv', 'r', newline='')

# start a CSV reader:
reader = csv.reader(csvfile)

# get the first row of the CSV file (this contains the headings):
keys = next(reader)

# loop over rows in file:
wards=[]
for row in reader:

	# make a dict for each ward:
	ward = {}
	for i in range(0,len(keys)):
		# put the population data in the dict:
		ward[str(keys[i])] = str(row[i]).strip('\n')

	# call a function that matches wards with bus routes:
	nbus,bus_names = get_bus_names(row[0])

	# update the ward dict:
	ward['buses'] = bus_names
	ward['no. buses'] = nbus

	# append this dict into a list:
	wards.append(ward)

jsonfile = "manchester_ward_info.json"
# write the resulting list of ward dictionaries to
# a JSON file with UTF8 encoding:
with io.open(jsonfile, 'w', encoding='utf-8') as f:
	f.write(json.dumps(wards, ensure_ascii=False))

The above piece of code calls a function: get_bus_names(wardname). That function cross-references our list of wards with the bus stop information we already wrote into a JSON file. It looks like this:

 

def get_bus_names(ward):

	# set input data filename:
	inname = "manchester_bus_info.json"

	# read input data from file:
	with io.open(inname, "r", encoding="utf-8-sig") as input_file:
		buses = json.load(input_file)

	# get the bus routes that pass through the ward:
	bus_names = []
	for bus in buses:
		if ward in bus["wards"]:
			bus_names.append(bus['route'])

	# number of buses:
	nbus = len(bus_names)

	return nbus, bus_names

You can see that the function returns two things: (1) a list of the names of bus routes that run through the ward (bus_names), and (2) the number of bus routes that run through the ward (nbus).
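The function relies on every route dictionary having a "wards" entry, which is what the placeholder comments in the earlier snippets were left for. One way that entry might be filled in is sketched below. It is an assumption about the missing step rather than the exact code: collect_wards is a hypothetical helper, and fake_lookup stands in for get_postcode with made-up values so the sketch runs offline.

```python
def collect_wards(longs, lats, lookup):
    """Return the sorted, de-duplicated wards for a list of stop co-ordinates."""
    wards = set()
    for lon, lat in zip(longs, lats):
        postcode, ward = lookup(lon, lat)
        # skip the stops where no ward came back:
        if ward != 'None':
            wards.add(ward)
    return sorted(wards)

def fake_lookup(lon, lat):
    """Offline stand-in for get_postcode (invented postcodes and wards)."""
    table = {('-2.24', '53.48'): ('ZZ1 1ZZ', 'Ward A'),
             ('-2.25', '53.46'): ('ZZ2 2ZZ', 'Ward B')}
    return table.get((lon, lat), ('None', 'None'))

bus = {'route': 'demo', 'url': '/demo'}
bus['wards'] = collect_wards(['-2.24', '-2.25', '-2.24', '0.0'],
                             ['53.48', '53.46', '53.48', '0.0'],
                             fake_lookup)
print(bus)
```

In the real script the co-ordinates would come from get_stops(bus['url']) and the lookup would be get_postcode itself. Using a set means a route with ten stops in the same ward only lists that ward once.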

Note: If a bus runs in two directions through a ward it will be counted twice. It’s not possible to just divide the total number of routes by two because some bus routes are circular and only appear once. After thinking about it for a while, I’ve decided I’m happy to count the different directions of routes individually. You might disagree though.

Making a Scene

To plot this up, I adapted one of the matplotlib Basemap examples from here. Basemap makes a pretty good plot, but holy crap is it SLOW.

I’ve used the Stereographic projection, which means I have to specify a central (lon,lat), which are lon_0 and lat_0, and I also have to specify the latitude for the true scale, lat_ts.

from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt

man_lon = -2.2363
man_lat = 53.445

# create figure and axes instances
fig = plt.figure(figsize=(12,12))
ax = fig.add_axes([0.1,0.1,0.8,0.8])

m = Basemap(width=15000,height=25000,
            resolution='f',projection='stere',\
            lat_ts=man_lat,lat_0=man_lat,lon_0=man_lon)

# this is a function to extract the number of
# bus routes per ward from the manchester_ward_info.json
# file we made earlier:
wardnames,nbuses = get_ward_nbus()

# we're going to blank out the city centre:
nbuses[np.where(wardnames=='City Centre')] = 10.0

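# I want a logarithmic colour scale, normalised so that
# the ward with the most routes has an alpha of 1:
max_bus = np.max(nbuses)
buses_n = np.log10(10.*nbuses/max_bus)
# alpha has to lie between 0 and 1, so clip any ward with
# fewer than a tenth of the maximum number of routes:
buses_n = np.clip(buses_n,0.,1.)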

# loop over all wards:
for i in range(len(wardnames)):

	# this is a function to get the (lon,lat)
	# co-ordinates of each ward boundary:
	lons,lats = read_ward_coords(wardnames[i])

	if (len(lons)>0):
		# compute map proj coordinates from
		# (lon,lat) data:
		x, y = m(lons, lats) 

		# fill in colour:
		if wardnames[i]=='City Centre':
			plt.fill(x,y,'black', alpha=1.0)
		else:
			plt.fill(x,y,'b', alpha=buses_n[i])

plt.savefig('manchester_map.png')
plt.show()
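To see what the logarithmic scaling does to the numbers, here is a small standalone sketch using plain Python instead of numpy. The route counts are invented, log_alpha is a hypothetical helper name, and the clipping to the range 0 to 1 is my own safety net, on the assumption that matplotlib wants alpha values inside that range.

```python
import math

def log_alpha(counts):
    """Map route counts onto alpha values with a log10 scale, clipped to [0, 1]."""
    max_count = max(counts)
    alphas = []
    for c in counts:
        if c <= 0:
            # log10 is undefined here, so treat it as fully transparent:
            alphas.append(0.0)
            continue
        a = math.log10(10.0 * c / max_count)
        alphas.append(min(1.0, max(0.0, a)))
    return alphas

demo_alphas = log_alpha([100, 50, 10, 5, 0])
print(demo_alphas)
```

The busiest ward gets an alpha of 1, a ward with half as many routes still gets about 0.7, and anything with a tenth of the maximum or fewer ends up fully transparent.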

There are two functions in this plotting routine: get_ward_nbus() and read_ward_coords(wardname).

The first one is pretty straightforward and if you’ve got this far, you don’t need me to write it out. The second one needs a bit of input.
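For completeness, a minimal sketch of what the first one could look like is below. It is a guess at the shape rather than the original function: the 'no. buses' key comes from the ward dictionaries written out earlier, but name_key="Ward" is an assumption, because the real key is whatever the first column heading in ward_data.csv happens to be. The demo file at the bottom is made up so that the sketch runs on its own.

```python
import io
import json
import os
import tempfile

def get_ward_nbus(jsonfile="manchester_ward_info.json", name_key="Ward"):
    """Return (ward names, number of bus routes) as two parallel lists."""
    with io.open(jsonfile, encoding="utf-8") as f:
        wards = json.load(f)
    wardnames = [ward[name_key] for ward in wards]
    nbuses = [float(ward['no. buses']) for ward in wards]
    return wardnames, nbuses

# quick offline check with a made-up two-ward file:
demo_path = os.path.join(tempfile.mkdtemp(), "demo_ward_info.json")
with io.open(demo_path, "w", encoding="utf-8") as f:
    f.write(json.dumps([
        {"Ward": "Ward A", "buses": ["1", "2"], "no. buses": 2},
        {"Ward": "Ward B", "buses": ["3"], "no. buses": 1},
    ]))

demo_names, demo_counts = get_ward_nbus(demo_path)
print(demo_names, demo_counts)
```

This version returns plain lists, so for the plotting routine above both would need wrapping in np.array(...) first, since the np.where indexing only works on arrays.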

Manchester City Council publish the longitude and latitude data for the boundaries of their administrative wards here.

The data come as KML files, which you can open with any text editor. I downloaded all of these into a directory called “WARD_BOUNDARIES” and just queried them iteratively. One useful tip is that I had to adapt the format of each ward name to match the format of the KML file names:

filename = "./WARD_BOUNDARIES/"+ward.lower().replace(" & ","").replace(" ","")+".kml"
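Here is that one-liner as a tiny function, with a couple of example ward names to show what it produces. The helper name ward_to_filename is hypothetical, and whether these exact file names exist depends on what you downloaded.

```python
def ward_to_filename(ward, directory="./WARD_BOUNDARIES/"):
    """Lower-case the ward name and strip out ampersands and spaces."""
    return directory + ward.lower().replace(" & ", "").replace(" ", "") + ".kml"

plain = ward_to_filename("Whalley Range")
with_ampersand = ward_to_filename("Miles Platting & Newton Heath")
print(plain)
print(with_ampersand)
```

The order of the two replace calls matters: the " & " has to go first, otherwise the spaces around it disappear and the ampersand is left behind in the file name.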

The full function looks like this:

import os

def read_ward_coords(ward):

	filename = "./WARD_BOUNDARIES/"+ward.lower().replace(" & ","").replace(" ","")+".kml"

	lon=[];lat=[]
	if os.path.exists(filename):

		# the with statement closes the file for us:
		with open(filename,'r') as infile:
			for line in infile:
				# any line with a comma in it is assumed
				# to be a co-ordinate line (lon,lat,...):
				items = line.split(',')
				if len(items)>1:
					lon.append(float(items[0]))
					lat.append(float(items[1]))
	else:
		print("No boundary data for "+ward)

	return np.array(lon),np.array(lat)
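Splitting lines on commas works when each co-ordinate sits on its own line, but it would trip up on a KML file that packs several co-ordinates onto one line. If you run into that, a slightly more defensive alternative is to read the coordinates elements with Python's built-in XML parser. This is a sketch of a different technique, not the function above, and SAMPLE_KML is a made-up boundary rather than a real ward.

```python
import xml.etree.ElementTree as ET

# Made-up boundary with three points, two of them on the same line:
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <Polygon><outerBoundaryIs><LinearRing>
      <coordinates>
        -2.25,53.45,0 -2.24,53.46,0
        -2.23,53.44,0
      </coordinates>
    </LinearRing></outerBoundaryIs></Polygon>
  </Placemark>
</kml>"""

def parse_kml_coords(kml_text):
    """Return (lon, lat) lists from every <coordinates> element."""
    lon, lat = [], []
    root = ET.fromstring(kml_text)
    for elem in root.iter():
        # match on the local tag name so the namespace doesn't matter:
        if elem.tag.split('}')[-1] == 'coordinates' and elem.text:
            # tuples are separated by whitespace, values by commas:
            for triple in elem.text.split():
                parts = triple.split(',')
                lon.append(float(parts[0]))
                lat.append(float(parts[1]))
    return lon, lat

kml_lon, kml_lat = parse_kml_coords(SAMPLE_KML)
print(kml_lon, kml_lat)
```

For a file on disk you would read its contents and pass the text in, then wrap the two lists in np.array(...) to match what read_ward_coords returns.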

And that’s it for now. One interesting thing about the current map is that there seems to be a much smaller number of bus routes passing through Whalley Range than other areas. It’s not clear what that means at the moment, without including other data, but the next steps in my project should tell me whether it’s significant.
