Hinton MNIST Dropout in PyTorch

In their classic 2012 paper, "Improving neural networks by preventing co-adaptation of feature detectors", Hinton, Srivastava, Krizhevsky, Sutskever and Salakhutdinov showed that using dropout in a feed-forward neural network improved performance significantly. In particular, they produced this result for the MNIST dataset. However, if you try to recreate this result in PyTorch using modern standards …

Installing Textract

A while ago I wrote about how to extract text from PDF documents in Python using the PDFMiner library. However, in a recent project I had some trouble using PDFMiner to extract text, possibly because the documents I was working with were scanned PDFs. In this case the answer is to use OCR-based text extraction, …

Classifying Dead Stars

Pulsar classification is a great example of where machine learning can be used beneficially in astrophysics. It's not the most straightforward classification problem, but here I'm going to outline the basics using the scikit-learn random forest classifier. This post was inspired by Rob Lyon's pulsar classification tutorials in the IAU OAD Data Science Toolkit. This post …

Mind the Gender Pay Gap

I have been wondering how the gender pay gap in the university sector compares with the average across other sectors, so I thought I'd look at the data. Last update: 23 March 2018. The data I'm using come from the current UK returns on gender pay differences. Every company in the UK with more than 250 …

On the Buses

I've started a little project to look at availability of public transport across Manchester. To keep things easy to follow I'm going to split up the different elements of the project between blog posts. First off, I want to know all the bus routes in Greater Manchester and I want to know which ones go …