Hinton MNIST Dropout in PyTorch

In their classic 2012 paper, "Improving neural networks by preventing co-adaptation of feature detectors", Hinton, Srivastava, Krizhevsky, Sutskever and Salakhutdinov showed that using dropout in a feed-forward neural network improved performance significantly. In particular, they demonstrated this on the MNIST dataset. However, if you try to recreate this result in PyTorch using modern standards …

Installing Textract

A while ago I wrote about how to extract text from PDF documents in Python using the PDFMiner library. However, in a recent project I had some trouble using PDFMiner to extract text, possibly because the documents I was working with were scanned PDFs. In this case the answer is to use OCR-based text extraction, …

Classifying Dead Stars

Pulsar classification is a great example of where machine learning can be used beneficially in astrophysics. It's not the most straightforward classification problem, but here I'm going to outline the basics using the scikit-learn random forest classifier. This post was inspired by Rob Lyon's pulsar classification tutorials in the IAU OAD Data Science Toolkit. This post …

Mind the Gender PayGap

I have been wondering how the gender pay gap in the university sector compares with the average across other sectors, so I thought I'd look at the data. Last update: 23 March 2018. The data I'm using come from the current UK returns on gender pay differences. Every company in the UK with more than 250 …