Installing Textract

A while ago I wrote about how to extract text from PDF documents in Python using the PDFMiner library. However, in a recent project I had some trouble using PDFMiner to extract text, possibly because the documents I was working with were scanned PDFs. In this case the answer is to use OCR-based text extraction, …

Classifying Dead Stars

Pulsar classification is a great example of where machine learning can be used beneficially in astrophysics. It's not the most straightforward classification problem, but here I'm going to outline the basics using the scikit-learn random forest classifier. This post was inspired by Rob Lyon's pulsar classification tutorials in the IAU OAD Data Science Toolkit. This post …

Mind the Gender PayGap

I have been wondering how the gender pay gap in the university sector compares with the average across other sectors, so I thought I'd look at the data. Last update: 23 March 2018. The data I'm using come from the current UK returns on gender pay differences. Every company in the UK with more than 250 …

Friends of Fiends

It's not a typo. Well, it is, but this time it's deliberate. I mistype friends-of-friends so often that I've decided to just give in and call my algorithm "friends-of-fiends" (FOF) instead. Problem statement: Student X is analysing data towards a galaxy cluster that doesn't have a known redshift, i.e. a known distance away from us. …