Physics. Data Science. General Geekery.
FinTechAI: Predicting Adjusted Close Values using LSTMs
This is a little intro to using an LSTM model for time series data. It's not in any way a thorough introduction to how LSTMs work, which is pretty complex and far too much info for a short blog like this one... I'm going to demonstrate it using some financial (stock market) data, where I'll …
Hinton MNIST Dropout in PyTorch
In their classic 2012 paper, "Improving neural networks by preventing co-adaptation of feature detectors", Hinton, Srivastava, Krizhevsky, Sutskever and Salakhutdinov showed that using dropout in a feed-forward neural network improved performance significantly. In particular they produced this result for the MNIST dataset: However, if you try to reproduce this result in PyTorch using modern standards …
Installing Textract
A while ago I wrote about how to extract text from PDF documents in Python using the PDFMiner library. However, in a recent project I had some trouble using PDFMiner to extract text, possibly because the documents I was working with were scanned PDFs. In this case the answer is to use OCR-based text extraction, …
HTRU1 – Creating the PyTorch Dataset
In my previous post I talked about how to use random forest classification to separate true pulsar candidates from RFI. That classification used numerical features extracted from the processed data. Ultimately it would be interesting to just be able to use the data itself, rather than extracted features. To help me do that I've created …
Classifying Dead Stars
Pulsar classification is a great example of where machine learning can be used beneficially in astrophysics. It's not the most straightforward classification problem, but here I'm going to outline the basics using the scikit-learn random forest classifier. This post was inspired by Rob Lyon's pulsar classification tutorials in the IAU OAD Data Science Toolkit. This post …
UK Regions MapBox Choropleth
Recently I've been playing around with Mapbox. I previously used it to make the visualisation in my post on the UK gender pay gap and while I was doing that I started wondering about using choropleths. The layers you need to make choropleths of the USA are available within Mapbox, but I couldn't find any …
Mind the Gender PayGap
I have been wondering how the gender pay gap in the university sector compares with the average across other sectors, so I thought I'd look at the data. Last update: 23 March 2018. The data I'm using come from the current UK returns on gender pay differences. Every company in the UK with more than 250 …
Scraping Data into Google Spreadsheets
The UK parliament has a bunch of different committees that are tasked with conducting inquiries and producing reports about a range of different topics. The individual inquiries normally have a fixed duration and they hear evidence from a range of individuals known as witnesses. The UK parliament website maintains a list of open inquiries that …
On The Buses II: Fuzzy String Matching
This is the second part of a series of posts about my pet data science project exploring the availability of transport across different areas of Manchester. For those playing catch-up, you might want to take a look at the first post in this series before continuing. In the first post I looked at how to …
On the Buses
I've started a little project to look at availability of public transport across Manchester. To keep things easy to follow I'm going to split up the different elements of the project between blog posts. First off, I want to know all the bus routes in Greater Manchester and I want to know which ones go …