Installing Textract
A while ago I wrote about how to extract text from PDF documents in Python using the PDFMiner library. However, in a recent project I had some trouble using PDFMiner to extract text, possibly because the documents I was working with were scanned PDFs. In this case the answer is to use OCR-based text extraction, …
HTRU1 – Creating the PyTorch Dataset
In my previous post I talked about how to use random forest classification to separate true pulsar candidates from RFI. That classification used numerical features extracted from the processed data. Ultimately it would be interesting to just be able to use the data itself, rather than extracted features. To help me do that I've created …
Classifying Dead Stars
Pulsar classification is a great example of where machine learning can be used beneficially in astrophysics. It's not the most straightforward classification problem, but here I'm going to outline the basics using the scikit-learn random forest classifier. This post was inspired by Rob Lyon's pulsar classification tutorials in the IAU OAD Data Science Toolkit. This post …
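For a flavour of what that outline involves, here is a minimal sketch of the scikit-learn random forest workflow. It assumes scikit-learn is installed and uses synthetic data from `make_classification` as a hypothetical stand-in for the real candidate features. The choices of eight features, a 90/10 class imbalance and 100 trees are illustrative assumptions, not the settings used in the post.

```python
# Minimal random forest sketch; synthetic data stands in for real pulsar features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: 8 numerical features per candidate, with a 90/10
# imbalance between the non-pulsar and pulsar classes (assumed, not real data).
X, y = make_classification(n_samples=2000, n_features=8, n_informative=6,
                           weights=[0.9, 0.1], random_state=42)

# Stratify so the rare positive class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# An ensemble of decision trees, each trained on a bootstrap sample.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

With imbalanced classes, accuracy alone flatters the classifier (always predicting "not a pulsar" already scores about 90% here), so per-class metrics such as recall on the pulsar class are worth checking too.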
UK Regions MapBox Choropleth
Recently I've been playing around with Mapbox. I previously used it to make the visualisation in my post on the UK gender pay gap and while I was doing that I started wondering about using choropleths. The layers you need to make choropleths of the USA are available within Mapbox, but I couldn't find any …
Mind the Gender Pay Gap
I have been wondering how the gender pay gap in the university sector compares with the average across other sectors, so I thought I'd look at the data. Last update: 23 March 2018. The data I'm using come from the current UK returns on gender pay differences. Every company in the UK with more than 250 …
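The UK returns express the gap as the difference between men's and women's hourly pay, given as a percentage of men's pay, for both the mean and the median. Here is a minimal sketch of that calculation. The function name `pay_gap_percent` and the hourly rates are hypothetical, chosen only to illustrate the arithmetic.

```python
from statistics import mean, median


def pay_gap_percent(mens_hourly, womens_hourly, stat=median):
    """Gender pay gap as a percentage of men's pay.

    Computes (stat(men) - stat(women)) / stat(men) * 100, where stat is
    either statistics.median or statistics.mean.
    """
    men = stat(mens_hourly)
    women = stat(womens_hourly)
    return (men - women) / men * 100


# Hypothetical hourly rates (GBP) for a tiny illustrative employer.
men = [15.0, 18.0, 22.0, 30.0]
women = [14.0, 16.0, 19.0, 25.0]

median_gap = pay_gap_percent(men, women, stat=median)
mean_gap = pay_gap_percent(men, women, stat=mean)
print(f"Median gap: {median_gap:.1f}%")  # Median gap: 12.5%
print(f"Mean gap: {mean_gap:.1f}%")      # Mean gap: 12.9%
```

A positive value means men are paid more on average. The mean and median can differ noticeably when a few high earners skew the distribution, which is why both are reported.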
Night at the Museum: Translation in Python
I love Taipei. I also love Open Data. So I was very happy to read that the National Palace Museum in Taipei had an open data project. According to the article, the museum has put images and meta-data for 70,000 items online. So what do you get if you download the information on a particular …
Document Scraping with Python
Tired of reading all those documents everyone keeps sending you? Why not get your Jupyter Notebook to do it for you and condense the information? I'm joking, of course... but if, say, you did want to read PDF documents directly in Python, how would you do it? Recently I had a go at doing just …
In the PYNQ: Set-up
Recently I got hold of a PYNQ-Z1 board and accessories kit from Digilent via the Xilinx University Program. This nifty piece of kit supposedly lets you program an FPGA (the Xilinx ZYNQ) using a Jupyter notebook. What's in the Box? Here's a picture of what's inside. Getting Started: I'm following the PYNQ Guide to Getting …
Google Books from Jupyter
If you're like me, you've probably been using Google Books without really thinking about it. I never really considered that there might be a philosophy or purpose to it. I just assumed that people put books online because... well, they can. It turns out that there's quite a lot more to it. However, if you've …