The UK Parliament has a bunch of different committees that are tasked with conducting inquiries and producing reports on a range of topics. Individual inquiries normally run for a fixed duration and hear evidence from a range of individuals known as witnesses. The UK Parliament website maintains a list of open inquiries that …
On The Buses II: Fuzzy String Matching
This is the second part of a series of posts about my pet data science project exploring the availability of transport across different areas of Manchester. For those playing catch-up, you might want to take a look at the first post in this series before continuing. In the first post I looked at how to …
On the Buses
I've started a little project to look at availability of public transport across Manchester. To keep things easy to follow I'm going to split up the different elements of the project between blog posts. First off, I want to know all the bus routes in Greater Manchester and I want to know which ones go …
Mining Twitter with Selenium
This is great for freaking people out. It looks like a ghost is typing in your web browser. Web crawling using HTML parsers to grab links and navigate to new pages with the requests library is all very well, but when you want to physically submit search terms, or login details, or click buttons (etc.) …
WebCrawling: YouTube Pagination in Python
A while ago I wrote a blog post about how to scrape videos from YouTube. One question I've been asked since is how to navigate between different pages of search results. So here's how. The preamble looks exactly the same as in that post. Then we need to find the piece of HTML that corresponds to …
Web Scraping YouTube Videos in Python
Web crawling and web scraping are two sides of the same coin. Web scraping is simply extracting information from the internet in an automated fashion. Web crawling is about indexing information on webpages and, normally, using it to access other webpages where the thing you actually want to scrape is located. YouTube is …