ALL YOUR BASE ARE BELONG TO US

Physics. Data Science. General Geekery. Probably Coffee.

Tag: webcrawling

Posted on January 22, 2018

On The Buses II: Fuzzy String Matching

This is the second part of a series of posts about my pet data science project exploring the availability of transport across different areas of Manchester. For those playing catch-up, you might want to take a look at the first post in this series before continuing. In the first post I looked at how to …


Posted on January 16, 2018

Mining Twitter with Selenium

This is great for freaking people out. It looks like a ghost is typing in your web browser. Web crawling using html parsers to grab links and navigate to new pages with the requests library is all very well, but when you want to physically submit search terms, or login details, or click buttons (etc.) …

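For anyone who hasn't seen the parser-based approach mentioned in that excerpt (grabbing links out of a page's html), here is a minimal sketch of the idea using only the Python standard library. It is not the code from the post: the `LinkCollector` class, the `extract_links` helper and the sample html string are all made up for illustration, and in a real crawl the html would come from something like a requests call rather than a hard-coded string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the target of every <a href="..."> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we want to follow.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in `html` as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page content standing in for a downloaded page.
sample_html = (
    '<p><a href="/page2">Next</a> '
    '<a href="https://example.com/about">About</a></p>'
)
links = extract_links(sample_html, "https://example.com/")
print(links)
```

This only covers reading links out of static html. Typing into search boxes, logging in or clicking buttons needs a real browser session, which is where Selenium comes in.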

Posted on November 8, 2017

WebCrawling: YouTube Pagination in Python

A while ago I wrote a blog post about how to scrape videos from YouTube. One question I've been asked since is how to navigate between different pages of search results. So here's how.

YouTube

The pre-amble looks exactly the same:

Pagination

Then we need to find the piece of html that corresponds to …

