Twitter in Python

There’s a lot of social media analysis going on these days. One of the prime sources of data is Twitter – and it’s available to everyone. Well, mostly.

Twitter data flows continuously in a stream. Around the world, people Tweet about 6,000 times a second. That’s about 500 million Tweets per day.
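As a quick sanity check on those two figures, here is the per-second rate scaled up to a full day. The 6,000 figure is the one quoted above; everything else is plain arithmetic:

```python
# Rough check: scale a per-second Tweet rate up to a full day.
TWEETS_PER_SECOND = 6_000          # figure quoted in the text
SECONDS_PER_DAY = 60 * 60 * 24     # 86,400 seconds in a day

tweets_per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY
print(f"{tweets_per_day:,} Tweets per day")  # 518,400,000 Tweets per day
```

That comes out at roughly 518 million, which is where the "about 500 million" comes from.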

If you want to know what’s happening on Twitter, all you have to do is listen.

That doesn’t mean writing down every Tweet that pops up in your browser, it means… (duh duh duuuuh) … a Python script. So, how do you listen to Twitter using Python?

Getting Your Credentials

The first thing you will need is a Twitter account. It’s pretty simple to set one up and you don’t need to provide your real personal details. Once you’ve got an account you then need to register an application here.

As with the Uber API, our credentials will consist of four identifiers:

  1. Consumer Key (API Key)
  2. Consumer Secret (API Secret)
  3. Access Token
  4. Access Token Secret

The first two should be obvious on the Twitter developers’ page, which looks something like this:

figure_1

Copy the API Key and API Secret into a text file (I call mine “keys.txt”), then scroll to the bottom of the page and hit the button to create an access token:

figure_2

On the next page copy the Access Token and Access Token Secret into your text file.
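The get_api() function further down reads this file one line at a time, so the order matters: API Key, API Secret, Access Token, Access Token Secret, one per line. Below is a minimal sketch of a stricter reader, assuming that four-line layout. The `Credentials` and `load_keys` names are hypothetical, not part of Tweepy; the point is to fail early when a line is missing instead of quietly passing an empty string to Twitter:

```python
from pathlib import Path
from typing import NamedTuple


class Credentials(NamedTuple):
    """The four identifiers, in the order they appear in keys.txt."""
    api_key: str
    api_secret: str
    access_token: str
    access_token_secret: str


def load_keys(path="keys.txt"):
    """Read one credential per line, ignoring blank lines and surrounding whitespace."""
    lines = [line.strip() for line in Path(path).read_text().splitlines()]
    values = [line for line in lines if line]
    if len(values) != 4:
        raise ValueError(f"expected 4 credentials in {path}, found {len(values)}")
    return Credentials(*values)
```

With this in place, the fields can be handed to Tweepy by name, e.g. creds.api_key and creds.api_secret, rather than relying on remembering which readline() call returned what.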

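We’re now all set up to listen to Twitter. We could access the Twitter API directly using the Twitter Python library, but there are also a bunch of other libraries out there that have simplified API access for us already. I’m going to use the Tweepy library (you can install it using pip). The code in this post uses the Tweepy 3.x streaming interface (StreamListener was removed in Tweepy 4.0), so install a 3.x release, e.g. pip install "tweepy<4".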


import tweepy

Using Tweepy for Authentication

I like to create a little function to read my info from the text file and then launch the Tweepy Twitter Authenticator (that’s probably not its real name but it sounds cool). 


def get_api():
    """
    Read the four credentials from keys.txt and return a tweepy OAuthHandler
    """
    with open('keys.txt') as f:
        api_key = f.readline().strip()
        api_secret = f.readline().strip()
        access_token = f.readline().strip()
        access_token_secret = f.readline().strip()

    # Build the OAuth handler from the consumer and access credentials
    auth = tweepy.OAuthHandler(api_key, api_secret)
    auth.set_access_token(access_token, access_token_secret)

    return auth

Listening to Twitter

Tweepy has a built-in class called StreamListener, which is designed to listen to the Twitter stream. I want to add some bells and whistles and make my own listener, so I’ll create a subclass called MyListener:


class MyListener(tweepy.StreamListener):

    def on_status(self, status):
        """
        Print out the text of the tweets
        """
        print(status.text)

This is pretty much the simplest Twitter listener you could create. It basically prints out each incoming Tweet to the command line.
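Tweepy does the networking; all the listener has to do is react when on_status is called. To see that callback pattern without connecting to Twitter, here is a stand-alone sketch. `FakeStatus`, `BaseListener` and `replay` are hypothetical stand-ins written for illustration, not Tweepy classes; they only mimic the idea that the stream calls `on_status` once per incoming Tweet:

```python
from dataclasses import dataclass


@dataclass
class FakeStatus:
    """Hypothetical stand-in for a Tweet; Tweepy's status objects also have a .text attribute."""
    text: str


class BaseListener:
    """Hypothetical stand-in for tweepy.StreamListener: by default, do nothing with a Tweet."""
    def on_status(self, status):
        pass


class MyListener(BaseListener):
    """Same idea as the listener above: print each Tweet, but also keep a copy."""
    def __init__(self):
        self.seen = []

    def on_status(self, status):
        print(status.text)
        self.seen.append(status.text)


def replay(listener, statuses):
    """Feed statuses to the listener one at a time, the way a stream would."""
    for status in statuses:
        listener.on_status(status)


listener = MyListener()
replay(listener, [FakeStatus("I love python"), FakeStatus("Learning tweepy")])
```

Because the subclass only overrides on_status, everything else (connecting, reading, reconnecting) stays the parent's job, which is exactly why subclassing StreamListener keeps the real listener so short.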

That’s all good, but you probably want to add in a safety guard in case something goes wrong. You can do this by giving the MyListener class an on_error method, which Tweepy calls whenever Twitter returns an error instead of data:


import sys

import tweepy


class MyListener(tweepy.StreamListener):

    def on_status(self, status):
        """
        Print out the text of the tweets
        """
        print(status.text)

    def on_error(self, status_code):
        """
        Catch errors: returning False disconnects the stream,
        returning True lets Tweepy keep trying
        """
        print('Encountered error with status code:', status_code, file=sys.stderr)

        if status_code == 420:
            # 420 means we are being rate limited: stop before Twitter cuts us off
            print("Too ... much ... data ...")
            return False

        return True

Each error comes with an HTTP status code that tells you what went wrong. The only one we really need to watch out for is status code 420. This error basically means that Twitter thinks you’re making too many requests in too short a period of time (for the streaming API, usually too many connection attempts). That might not sound like a serious error, but it’s the one that will get your application blacklisted by Twitter if you keep repeating the offence.

By returning False, our on_error method tells Tweepy to disconnect from the stream, which stops the application before Twitter cuts us off. Returning True for any other error tells Tweepy to keep listening.
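The decision inside on_error depends only on the status code, so it can be pulled out and checked on its own. The sketch below does that and also shows a capped exponential backoff (wait twice as long after each failed attempt, up to a ceiling), a common way to retry without hammering a server. The helper names and the delay values are assumptions for illustration; Tweepy’s Stream already waits between reconnection attempts on its own when on_error returns True:

```python
RATE_LIMITED = 420  # status code Twitter uses when you make too many requests


def keep_listening(status_code):
    """Mirror of MyListener.on_error: disconnect on 420, otherwise keep going."""
    return status_code != RATE_LIMITED


def backoff_delays(start=5.0, cap=320.0, attempts=8):
    """Hypothetical retry schedule: double the wait after each failure, up to a cap."""
    delays = []
    delay = start
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


print(keep_listening(420))   # False -> stop the stream
print(keep_listening(503))   # True  -> let the stream retry
print(backoff_delays())      # [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 320.0]
```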

Putting It All Together

So, now we’ve got the pieces, how do we use them? Pretty simple:


# Only start streaming when the script is run directly
if __name__ == '__main__':

    # Get our credentials set up:
    myauth = get_api()

    # Create the listener and attach it to a Twitter stream:
    my_listener = MyListener()
    my_stream = tweepy.Stream(auth=myauth, listener=my_listener)

    # Start filtering Twitter:
    my_stream.filter(track=["python"])

There are 3 steps here:

1. Authenticate yourself;
2. Initiate the Twitter stream by specifying who you are and which listener you’re using;
3. Start listening.

Et voilà. We’ve connected to the Twitter streaming API.

However, we can’t just receive ALL tweets because that would be too much for Twitter to handle (and probably us too), so we have to specify a filter.

The filter I’ve specified above is simply the word “python”, i.e. return only Tweets that contain the word “python”, but there are a few different ways to filter Tweets (described here):

figure_3
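The track filter follows simple rules, according to Twitter’s documentation for the streaming filter endpoint: each entry is a phrase, a phrase matches when all of its space-separated words appear in the Tweet (in any order, ignoring case), and a Tweet is delivered if any phrase matches. So spaces act like AND and separate list entries act like OR; Tweepy sends the list to Twitter as a comma-separated string. The function below is a rough local approximation of those rules, handy for re-checking Tweets you have already saved. Its naive word splitting is an assumption and will not reproduce Twitter’s exact handling of punctuation, hashtags or URLs:

```python
import re


def matches_track(text, track):
    """Approximate Twitter's track rules: any phrase may match (OR);
    within a phrase, every word must appear (AND); case is ignored."""
    words = set(re.findall(r"\w+", text.lower()))
    for phrase in track:
        terms = phrase.lower().split()
        if terms and all(term in words for term in terms):
            return True
    return False


print(matches_track("Python 3.12 is out!", ["python"]))                  # True
print(matches_track("I love data science", ["python", "data science"]))  # True
print(matches_track("Monty Python's Flying Circus", ["python snake"]))   # False
```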

Another really useful way to filter Tweets is using a geographical filter. These are specified as bounding boxes in longitude and latitude (longitude first), giving the south-west corner of the box and then the north-east corner:


# Filter by location, using a bounding box around Cape Town:
CPT_GEOBOX = [18.299103, -34.359806, 18.550415, -33.865854]
my_stream.filter(locations=CPT_GEOBOX)
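Because it is easy to mix up the order of the four numbers, it can help to sanity-check a box before streaming. This is a small sketch assuming the [south-west longitude, south-west latitude, north-east longitude, north-east latitude] layout used by CPT_GEOBOX; in_box is a hypothetical helper, not part of Tweepy, and the test coordinates are approximate:

```python
def in_box(lon, lat, box):
    """Return True if (lon, lat) lies inside box = [sw_lon, sw_lat, ne_lon, ne_lat]."""
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


CPT_GEOBOX = [18.299103, -34.359806, 18.550415, -33.865854]

# Approximate coordinates, given as (longitude, latitude).
print(in_box(18.4241, -33.9249, CPT_GEOBOX))   # Cape Town city centre -> True
print(in_box(28.0473, -26.2041, CPT_GEOBOX))   # Johannesburg -> False
print(in_box(-33.9249, 18.4241, CPT_GEOBOX))   # Cape Town with lat/lon swapped -> False
```

The last line is the classic mistake: writing latitude first silently describes a different place, so a box that never returns any Tweets is worth checking for swapped coordinates.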

This may seem a bit difficult if you don’t know the longitude and latitude of different points on the map off the top of your head, but fortunately the internet can help:

figure_4

http://boundingbox.klokantech.com/

And… that’s it. Once your listener is receiving tweets you can do with them what you please.
