There’s a lot of social media analysis going on these days. One of the prime sources of data is Twitter – and it’s available to everyone. Well, mostly.
Twitter data flows continuously in a stream. Around the world people Tweet about 6,000 times a second. That’s about 500 million Tweets per day.
If you want to know what’s happening on Twitter, all you have to do is listen.
That doesn’t mean writing down every Tweet that pops up in your browser; it means… (duh duh duuuuh) … a Python script. So, how do you listen to Twitter using Python?
Getting Your Credentials
The first thing you will need is a Twitter account. It’s pretty simple to set one up and you don’t need to provide your real personal details. Once you’ve got an account you then need to register an application here.
Just like Uber, our credentials will consist of four identifiers:
- Consumer Key (API Key)
- Consumer Secret (API Secret)
- Access Token
- Access Token Secret
The first two should be obvious on the Twitter developers’ page.
Copy the API Key and API Secret into a text file, one per line (I call mine “keys.txt”), then scroll to the bottom of the page and hit the button to create an access token.
On the next page copy the Access Token and Access Token Secret into your text file, again one per line and in that order.
We’re now all set up to listen to Twitter. We could access the Twitter API directly using the Twitter Python library, but there are also a bunch of other libraries out there that have simplified API access for us already. I’m going to use the Tweepy library (you can install it using pip).
import tweepy
Using Tweepy for Authentication
I like to create a little function to read my info from the text file and then launch the Tweepy Twitter Authenticator (that’s probably not its real name but it sounds cool).
def get_api():
    """ Creates an instance of the tweepy OAuth class """
    with open('keys.txt') as f:
        api_key = f.readline().strip()
        api_secret = f.readline().strip()
        access_token = f.readline().strip()
        access_token_secret = f.readline().strip()
    auth = tweepy.OAuthHandler(api_key, api_secret)
    auth.set_access_token(access_token, access_token_secret)
    return auth
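If the text file is missing a line, the function above will quietly hand Tweepy an empty string. Here is a minimal sketch of a stricter reader, assuming the same four-line layout. The names read_keys and KEY_NAMES are my own hypothetical helpers, not part of Tweepy.

```python
from pathlib import Path

# Order in which the credentials are expected to appear in the file.
KEY_NAMES = ("api_key", "api_secret", "access_token", "access_token_secret")


def read_keys(path="keys.txt"):
    """Read four credentials, one per line, in the order given by KEY_NAMES."""
    # Skip blank lines so a trailing newline is not mistaken for a credential.
    lines = [
        line.strip()
        for line in Path(path).read_text().splitlines()
        if line.strip()
    ]
    if len(lines) != len(KEY_NAMES):
        raise ValueError(
            f"expected {len(KEY_NAMES)} non-empty lines in {path}, found {len(lines)}"
        )
    return dict(zip(KEY_NAMES, lines))
```

The resulting dictionary can then be passed on to the authentication step, for example by using keys["api_key"] and keys["api_secret"] where get_api uses api_key and api_secret.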
Listening to Twitter
Tweepy (in releases before 4.0) has a built-in class called StreamListener, which is designed to listen to the Twitter stream. I want to add some bells and whistles and make my own listener class, so I’ll create a child class called MyListener:
class MyListener(tweepy.StreamListener):
    def on_status(self, status):
        """ Print out the text of the tweets """
        print(status.text)
This is pretty much the simplest Twitter listener you could create. It basically prints out each incoming Tweet to the command line.
That’s all good, but you probably want to add in a safety guard in case something goes wrong. You can do this by modifying the MyListener class to include a function which kicks in if an error is raised:
import sys

class MyListener(tweepy.StreamListener):
    def on_status(self, status):
        """ Print out the text of the tweets """
        print(status.text)

    def on_error(self, status_code):
        """ Catch errors """
        print('Encountered error with status code:', status_code, file=sys.stderr)
        if status_code == 420:
            print("Too ... much ... data ...")
            # Returning False disconnects the stream
            return False
        # Returning True keeps the stream open
        return True
Each error will be associated with a different status_code, depending on the cause of the error. The only one we really need to watch out for is status_code=420. This error basically means that Twitter thinks you’re asking for too much data in too short a period of time. That might not sound like a serious error, but it’s the one which will get your application blacklisted by Twitter if you offend repeatedly.
By including the return False statement, our on_error function will exit out of the listener and stop the application, so Twitter will not cut us off…
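The decision made inside on_error can also be pulled out into a plain function, which makes it easy to check without a live connection. This is only a sketch: keep_streaming and RATE_LIMITED are hypothetical names of mine, and the function simply mirrors the rule described above (stop on 420, carry on otherwise).

```python
import sys

RATE_LIMITED = 420  # the status code singled out above


def keep_streaming(status_code, log=sys.stderr):
    """Return False to disconnect on a rate-limit error, True otherwise."""
    print(f"Encountered error with status code: {status_code}", file=log)
    return status_code != RATE_LIMITED
```

Inside the listener, on_error could then just return keep_streaming(status_code).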
Putting It All Together
So, now we’ve got the pieces, how do we use them? Pretty simple:
# Get our credentials set up:
myauth = get_api()

# Initiate the Twitter API:
my_listener = MyListener()
my_stream = tweepy.Stream(auth=myauth, listener=my_listener)

# Start filtering Twitter:
my_stream.filter(track=["python"])
There are 3 steps here:
1. Authenticate yourself;
2. Initiate the Twitter stream by specifying who you are and which listener you’re using;
3. Start listening.
Et voilà. We’ve connected to the Twitter streaming API.
However, we can’t just receive ALL tweets because that would be too much for Twitter to handle (and probably us too), so we have to specify a filter.
The filter I’ve specified above is simply the word “python”, i.e. return only Tweets that contain the word “python”, but there are a few different ways to filter Tweets (described here).
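To get a feel for keyword filtering, here is a rough local approximation of how a track list is matched, as I understand Twitter’s documentation: each entry in the list is a phrase, a phrase matches when all of its space-separated terms appear in the Tweet (ignoring case and order), and any one matching phrase is enough. The matches_track function is my own hypothetical helper, not part of Tweepy, and Twitter’s real tokenisation is more involved than the simple word split used here.

```python
import re


def matches_track(text, track):
    """Return True if any phrase in `track` has all of its terms in `text`."""
    # Lower-case the text and split it into a set of word tokens.
    words = set(re.findall(r"\w+", text.lower()))
    for phrase in track:
        terms = phrase.lower().split()
        # Every term of the phrase must be present (AND within a phrase);
        # the first phrase that matches is enough (OR across phrases).
        if terms and all(term in words for term in terms):
            return True
    return False
```

Something like this can be handy for testing a keyword list against a few sample Tweets before pointing it at the live stream.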
Another really useful way to filter Tweets is using a geographical filter. These are specified as a bounding box of longitude and latitude values, with the south-west corner first and the north-east corner second:
# Filter by location:
CPT_GEOBOX = [18.299103, -34.359806, 18.550415, -33.865854]
my_stream.filter(locations=CPT_GEOBOX)
This may seem a bit difficult if you don’t know the longitude and latitude of different points on the map off the top of your head, but fortunately the internet can help:
http://boundingbox.klokantech.com/
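It is easy to get the order of the four numbers wrong, so a quick sanity check helps. The sketch below assumes the box is laid out as [west, south, east, north], which is how the Cape Town box above reads. The function in_bounding_box is a hypothetical helper of mine, and it does not handle boxes that cross the 180° meridian.

```python
def in_bounding_box(lon, lat, box):
    """Check whether a point lies inside a [west, south, east, north] box."""
    west, south, east, north = box
    if west > east or south > north:
        raise ValueError("box must be ordered [west, south, east, north]")
    return west <= lon <= east and south <= lat <= north


CPT_GEOBOX = [18.299103, -34.359806, 18.550415, -33.865854]

print(in_bounding_box(18.4241, -33.9249, CPT_GEOBOX))  # central Cape Town: True
print(in_bounding_box(28.0473, -26.2041, CPT_GEOBOX))  # Johannesburg: False
```

If a point you know to be inside your box comes back False, the chances are that longitude and latitude have been swapped.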
And… that’s it. Once your listener is receiving tweets you can do with them what you please.
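As one example of what you might do with them, here is a minimal sketch that appends each Tweet’s text to a file, one JSON object per line. The function name append_tweet and the file name used below are my own assumptions, not anything Tweepy provides.

```python
import json


def append_tweet(path, text):
    """Append one tweet's text to a JSON Lines file (one object per line)."""
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps emoji and accented characters readable.
        f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
```

Calling append_tweet("tweets.jsonl", status.text) from on_status, instead of (or as well as) printing, would build up a file you can analyse later.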