Google Books from Jupyter


If you’re like me you’ve probably been using Google Books without really thinking about it. I never really considered that there might be a philosophy or purpose to it. I just assumed that people put books online because… well, they can.

It turns out that there’s quite a lot more to it.

However, if you’ve played around with the Google Calendar API you might be a bit shocked by the lack of a coherent introduction to the Google Books API. There isn’t a great description out there for Python. (Or maybe there is and my googling skills are not as well-honed as I thought.) The Python client library itself installs with pip:

pip install --upgrade google-api-python-client

Well, using this is about as clear as mud. To get authorized you need to go to this webpage.

Follow the link to the Cloud Platform Console. If you scroll down you’ll see a box called “Use Google APIs”, which invites you to “enable and manage APIs”. If you click this you can create a new project. The default is called “try-apis”.

Once you’ve created a project you can then enable one (or many) of the Google APIs for it. Click “API Manager” –> “Library” –> whichever API you’re after: Google Books for this example.

You then hit “enable” at the top of the page and fill out the credentials. You’ll then be given an API key.

Libraries

To get started these are the libraries I’m going to import:

import sys
import json
from googleapiclient.discovery import build  # 'apiclient' is the older alias for the same package

I like to keep my API keys in a text file called ‘keys.txt’, which I then read in Python using a function like this:

def get_api():
    '''
    Reads the Google Books API key from the first line of keys.txt
    '''

    with open('keys.txt') as f:
        api_key = f.readline().strip()

    return api_key

which I call like this:

apikey = get_api()
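If you’d rather not depend on the file being there, one variation is to fall back to an environment variable. This is only a sketch: the function name load_api_key and the variable name GOOGLE_BOOKS_API_KEY are made up for this example, not anything Google defines.

```python
import os


def load_api_key(path='keys.txt', env_var='GOOGLE_BOOKS_API_KEY'):
    '''
    Returns the API key from the first line of `path`, or from the
    environment variable `env_var` if the file does not exist.
    Both default names are assumptions made for this example.
    '''
    if os.path.exists(path):
        with open(path) as f:
            return f.readline().strip()

    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError('No API key found in %s or $%s' % (path, env_var))
    return key
```

Calling load_api_key() with no arguments behaves like get_api() whenever keys.txt exists.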

Accessing Google

We can then access the Google Books API like this:

service = build('books', 'v1', developerKey=apikey)

The Google Books API description can be found here, although there’s a more accessible description of the query parameters here.

Basically it lets you search through their catalogue, e.g.

request = service.volumes().list(maxResults=40,
                                 filter='free-ebooks',
                                 q='Caravaggio')

The filter selects whether books are publicly available or if they require you to pay for them. I’m just looking for free ebooks.

The q parameter is the query. It’s a text string query and searches through the full text. If you just want to query the title you can write

q = 'intitle:caravaggio'

Note the difference in Python syntax here from what’s described for other languages. Multiple search terms like q = 'caravaggio bowie' will search for caravaggio AND bowie, whereas q = 'caravaggio -bowie' will search for caravaggio but NOT bowie. If you want an exact phrase rather than one word AND another, wrap it in double quotes inside the string: q = '"earth worms"'.
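Since the query is just a string, the rules above are easy to wrap in a helper. Here’s a minimal sketch; build_query and its argument names are my own invention, and it only covers the operators mentioned in this post:

```python
def build_query(terms=(), exclude=(), phrase=None, intitle=None):
    '''
    Assembles a Google Books `q` string from its parts.

    terms   -- words that must all appear (joined with spaces, i.e. AND)
    exclude -- words that must not appear (each prefixed with '-')
    phrase  -- an exact phrase (wrapped in double quotes)
    intitle -- a word that must appear in the title
    '''
    parts = list(terms)
    parts += ['-' + word for word in exclude]
    if phrase:
        parts.append('"%s"' % phrase)
    if intitle:
        parts.append('intitle:' + intitle)
    return ' '.join(parts)


print(build_query(terms=['caravaggio', 'bowie']))
print(build_query(terms=['caravaggio'], exclude=['bowie']))
print(build_query(phrase='earth worms'))
print(build_query(intitle='caravaggio'))
```

The result can be passed straight in as the q argument of service.volumes().list().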

Once we’ve formed the query we need to execute it:

response = request.execute()
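A single request returns at most 40 volumes (the largest value maxResults accepts), so a search with more hits has to be fetched a page at a time. Here’s a rough sketch using the API’s startIndex parameter. The helper name fetch_all and the limit cap are my own additions, and I haven’t tuned it against quota limits:

```python
def fetch_all(service, query, limit=120, page_size=40, **params):
    '''
    Collects up to `limit` volumes for `query`, one page at a time.

    `service` is the object returned by build('books', 'v1', ...).
    Extra keyword arguments (e.g. filter='free-ebooks') are passed
    through to volumes().list() unchanged.
    '''
    books = []
    while len(books) < limit:
        response = service.volumes().list(q=query,
                                          startIndex=len(books),
                                          maxResults=page_size,
                                          **params).execute()
        items = response.get('items', [])
        if not items:
            break  # an empty page means there is nothing left to fetch
        books.extend(items)
    return books[:limit]
```

You would call it like this: books = fetch_all(service, 'Caravaggio', filter='free-ebooks').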

Getting a Response

The client library parses the JSON response for us, so execute() returns an ordinary Python dictionary; the json library is only needed to print it nicely. The response is basically a list of volumes, each of which is an item.

We could dump out all of their info:

print(json.dumps(response, sort_keys=True, indent=4))

But first you might want to check how many books you’ve found…

print("Number of books in list: %d" % len(response.get('items', [])))

We can make a more readable list of the books in the list using:

for book in response.get('items', []):
    print('Title: %s, ID: %s' % (
        book['volumeInfo']['title'],
        book['id']))
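Not every volume is guaranteed to carry every field (a record may have no authors listed, for instance), so indexing straight into the dictionary can raise a KeyError. Here’s a slightly more defensive sketch; summarise is just a name I’ve picked, and the placeholder strings are arbitrary:

```python
def summarise(book):
    '''
    Returns a one-line description of a volume, tolerating missing fields.
    '''
    info = book.get('volumeInfo', {})
    title = info.get('title', '(untitled)')
    authors = ', '.join(info.get('authors', [])) or '(unknown author)'
    year = info.get('publishedDate', '?')[:4]  # publishedDate is a string
    return '%s by %s (%s) [ID: %s]' % (title, authors, year, book.get('id'))


# A cut-down record, trimmed to the fields used here
example = {'id': '58JCAQAAMAAJ',
           'volumeInfo': {'title': 'Insects, crustaceans, and worms',
                          'authors': ['Abby Amy Tenney'],
                          'publishedDate': '1868'}}
print(summarise(example))
```

In the loop above you would then write print(summarise(book)) instead of formatting the fields by hand.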

With the Volume ID specified, we can find a particular book:

response = service.volumes().get(volumeId='58JCAQAAMAAJ').execute()

This should give you the JSON object for just that volume, e.g.:

{
    "accessInfo": {
        "accessViewStatus": "FULL_PUBLIC_DOMAIN",
        "country": "ZA",
        "embeddable": true,
        "epub": {
            "isAvailable": false
        },
        "pdf": {
            "downloadLink": "http://books.google.co.za/books/download/Insects_crustaceans_and_worms.pdf?id=58JCAQAAMAAJ&hl=&output=pdf&sig=ACfU3U25To4Z7pE29F_qOVusnPpxL2sIxw&source=gbs_api",
            "isAvailable": true
        },
        "publicDomain": true,
        "quoteSharingAllowed": false,
        "textToSpeechPermission": "ALLOWED",
        "viewability": "ALL_PAGES",
        "webReaderLink": "http://play.google.com/books/reader?id=58JCAQAAMAAJ&hl=&printsec=frontcover&source=gbs_api"
    },
    "etag": "5NUfoM3SNSQ",
    "id": "58JCAQAAMAAJ",
    "kind": "books#volume",
    "saleInfo": {
        "buyLink": "https://play.google.com/store/books/details?id=58JCAQAAMAAJ&rdid=book-58JCAQAAMAAJ&rdot=1&source=gbs_api",
        "country": "ZA",
        "isEbook": true,
        "saleability": "FREE"
    },
    "selfLink": "https://www.googleapis.com/books/v1/volumes/58JCAQAAMAAJ",
    "volumeInfo": {
        "allowAnonLogging": false,
        "authors": [
            "Abby Amy Tenney"
        ],
        "canonicalVolumeLink": "https://market.android.com/details?id=book-58JCAQAAMAAJ",
        "contentVersion": "0.1.0.0.full.1",
        "dimensions": {
            "height": "18.00 cm"
        },
        "imageLinks": {
            "extraLarge": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=6&edge=curl&imgtk=AFLRE71ywg5v60Usk4G84rgBeTXAqnDnd4tsvWYxlBfksfS7Kzs92rWFQzizTe5OJpiJkzsYVp9gyKVrzDw2Ol19dI4SXAisWQetFK3l1Jk5De6wfr6ZxBc5bDuRstPk1OGO6DG6AJVc&source=gbs_api",
            "large": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=4&edge=curl&imgtk=AFLRE73oZQd-ey0z2HKuqB9XZgE9ATnO_rQqeQPzdvRjQ0kMVlfYTjZe87Qt9FL6UBEGKxMuK1aKFjC2UTWVa_FQbmP9Ks-UcTfNulou_plx5sQ2NiGD82a9UM-Jsj7DaW1twqCb90mj&source=gbs_api",
            "medium": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=3&edge=curl&imgtk=AFLRE700hPR8SLt6h5q2yQyb-mw2Wvo-11EsbTo29dAHw-kdWAI6n3zW1sjbiuCsTYtPVXPkufDA1R2lMjQ8zLgxQg5rWla3Zy0z_PfAemB2LpSNn7lzS49NAJjiad6PBOT5Jx3TrmWy&source=gbs_api",
            "small": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=2&edge=curl&imgtk=AFLRE71auteyJtOi0_Dgn-ziInD_8JYCUy5F3yHhGS2L1jJ5v0H8sCN1qs5FgUYFtVbKr238ApZyfW36dIhbvthS9HQQz_zsr1cNfdGABHoZnS-nveP386K_xvgMxP-s1gz80Vf2gTvy&source=gbs_api",
            "smallThumbnail": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=5&edge=curl&imgtk=AFLRE71MOPjItL4fVEL7r0tmEdjLHYVeLVIb6l7bI0NQ2pXef4ykb6iJYkK2bkRJZxGbYWx-sLuMD-boMW1teYilWVvj1_ZO02jA_ltbJQIL1N2PIkVorg7Ro_oJf-ZHhOe66pHaiRNm&source=gbs_api",
            "thumbnail": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=1&edge=curl&imgtk=AFLRE70oYJs_JhQrOpk5ef4P7C6JF1gcEK1XStqV0f6OvghYxlDjIQOL4eQPcjDxjlWxzSTwjBPwRvxtNLQmxIz0hzWBsEmqTv6Ct00HsikU1jVEds4m3mem3l3sPk6vP5-Ambt8Fmsb&source=gbs_api"
        },
        "infoLink": "https://play.google.com/store/books/details?id=58JCAQAAMAAJ&source=gbs_api",
        "language": "en",
        "maturityRating": "NOT_MATURE",
        "pageCount": 160,
        "previewLink": "http://books.google.co.za/books?id=58JCAQAAMAAJ&hl=&source=gbs_api",
        "printType": "BOOK",
        "printedPageCount": 160,
        "publishedDate": "1868",
        "publisher": "Sheldon and Company",
        "readingModes": {
            "image": true,
            "text": false
        },
        "title": "Insects, crustaceans, and worms"
    }
}

The selfLink is pretty nice – it gives you an online view of the JSON object.

The webReaderLink lets you view the online PDF.
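The same record also tells you which formats can be downloaded: accessInfo has a pdf and an epub entry, each with an isAvailable flag and, where a download is offered, a downloadLink. Here’s a small sketch that pulls those out. download_links is my own helper name, and I’m assuming epub entries follow the same layout as the pdf one shown above:

```python
def download_links(volume):
    '''
    Returns a dict mapping 'pdf' / 'epub' to a download URL, for
    whichever formats the volume's accessInfo marks as available.
    '''
    access = volume.get('accessInfo', {})
    links = {}
    for fmt in ('pdf', 'epub'):
        entry = access.get(fmt, {})
        # Only keep formats that are flagged available and carry a link
        if entry.get('isAvailable') and 'downloadLink' in entry:
            links[fmt] = entry['downloadLink']
    return links
```

For the volume above, download_links(response) should contain a 'pdf' entry only, since its epub is marked as unavailable.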

I haven’t worked out how to read the full text interactively online, but in the meantime there is an option to download e-publications (“e-pubs”) where they’re available and search their full text with Python using the epub library:

pip install epub

But that’s a story for another day.
