If you’re like me you’ve probably been using Google Books without really thinking about it. I never really considered that there might be a philosophy or purpose to it. I just assumed that people put books online because… well, they can.
It turns out that there’s quite a lot more to it.
However, if you’ve played around with the Google Calendar API you might be a bit shocked by the lack of a coherent introduction to the Google Books API. There isn’t a great description of how to use it from Python. (Or maybe there is and my googling skills are not as well-honed as I thought.) The first step, at least, is easy: install Google’s Python client library.
pip install --upgrade google-api-python-client
Well, using this is about as clear as mud. To get authorized you need to go to this webpage.
Follow the link to the Cloud Platform Console. If you scroll down you’ll see a box called “Use Google APIs”, which invites you to “enable and manage APIs”. If you click this you can create a new project. The default is called “try-apis”.
Once you’ve created a project you can then enable one (or many) of the Google APIs for it. Click “API Manager” –> “Library” –> whichever API you’re after: Google Books for this example.
You then hit “enable” at the top of the page and fill out the credentials. You’ll then be given an API key.
Libraries
To get started these are the libraries I’m going to import:
import sys
import json
from googleapiclient.discovery import build
I like to keep my API key in a text file called 'keys.txt', which I then read in Python using a function like this:
def get_api():
    ''' Reads the Google Books API key from keys.txt '''
    with open('keys.txt') as f:
        api_key = f.readline().strip()
    return api_key
which I call like this:
apikey = get_api()
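If the file is missing or its first line is blank, the function above fails in a fairly unhelpful way (or quietly returns an empty key). A slightly more defensive sketch might look like the following. Note that the function name load_api_key and the environment variable GOOGLE_BOOKS_API_KEY are my own inventions for illustration, not anything Google defines.

```python
import os


def load_api_key(path='keys.txt', env_var='GOOGLE_BOOKS_API_KEY'):
    """Return the API key from an environment variable or, failing that, a file.

    Both this helper and the variable name are hypothetical conveniences.
    """
    # Prefer the environment variable when it is set and non-empty
    key = os.environ.get(env_var, '').strip()
    if key:
        return key
    # Otherwise fall back to the first line of the key file
    with open(path) as f:
        key = f.readline().strip()
    if not key:
        raise ValueError('No API key found on the first line of %s' % path)
    return key
```

It can be called in the same way as before, e.g. apikey = load_api_key().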
Accessing Google
We can then access the Google Books API like this:
service = build('books', 'v1', developerKey=apikey)
The Google Books API description can be found here, although there’s a more accessible description of the query parameters here.
Basically it lets you search through their catalogue, e.g.
request = service.volumes().list(maxResults=40, filter='free-ebooks', q='Caravaggio')
The filter parameter selects whether books are publicly available or whether you have to pay for them. I’m just looking for free ebooks.
The q parameter is the query. It’s a text string query and searches through the full text. If you just want to query the title you can write
q = 'intitle:caravaggio'
Note the difference in Python syntax here from what’s described for other languages. Multiple search terms like q = 'caravaggio bowie' will search for caravaggio AND bowie, whereas q = 'caravaggio -bowie' will search for caravaggio but NOT bowie. If you want an exact phrase rather than one word AND another, wrap it in double quotes: q = '"earth worms"'.
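Since the query is just a string, it can be handy to assemble it from its parts rather than typing the quotes and minus signs by hand. Here is a minimal sketch of that idea; build_query and its argument names are hypothetical helpers of mine, and it only covers the syntax described above (plain terms, excluded terms, exact phrases and intitle:).

```python
def build_query(all_of=(), none_of=(), phrases=(), intitle=None):
    """Assemble a q string for the Google Books search from its pieces."""
    # Plain terms are simply separated by spaces (an implicit AND)
    parts = list(all_of)
    # Exact phrases are wrapped in double quotes
    parts += ['"%s"' % phrase for phrase in phrases]
    # Excluded terms get a leading minus sign
    parts += ['-%s' % word for word in none_of]
    # Restrict a term to the title with the intitle: prefix
    if intitle:
        parts.append('intitle:%s' % intitle)
    return ' '.join(parts)
```

For example, build_query(all_of=['caravaggio'], none_of=['bowie']) gives 'caravaggio -bowie', which can be passed straight in as q.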
Once we’ve formed the query we need to execute it:
response = request.execute()
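A single request returns at most 40 volumes, so a bigger search means asking for several pages, moving the startIndex parameter along each time. The sketch below shows one way the loop could go. To keep it independent of the network, it takes a fetch_page function as an argument; fetch_all, fetch_page and the default limit of 200 are all assumptions of mine rather than part of the client library.

```python
def fetch_all(fetch_page, page_size=40, limit=200):
    """Collect up to `limit` volumes by repeatedly calling fetch_page.

    fetch_page(start_index, page_size) should return a dict shaped like
    the API response, i.e. with an optional 'items' list.
    """
    volumes = []
    start = 0
    while len(volumes) < limit:
        page = fetch_page(start, page_size)
        items = page.get('items', [])
        if not items:
            # An empty page means there is nothing more to fetch
            break
        volumes.extend(items)
        # Advance by what was actually returned so nothing is skipped
        start += len(items)
    return volumes[:limit]
```

With the real service, fetch_page could be a small function that returns service.volumes().list(q='Caravaggio', filter='free-ebooks', startIndex=start, maxResults=size).execute().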
Getting a Response
The response itself comes back as JSON, which the client library has already turned into a Python dictionary for us, so we only need the json library to print it nicely. It’s basically a list of volumes, stored under the key items.
We could dump out all of their info:
print(json.dumps(response, sort_keys=True, indent=4))
But first you might want to check how many books you’ve found…
print("Number of books in list: %d" % len(response.get('items', [])))
We can make a more readable list of the books in the list using:
for book in response.get('items', []):
    print('Title: %s, ID: %s' % (book['volumeInfo']['title'], book['id']))
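The loop above assumes every volume has a title. Fields such as authors or publishedDate may be absent from some records, so if you want a fuller listing it is safer to look them up with defaults. This is only a sketch: describe is a hypothetical helper name and the placeholder strings are arbitrary choices of mine.

```python
def describe(book):
    """Return a one-line summary of a volume, tolerating missing fields."""
    info = book.get('volumeInfo', {})
    title = info.get('title', '(untitled)')
    # authors is a list of names when present
    authors = ', '.join(info.get('authors', [])) or '(unknown author)'
    # publishedDate is a string; keep just its first four characters
    year = info.get('publishedDate', '')[:4] or 'n.d.'
    return '%s by %s (%s) [ID: %s]' % (title, authors, year, book.get('id', '?'))
```

Each item in response.get('items', []) can be passed to describe and the result printed.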
With the Volume ID specified, we can find a particular book:
response = service.volumes().get(volumeId='58JCAQAAMAAJ').execute()
This should give you the JSON object for just that volume, e.g.:
{
    "accessInfo": {
        "accessViewStatus": "FULL_PUBLIC_DOMAIN",
        "country": "ZA",
        "embeddable": true,
        "epub": {
            "isAvailable": false
        },
        "pdf": {
            "downloadLink": "http://books.google.co.za/books/download/Insects_crustaceans_and_worms.pdf?id=58JCAQAAMAAJ&hl=&output=pdf&sig=ACfU3U25To4Z7pE29F_qOVusnPpxL2sIxw&source=gbs_api",
            "isAvailable": true
        },
        "publicDomain": true,
        "quoteSharingAllowed": false,
        "textToSpeechPermission": "ALLOWED",
        "viewability": "ALL_PAGES",
        "webReaderLink": "http://play.google.com/books/reader?id=58JCAQAAMAAJ&hl=&printsec=frontcover&source=gbs_api"
    },
    "etag": "5NUfoM3SNSQ",
    "id": "58JCAQAAMAAJ",
    "kind": "books#volume",
    "saleInfo": {
        "buyLink": "https://play.google.com/store/books/details?id=58JCAQAAMAAJ&rdid=book-58JCAQAAMAAJ&rdot=1&source=gbs_api",
        "country": "ZA",
        "isEbook": true,
        "saleability": "FREE"
    },
    "selfLink": "https://www.googleapis.com/books/v1/volumes/58JCAQAAMAAJ",
    "volumeInfo": {
        "allowAnonLogging": false,
        "authors": [
            "Abby Amy Tenney"
        ],
        "canonicalVolumeLink": "https://market.android.com/details?id=book-58JCAQAAMAAJ",
        "contentVersion": "0.1.0.0.full.1",
        "dimensions": {
            "height": "18.00 cm"
        },
        "imageLinks": {
            "extraLarge": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=6&edge=curl&imgtk=AFLRE71ywg5v60Usk4G84rgBeTXAqnDnd4tsvWYxlBfksfS7Kzs92rWFQzizTe5OJpiJkzsYVp9gyKVrzDw2Ol19dI4SXAisWQetFK3l1Jk5De6wfr6ZxBc5bDuRstPk1OGO6DG6AJVc&source=gbs_api",
            "large": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=4&edge=curl&imgtk=AFLRE73oZQd-ey0z2HKuqB9XZgE9ATnO_rQqeQPzdvRjQ0kMVlfYTjZe87Qt9FL6UBEGKxMuK1aKFjC2UTWVa_FQbmP9Ks-UcTfNulou_plx5sQ2NiGD82a9UM-Jsj7DaW1twqCb90mj&source=gbs_api",
            "medium": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=3&edge=curl&imgtk=AFLRE700hPR8SLt6h5q2yQyb-mw2Wvo-11EsbTo29dAHw-kdWAI6n3zW1sjbiuCsTYtPVXPkufDA1R2lMjQ8zLgxQg5rWla3Zy0z_PfAemB2LpSNn7lzS49NAJjiad6PBOT5Jx3TrmWy&source=gbs_api",
            "small": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=2&edge=curl&imgtk=AFLRE71auteyJtOi0_Dgn-ziInD_8JYCUy5F3yHhGS2L1jJ5v0H8sCN1qs5FgUYFtVbKr238ApZyfW36dIhbvthS9HQQz_zsr1cNfdGABHoZnS-nveP386K_xvgMxP-s1gz80Vf2gTvy&source=gbs_api",
            "smallThumbnail": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=5&edge=curl&imgtk=AFLRE71MOPjItL4fVEL7r0tmEdjLHYVeLVIb6l7bI0NQ2pXef4ykb6iJYkK2bkRJZxGbYWx-sLuMD-boMW1teYilWVvj1_ZO02jA_ltbJQIL1N2PIkVorg7Ro_oJf-ZHhOe66pHaiRNm&source=gbs_api",
            "thumbnail": "http://books.google.com/books/content?id=58JCAQAAMAAJ&printsec=frontcover&img=1&zoom=1&edge=curl&imgtk=AFLRE70oYJs_JhQrOpk5ef4P7C6JF1gcEK1XStqV0f6OvghYxlDjIQOL4eQPcjDxjlWxzSTwjBPwRvxtNLQmxIz0hzWBsEmqTv6Ct00HsikU1jVEds4m3mem3l3sPk6vP5-Ambt8Fmsb&source=gbs_api"
        },
        "infoLink": "https://play.google.com/store/books/details?id=58JCAQAAMAAJ&source=gbs_api",
        "language": "en",
        "maturityRating": "NOT_MATURE",
        "pageCount": 160,
        "previewLink": "http://books.google.co.za/books?id=58JCAQAAMAAJ&hl=&source=gbs_api",
        "printType": "BOOK",
        "printedPageCount": 160,
        "publishedDate": "1868",
        "publisher": "Sheldon and Company",
        "readingModes": {
            "image": true,
            "text": false
        },
        "title": "Insects, crustaceans, and worms"
    }
}
The selfLink is pretty nice: it gives you an online view of the JSON object.
The webReaderLink lets you view the book online in Google’s web reader.
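If all you want from a volume is somewhere to read or download it, you can pick those links out of the record yourself. Here is a rough sketch based only on the fields visible in the example above (selfLink, webReaderLink and the pdf and epub entries under accessInfo). reading_links is a hypothetical helper name, and I am assuming that an epub entry carries a downloadLink in the same way the pdf entry does.

```python
def reading_links(volume):
    """Pick out the links in a volume record that lead to readable content."""
    access = volume.get('accessInfo', {})
    links = {
        'self': volume.get('selfLink'),
        'reader': access.get('webReaderLink'),
    }
    for fmt in ('pdf', 'epub'):
        details = access.get(fmt, {})
        # Only report a format that is available and has a download link
        if details.get('isAvailable') and 'downloadLink' in details:
            links[fmt] = details['downloadLink']
    return links
```

For the volume above this would return the selfLink, the webReaderLink and the PDF download link, but no epub entry, since isAvailable is false for that format.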
I haven’t worked out how to read the full text interactively online, but in the meantime there is an option to download e-publications (“e-pubs”) where they’re available and search their full text with Python using the epub library:
pip install epub
But that’s a story for another day.