Thursday, August 6, 2015

Accessing Gmail from Python (plus BONUS)

NOTE: The code covered in this blogpost is also available in a video walkthrough here.

UPDATE (Aug 2016): The code has been modernized to use oauth2client.tools.run_flow() instead of the deprecated oauth2client.tools.run(). You can read more about that change here.

Introduction

The last several posts have illustrated how to connect to public/simple and authorized Google APIs. Today, we're going to demonstrate accessing the Gmail (another authorized) API. Yes, you read that correctly... "API." In the old days, you access mail services with standard Internet protocols such as IMAP/POP and SMTP. However, while they are standards, they haven't kept up with modern day email usage and developers' needs that go along with it. In comes the Gmail API which provides CRUD access to email threads and drafts along with messages, search queries, management of labels (like folders), and domain administration features that are an extra concern for enterprise developers.

Earlier posts demonstrate the structure and "how-to" use Google APIs in general, so the most recent posts, including this one, focus on solutions and apps, and use of specific APIs. Once you review the earlier material, you're ready to start with Gmail scopes then see how to use the API itself.

    Gmail API Scopes

    Below are the Gmail API scopes of authorization. We're listing them in most-to-least restrictive order because that's the order you should consider using them in  use the most restrictive scope you possibly can yet still allowing your app to do its work. This makes your app more secure and may prevent inadvertently going over any quotas, or accessing, destroying, or corrupting data. Also, users are less hesitant to install your app if it asks only for more restricted access to their inboxes.
    • 'https://www.googleapis.com/auth/gmail.readonly' — Read-only access to all resources + metadata
    • 'https://www.googleapis.com/auth/gmail.send' — Send messages only (no inbox read nor modify)
    • 'https://www.googleapis.com/auth/gmail.labels' — Create, read, update, and delete labels only
    • 'https://www.googleapis.com/auth/gmail.insert' — Insert and import messages only
    • 'https://www.googleapis.com/auth/gmail.compose' — Create, read, update, delete, and send email drafts and messages
    • 'https://www.googleapis.com/auth/gmail.modify' — All read/write operations except for immediate & permanent deletion of threads & messages
    • 'https://mail.google.com/' — All read/write operations (use with caution)

    Using the Gmail API

    We're going to create a sample Python script that goes through your Gmail threads and looks for those which have more than 2 messages, for example, if you're seeking particularly chatty threads on mailing lists you're subscribed to. Since we're only peeking at inbox content, the only scope we'll request is 'gmail.readonly', the most restrictive scope. The API string is 'gmail' which is currently on version 1, so here's the call to apiclient.discovery.build() you'll use:

    GMAIL = discovery.build('gmail', 'v1', http=creds.authorize(Http()))

    Note that all lines of code above that is predominantly boilerplate (that was explained in earlier posts). Anyway, once you have an established service endpoint with build(), you can use the list() method of the threads service to request the file data. The one required parameter is the user's Gmail address. A special value of 'me' has been set aside for the currently authenticated user.
    threads = GMAIL.users().threads().list(userId='me').execute().get('threads', [])
    If all goes well, the (JSON) response payload will (not be empty or missing and) contain a sequence of threads that we can loop over. For each thread, we need to fetch more info, so we issue a second API call for that. Specifically, we care about the number of messages in a thread:
    for thread in threads:
        tdata = GMAIL.users().threads().get(userId='me', id=thread['id']).execute()
        nmsgs = len(tdata['messages'])
    
    We're seeking only all threads more than 2 (that means at least 3) messages, discarding the rest. If a thread meets that criteria, scan the first message and cycle through the email headers looking for the "Subject" line to display to users, skipping the remaining headers as soon as we find one:
        if nmsgs > 2:
            msg = tdata['messages'][0]['payload']
            subject = ''
            for header in msg['headers']:
                if header['name'] == 'Subject':
                    subject = header['value']
                    break
            if subject:
                print('%s (%d msgs)' % (subject, nmsgs))
    
    If you're on many mailing lists, this may give you more messages than desired, so feel free to up the threshold from 2 to 50, 100, or whatever makes sense for you. (In that case, you should use a variable.) Regardless, that's pretty much the entire script save for the OAuth2 code that we're so familiar with from previous posts. The script is posted below in its entirety, and if you run it, you'll see an interesting collection of threads... YMMV depending on what messages are in your inbox:
    $ python3 gmail_threads.py
    [Tutor] About Python Module to Process Bytes (3 msgs)
    Core Python book review update (30 msgs)
    [Tutor] scratching my head (16 msgs)
    [Tutor] for loop for long numbers (10 msgs)
    [Tutor] How to show the listbox from sqlite and make it searchable? (4 msgs)
    [Tutor] find pickle and retrieve saved data (3 msgs)
    

    BONUS: Python 3!

    As of Mar 2015 (formally in Apr 2015 when the docs were updated), support for Python 3 was added to Google APIs Client Library (3.3+)! This update was a long time coming (relevant GitHub thread), and allows Python 3 developers to write code that accesses Google APIs. If you're already running 3.x, you can use its pip command (pip3) to install the Client Library:

    $ pip3 install -U google-api-python-client

    Because of this, unlike previous blogposts, we're deliberately going to avoid use of the print statement and switch to the print() function instead. If you're still running Python 2, be sure to add the following import so that the code will also run in your 2.x interpreter:

    from __future__ import print_function

    Conclusion

    To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs for threads().list(). For more information on what other operations you can execute with the Gmail API, take a look at the reference docs and check out the companion video for this code sample. That's it!

    Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!):
    from __future__ import print_function
    
    from apiclient import discovery
    from httplib2 import Http
    from oauth2client import file, client, tools
    
    SCOPES = 'https://www.googleapis.com/auth/gmail.readonly'
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
        creds = tools.run_flow(flow, store)
    GMAIL = discovery.build('gmail', 'v1', http=creds.authorize(Http()))
    
    threads = GMAIL.users().threads().list(userId='me').execute().get('threads', [])
    for thread in threads:
        tdata = GMAIL.users().threads().get(userId='me', id=thread['id']).execute()
        nmsgs = len(tdata['messages'])
    
        if nmsgs > 2:
            msg = tdata['messages'][0]['payload']
            subject = ''
            for header in msg['headers']:
                if header['name'] == 'Subject':
                    subject = header['value']
                    break
            if subject:
                print('%s (%d msgs)' % (subject, nmsgs))
    
    You can now customize this code for your own needs, for a mobile frontend, a server-side backend, or to access other Google APIs. If you want to see another example of using the Gmail API (displaying all your inbox labels), check out the Python Quickstart example in the official docs or its equivalent in Java (server-side, Android), iOS (Objective-C, Swift), C#/.NET, PHP, Ruby, JavaScript (client-side, Node.js), or Go. That's it... hope you find these code samples useful in helping you get started with the Gmail API!

    EXTRA CREDIT: To test your skills and challenge yourself, try writing code that allows users to perform a search across their email, or perhaps creating an email draft, adding attachments, then sending them! Note that to prevent spam, there are strict Program Policies that you must abide with... any abuse could rate limit your account or get it shut down. Check out those rules plus other Gmail terms of use here.