Parse an Email Inbox With Python

Learn how to use the Nylas Python SDK to parse an email inbox and return JSON representations of email content.

Ben Lloyd Pearson | August 18, 2020

If you need to ingest email data into your Python app, chances are you’ve looked into the various built-in and third-party libraries available to parse an email inbox with Python, and you might be a little overwhelmed by all of the options. There are also a lot of questions you might be asking yourself right now. Do you need to handle raw MIME? Will you connect to the email account via IMAP, or via third-party client libraries for providers like Gmail, Exchange, or Office365? How will you handle authenticating user accounts with third-party email providers?

Fortunately for you, the Nylas Communications Platform abstracts away the complexity of all these concerns to provide a single point of integration via the Nylas Email API. With Nylas, you can instantly connect 100% of email accounts to your app and access JSON representations of your users’ entire email inbox via a simple REST API. This article will show you how to parse an email inbox with the Nylas Python SDK.

Set Up Your Python Environment

First, you need to install the Nylas Python SDK, which makes it easy to connect to the Nylas Communications Platform. Make sure you have pip installed on your machine, then run pip install nylas from the terminal. Next, create a new Python script file and open it in the text editor of your choice. The script will start by importing all of the libraries we’ll use for this example. We need three things:

  • The APIClient class from nylas, the Nylas Python SDK.
  • os to read sensitive access tokens from environment variables.
  • datetime to create time representations.

from nylas import APIClient
import os
import datetime

Initialize Nylas

Now, initialize the Nylas API client object by passing the client ID and secret for your Nylas app, and an access token for an account that has been connected to Nylas. If you’re new to Nylas, sign up for a free developer account, and follow our 30-second guide to get your API keys. For the next example, we’ve stored these tokens as environment variables for better security; take a look at our guide on Python environment variables to learn more. For development purposes, you can also simply pass the credentials as strings to the APIClient object if you prefer.

CLIENT_ID = os.environ['CLIENT_ID']
CLIENT_SECRET = os.environ['CLIENT_SECRET']
ACCESS_TOKEN = os.environ['ACCESS_TOKEN']

nylas = APIClient(
    CLIENT_ID,
    CLIENT_SECRET,
    ACCESS_TOKEN,
)
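
If one of these environment variables is missing, os.environ raises a bare KeyError that doesn’t say much about what went wrong. A minimal sketch of a friendlier check follows. The load_credentials helper is hypothetical and not part of the Nylas SDK; it assumes only the three variable names used above.

```python
import os

# The three variable names used in this article
REQUIRED_VARS = ("CLIENT_ID", "CLIENT_SECRET", "ACCESS_TOKEN")


def load_credentials(env=None):
    """Return the required credentials as a dict, or raise a clear error.

    Hypothetical helper for illustration; not part of the Nylas SDK.
    """
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_credentials() with no arguments reads from the real environment, and the resulting values can be passed to APIClient in the same order as above.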

For these examples, we’ll use filtering to return threads that involve a specific email address, so we’ll start by defining this address. The pythonic way to do this would be to accept the email address as an argument when the script is invoked, but for simplicity’s sake, we’ll define this as a global variable within our script.

email = "[email protected]"
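
If you would rather accept the address when the script is invoked, the standard library’s argparse module is one way to do it. This is only a sketch: the parse_args function name, the --limit option, and the example address are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means "use sys.argv"."""
    parser = argparse.ArgumentParser(
        description="Parse recent email threads for an address."
    )
    parser.add_argument("email", help="email address to filter threads by")
    parser.add_argument(
        "--limit", type=int, default=10, help="maximum number of threads to return"
    )
    return parser.parse_args(argv)


# Passing an explicit list makes the parser easy to try out without a shell
args = parse_args(["someone@example.com", "--limit", "5"])
print(args.email, args.limit)
```

In a real script you would call parse_args() with no arguments and use args.email in place of the global variable.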

Parse Threaded Email Conversations

Email messages usually don’t exist in a vacuum; they’re often part of a conversation that includes more than one email exchange. Fortunately, the Nylas Email API automatically handles the process of combining messages into threads so you can easily identify emails that belong to the same conversation. The Threads endpoint exposes a message_ids attribute that lists the IDs of all messages that are part of the thread. The Nylas Python SDK represents these threads as dictionary objects:

{
    "subject": "Our Latest Nuclear Ornithopters",
    "snippet": "Let's setup a meeting to discuss the latest line of models.",
    "unread": false,
    "has_attachments": true,
    "id": "hadf0ancjaps8df43",
    "labels": [
        {
            "display_name": "Inbox",
            "id": "abmcj58j4ncxd03jh",
            "name": "inbox"
        }
    ],
    "participants": [
        {
            "email": "[email protected]",
            "name": "Leonardo Da Vinci"
        },
        {
            "email": "[email protected]",
            "name": "Albert Einstein"
        }
    ],
    "message_ids": [
        "9jh9hdf4bnz94lpa9g4",
        "73nsgq026492gduwp017e"
    ],
    ... # See other available attributes: https://docs.nylas.com/reference#threads
}
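
To illustrate how these attributes can be read, here is a small sketch that pulls the participant addresses out of a thread. It works on a plain dictionary shaped like the example above; the participant_emails helper and the example.com addresses are made up for illustration.

```python
def participant_emails(thread):
    """Return the email address of every participant on a thread."""
    return [person["email"] for person in thread.get("participants", [])]


# A trimmed-down thread with placeholder addresses
sample_thread = {
    "subject": "Our Latest Nuclear Ornithopters",
    "participants": [
        {"email": "leo@example.com", "name": "Leonardo Da Vinci"},
        {"email": "albert@example.com", "name": "Albert Einstein"},
    ],
    "message_ids": ["9jh9hdf4bnz94lpa9g4", "73nsgq026492gduwp017e"],
}

print(participant_emails(sample_thread))
print(len(sample_thread["message_ids"]), "messages in this thread")
```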

Next, let’s take a look at how we can use this info to parse data from all email messages in a single conversation. 

Parse Email Conversations by Email Address

The first function we need accepts the email address we just defined and returns a list of the most recent email conversations that have included this address. It’s a good idea to keep requests to REST APIs small, so this function will also set a default limit of the 10 most recent email threads. This is part of the Nylas Email API pagination feature, which allows you to select chronological groupings of email threads.

def get_sender_history(email, limit=10):
    return nylas.threads.where(from_=email, limit=limit).all()
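
Pagination of this kind is typically driven by a limit and an offset. The sketch below shows the general pattern for walking through results one page at a time. It does not call the Nylas SDK directly: fetch_page is a hypothetical callable that you would wire up to whatever request you use, and the in-memory list stands in for an inbox.

```python
def iter_pages(fetch_page, page_size=10, max_pages=None):
    """Yield successive pages until the source runs out or max_pages is hit.

    fetch_page is a hypothetical callable taking limit and offset keyword
    arguments and returning a list of results.
    """
    offset = 0
    pages = 0
    while max_pages is None or pages < max_pages:
        page = fetch_page(limit=page_size, offset=offset)
        if not page:
            break
        yield page
        # A short page means there is nothing left to request
        if len(page) < page_size:
            break
        offset += page_size
        pages += 1


# Stand-in data source: 25 fake thread IDs held in memory
fake_threads = ["thread-%d" % i for i in range(25)]


def fake_fetch(limit, offset):
    return fake_threads[offset:offset + limit]


for page in iter_pages(fake_fetch, page_size=10):
    print(len(page), "threads in this page")
```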

Return All Email Messages in a Thread

Now that we have a list of threads to work with, we need a function that accepts a single thread, and returns all messages from it.  The next function uses the Messages endpoint to return a JSON representation of each individual message.

def get_messages(thread):
    # Iterate over a reversed view so the most recent message comes first,
    # without mutating the thread's own list of message IDs
    return [
        nylas.messages.get(message_id)
        for message_id in reversed(thread["message_ids"])
    ]

Messages contain all the data you’d expect from an email message, including the subject, body, participants, labels, folders, and more. Here’s an example of an email message returned by the Nylas Email API:

{
    "subject": "Our Latest Nuclear Ornithopters",
    "body": "Let's setup a meeting to discuss the latest line of models. See the attached product brochure for details",
    "labels": [
        {
            "display_name": "Inbox",
            "id": "abmcj58j4ncxd03jh",
            "name": "inbox"
        }
    ],
    "from": [
        {
            "email": "[email protected]",
            "name": "Leonardo Da Vinci"
        }
    ],
    "to": [
        {
            "email": "[email protected]",
            "name": "Albert Einstein"
        }
    ],
    "files": [
        "jg8sjsfdbdv98t930234kj"
    ],
    "id": "9jh9hdf4bnz94lpa9g4",
    "starred": false,
    "unread": false,
    "thread_id": "hadf0ancjaps8df43"
}
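
The example body above is plain text, but message bodies often arrive as HTML, so you may want to reduce them to text before parsing further. Here is a rough sketch using only the standard library’s html.parser. The body_to_text helper is an assumption made for illustration, and a dedicated HTML library would handle more edge cases.

```python
from html.parser import HTMLParser

# Tags that should act as a word break when removed
BLOCK_TAGS = {"p", "div", "br", "li", "tr"}


class _TextExtractor(HTMLParser):
    """Collect text nodes, inserting spaces around block-level tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def body_to_text(body):
    """Strip tags from an HTML body and collapse runs of whitespace."""
    extractor = _TextExtractor()
    extractor.feed(body)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


print(body_to_text("<div><p>Let's set up a <b>meeting</b>.</p><p>See the brochure.</p></div>"))
```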

Parse Attachments From Email Messages

If you want to ingest email inbox data into your app, there is a good chance that you want to access file attachments. The Nylas Email API Files endpoint provides full access to detect and download file attachments from email messages. The next function accepts an email address and uses the Threads endpoint to find the most recent conversation that includes a file attachment. It then downloads all attachments and saves them locally.

def get_recent_attachment(email):
    threads = get_sender_history(email, limit=100)
    for thread in threads:
        # has_attachments is a boolean to indicate whether the thread has any attachments
        if thread["has_attachments"]: 
            messages = get_messages(thread)
            for message in messages:
                # The files attribute lists the attachments on the message
                if message["files"]:
                    for file_ in message["files"]:
                        file_object = nylas.files.get(file_['id'])
                        # Use a context manager so the file handle is always closed
                        with open(file_object.filename, 'wb') as attachment:
                            attachment.write(file_object.download())
                    return True
    print("no files found")
    return False
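
Attachment filenames come from whoever sent the email, so it is worth sanitizing them before writing to disk. The following sketch keeps only the final path component of a filename. The safe_attachment_path helper and the default attachments directory are assumptions made for illustration.

```python
import os


def safe_attachment_path(filename, directory="attachments"):
    """Build a save path that cannot escape the target directory."""
    # Normalize Windows separators, then keep only the last path component
    name = os.path.basename(filename.replace("\\", "/"))
    # Fall back to a generic name if nothing usable is left
    if not name or name in (".", ".."):
        name = "attachment"
    return os.path.join(directory, name)


print(safe_attachment_path("../../etc/passwd"))
print(safe_attachment_path("brochure.pdf"))
```

Create the target directory first, for example with os.makedirs(directory, exist_ok=True), and pass the result to open() in place of the raw filename.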

Analyze Sources of Unread Emails

Maybe you don’t need to parse data out of an email inbox, but rather want to help your users act on metadata about the email, such as its unread or starred status or the labels that have been applied to it. The Nylas Email API exposes all of these attributes so you can create workflows around them.

The next function analyzes the 100 most recent unread emails and returns a list of dictionaries, one for each unique sender address. Each dictionary includes a count of the total number of unread messages from that sender and a list of their IDs. Your app can then use this list of IDs to parse further information for each of the emails.

def get_unread_sources():
    from_emails = {}
    # Filter to only return unread emails, returns 100 by default
    unread_messages = nylas.messages.where(unread=True)
    for message in unread_messages:
        # Check the from attribute and keep a record for all unique emails
        from_email = message["from_"][0]["email"]
        if from_email not in from_emails:
            from_emails[from_email] = {
                    'email' : from_email,
                    'total_unread' : 1,
                    'unread_emails' : [message["id"]],
                    }
        else:
            from_emails[from_email]["total_unread"] += 1
            from_emails[from_email]["unread_emails"].append(message["id"])
    # Return a list of per-sender dictionaries, matching the output shown below
    return list(from_emails.values())

Here is an example of the output of this function:

[
  {
    "email": "[email protected]",
    "total_unread": 2,
    "unread_emails": [
      "sfb8w4yjwdsfbv82sdfg2",
      "sdfg435yhs37jhdfbhdsf",
      ... # continue for all messages
    ]
  },
  ... # Continue for all unique email addresses
]
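
Once you have this data, a natural next step is to rank senders by how many unread emails they account for. The sketch below is one way to do that. The rank_unread_sources helper and the sample data are made up for illustration, and the function accepts either a list of dictionaries or a dictionary keyed by email address.

```python
def rank_unread_sources(sources, top=5):
    """Return the senders with the most unread emails, highest first."""
    # Accept a dict keyed by address as well as a list of dicts
    if isinstance(sources, dict):
        sources = sources.values()
    ranked = sorted(sources, key=lambda source: source["total_unread"], reverse=True)
    return ranked[:top]


# Placeholder data in the shape shown above
sample_sources = [
    {"email": "leo@example.com", "total_unread": 2, "unread_emails": ["a1", "a2"]},
    {"email": "albert@example.com", "total_unread": 5, "unread_emails": ["b1", "b2", "b3", "b4", "b5"]},
    {"email": "marie@example.com", "total_unread": 1, "unread_emails": ["c1"]},
]

for source in rank_unread_sources(sample_sources, top=2):
    print(source["email"], source["total_unread"])
```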

Build Your Email Integration With Nylas

The Nylas Email API is the simplest way to build your email, calendar, and contacts integration, and this article has only scratched the surface of what’s possible with the Nylas Communications Platform. If you want to learn more, take a look at the Python SDK quickstart guide to learn how to add other functionality including drafting and sending emails, scheduling and managing calendar events, and managing user contacts. Or, head to the comments and let us know about your email integration!

Ben Lloyd Pearson

Ben was the Developer Advocate for Nylas. He is a triathlete, musician, avid gamer, and loves to seek out the best breakfast tacos in Austin, Texas.