Discover how to reclaim control over your emails. Let's leverage Python to efficiently clean up and organize your Gmail inbox.
Joey Miller • Posted July 12, 2023
Emails are an important part of many of our lives, both personally and professionally. Staying on top of your inbox can be a daunting task. No matter how hard I try, my Gmail inevitably begins overflowing with countless unread messages.
In this guide, we will explore how Python can help you sort through your inbox and regain control.
Note: The purpose of this post isn't to detail a fully-automated AI that can clean our inboxes unsupervised. Rather, the goal is to introduce you to the tools needed to supplement your efforts when cleaning your inbox.
Ensure you have python3 and pip installed.
I encourage you to install the dependencies into a virtual environment.
Navigate to your project directory and run the following:
```shell
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib pandas
```
Before we can get started, we need to register our application with Google so we can access user data.
We will follow the official instructions to create an OAuth "Desktop app".
When the "OAuth client created" popover appears showing the client details, click 'Download JSON' and save the file as credentials.json in your project directory.

In this simple example, we will focus on creating a Python script that gives a breakdown of the most common senders in our inbox.
Create a Python file called gmail_organizer.py in your project directory.
First, let's add the shebang and imports.
```python
#!/usr/bin/env python3
from __future__ import print_function
import os.path
import pandas as pd
import re
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
```
Then let's create the authentication function. This uses the credentials.json file to authenticate on behalf of a user. Once a user has authenticated, a token.json file will be created in the project directory. This matches the sample code provided by Google.
```python
# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']


def get_creds():
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    return creds
```
Now, we create the functions get_inbox_emails(..) and process_email_metadata(..). These will do most of the heavy lifting.
```python
email_metadata = []


def process_email_metadata(request_id, response, exception):
    # Called once for every message request in a batch
    if exception is not None:
        print(f"Request {request_id} failed: {exception}")
        return
    message_id = response.get('id')
    headers = response.get('payload', {}).get('headers', [])
    for header in headers:
        if header['name'] == 'From':
            # Split e.g. "Name <user@example.com>" into username and domain
            match = re.match(r'(?:.*<)?(.*)@(.*?)(?:>.*|$)', header['value'])
            if match is None:
                break  # no email address in this header
            username, domain = match.groups()
            email_metadata.append({
                'message_id': message_id,
                'username': username,
                'domain': domain})
            break


def get_inbox_emails(service):
    # Call the Gmail API (the API caps maxResults at 500 per page)
    response = service.users().messages().list(
        userId='me',
        labelIds=['INBOX'],
        maxResults=500
    ).execute()

    # Retrieve all message ids, following nextPageToken until the last page
    messages = []
    messages.extend(response.get('messages', []))
    while 'nextPageToken' in response:
        page_token = response['nextPageToken']
        response = service.users().messages().list(
            userId='me',
            labelIds=['INBOX'],
            maxResults=500,
            pageToken=page_token
        ).execute()
        messages.extend(response.get('messages', []))

    # Retrieve the metadata for all messages, up to 100 requests per batch
    step = 100
    num_messages = len(messages)
    for batch in range(0, num_messages, step):
        batch_req = service.new_batch_http_request(callback=process_email_metadata)
        for i in range(batch, min(batch + step, num_messages)):
            batch_req.add(service.users().messages().get(
                userId='me',
                id=messages[i]['id'],
                format='metadata')
            )
        batch_req.execute()
```
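The least obvious part of process_email_metadata(..) is the regex. To see what it does, here is a standalone sketch that runs the same pattern against a few made-up From: header values. The split_sender helper is hypothetical (it is not part of the script); it returns None when a header contains no @, since calling .groups() on a failed match would raise AttributeError.

```python
import re

# Same pattern as in process_email_metadata(..):
# optional "Display Name <" prefix, then username@domain, then ">" or end of string
FROM_PATTERN = re.compile(r'(?:.*<)?(.*)@(.*?)(?:>.*|$)')


def split_sender(from_header):
    """Return (username, domain) for a From: header value, or None if no address."""
    match = FROM_PATTERN.match(from_header)
    if match is None:
        return None
    return match.groups()


# Made-up header values for illustration
samples = [
    'YouTube <no-reply@youtube.com>',
    'info@example.com',
    '"Example Team" <noreply@example.com>',
    'undisclosed-recipients',
]
for value in samples:
    print(f'{value!r:45} -> {split_sender(value)}')
```

Note that the pattern keeps the domain exactly as written, so Example.com and example.com would be counted as different senders; lowercasing domain before appending it is one way to merge them.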
Let's break down what these functions accomplish:
1. Using the gmail service, get_inbox_emails(..) lists the messages that have the INBOX label.
2. The results come back in pages, so we keep requesting pages for as long as there is a nextPageToken in the response.
3. We then request the metadata for each of the message ids we found in Step 2. For performance, we use batch requests, each asking Gmail for the metadata of up to 100 emails.
4. For each message in a batch, the callback process_email_metadata(..) is called. This is where we process our data. In this example, I process the From: field and apply some regex to extract the email username and domain name. This will allow us to find the most common senders in my inbox.

Finally, let's create the script entry point, which calls the functions we've made above.
```python
def main():
    creds = get_creds()
    service = build('gmail', 'v1', credentials=creds)
    get_inbox_emails(service)


if __name__ == '__main__':
    main()
```
Running the code above won't print anything yet. We need to process the data and display it to the user. We can use pandas to list email usernames and domains, sorted by how often they appear.
We've already done the work to process this data in process_email_metadata(..), so all we need to do is add the following lines to main(), below get_inbox_emails(service):
```python
    # Print the results
    df = pd.DataFrame(email_metadata)
    print("Most common email usernames -----------")
    print(df.groupby('username')
          .size().reset_index(name='count')
          .sort_values(by='count', ascending=False)
          .to_string(index=False))
    print()
    print("Most common email domains -------------")
    print(df.groupby('domain')
          .size().reset_index(name='count')
          .sort_values(by='count', ascending=False)
          .to_string(index=False))
```
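To see what the groupby/size/sort_values chain produces, and how it relates to pandas' value_counts, here is a standalone sketch using a few made-up rows shaped like the dicts built by process_email_metadata(..). The extra address column, which counts full sender addresses, is an addition for illustration and not part of the script.

```python
import pandas as pd

# Made-up rows in the same shape that process_email_metadata(..) produces
email_metadata = [
    {'message_id': 'a1', 'username': 'info', 'domain': 'example.com'},
    {'message_id': 'a2', 'username': 'noreply', 'domain': 'example.com'},
    {'message_id': 'a3', 'username': 'no-reply', 'domain': 'youtube.com'},
    {'message_id': 'a4', 'username': 'info', 'domain': 'example.com'},
]
df = pd.DataFrame(email_metadata)

# The groupby/size/sort chain used in main()
by_domain = (df.groupby('domain')
             .size().reset_index(name='count')
             .sort_values(by='count', ascending=False))

# value_counts returns the same counts as a Series, sorted in descending order
domain_counts = df['domain'].value_counts()

# Counting full addresses keeps info@example.com separate from info@ other domains
df['address'] = df['username'] + '@' + df['domain']
address_counts = df['address'].value_counts()

print(by_domain.to_string(index=False))
print(address_counts.head(10).to_string())
```

value_counts is a shorter way to get the same numbers. One caveat applies to both: if the inbox is empty, pd.DataFrame(email_metadata) has no columns and df.groupby('domain') raises a KeyError, so you may want to check df.empty before printing.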
See the complete script on GitHub.
From the project directory, run:

```shell
python3 gmail_organizer.py
```
A browser window will open, prompting you to sign in to your Google account. The script analyzes the Gmail inbox of whichever account you sign in with at this point. Google will warn you that the app is unsafe; this is only because your application is unverified. If necessary, you can go through Google's process to verify your application.
After running the application, you should get an output similar to the following:
```
Most common email usernames -----------
username count
info 6
noreply 5
no-reply 2
donotreply 1
...
Most common email domains -------------
domain count
example.com 5
youtube.com 2
change.org 1
...
```
The script above is a very simple example of what you can accomplish. It serves as a scaffold that you can expand to tackle more complex situations. For example, you can extend the script to modify your inbox by labeling or deleting emails.
Start by making sure you have the correct SCOPES for the operations you are attempting. Google outlines the different scopes in the Gmail API documentation.
To also be able to label emails, we need the gmail.modify scope. This means we need to update:
```python
# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']
```
to:
```python
# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly',
          'https://www.googleapis.com/auth/gmail.modify']
```
Make sure you delete token.json after changing the scopes.
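Deleting token.json matters because the stored token was granted for the old scopes. If you'd rather not rely on remembering this, here is a minimal sketch that checks the scopes saved in token.json. It assumes the file is the JSON written by creds.to_json() in get_creds(), with the granted scopes stored as a list under a "scopes" key. token_has_scopes and reset_token_if_scopes_changed are hypothetical helpers, not part of the script or of the Google libraries.

```python
import json
import os

SCOPES = ['https://www.googleapis.com/auth/gmail.readonly',
          'https://www.googleapis.com/auth/gmail.modify']


def token_has_scopes(token_path, required_scopes):
    """Return True if token_path exists and lists every scope in required_scopes.

    Assumes the JSON layout written by creds.to_json(): granted scopes under 'scopes'.
    """
    if not os.path.exists(token_path):
        return False
    with open(token_path) as f:
        data = json.load(f)
    granted = set(data.get('scopes') or [])
    return set(required_scopes).issubset(granted)


def reset_token_if_scopes_changed(token_path, required_scopes):
    """Delete a token file that lacks a required scope; return True if it was deleted."""
    if os.path.exists(token_path) and not token_has_scopes(token_path, required_scopes):
        # The next get_creds() call will then run the browser sign-in again
        os.remove(token_path)
        return True
    return False
```

One way to use it is to call reset_token_if_scopes_changed('token.json', SCOPES) at the top of get_creds(), so a stale token is discarded and the sign-in flow requests the new scopes.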
From here, labeling an email (for example, starring it) is straightforward.
```python
def label_emails(service, message_id):
    # Add the STARRED label to a single message
    return service.users().messages().modify(
        userId='me',
        id=message_id,
        body={
            'addLabelIds': ['STARRED']
        }
    ).execute()
```
Note: For labeling large numbers of emails, consider using batchModify instead, for the same performance reasons we batched the metadata requests earlier.
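Here is a minimal sketch of what batchModify could look like for starring. It assumes the Gmail API's users.messages.batchModify method, which takes a list of message ids plus the labels to add or remove, and a limit of 1000 ids per request (worth confirming against the current API reference). Like label_emails, it needs the gmail.modify scope. star_messages is a hypothetical helper name.

```python
def star_messages(service, message_ids, chunk_size=1000):
    """Star many messages using users.messages.batchModify.

    `service` is the object returned by build('gmail', 'v1', ...).
    Assumes batchModify accepts at most 1000 ids per request.
    Returns the number of batchModify requests sent.
    """
    requests_sent = 0
    for start in range(0, len(message_ids), chunk_size):
        # One API call per chunk of up to chunk_size message ids
        chunk = message_ids[start:start + chunk_size]
        service.users().messages().batchModify(
            userId='me',
            body={
                'ids': chunk,
                'addLabelIds': ['STARRED'],
            },
        ).execute()
        requests_sent += 1
    return requests_sent
```

For example, star_messages(service, [e['message_id'] for e in email_metadata if e['domain'] == 'example.com']) would star every example.com message in a handful of requests instead of one request per message.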
We've already done the work to process this data in process_email_metadata(..), so to star all emails from example.com, all we need to do is add the following lines to main(), below get_inbox_emails(service):
```python
    for email in email_metadata:
        if email['domain'] == 'example.com':
            label_emails(service, email['message_id'])
```
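The comparison email['domain'] == 'example.com' is case-sensitive, while domain names are not, so a sender written as news@Example.COM would be skipped. Below is a minimal sketch of a case-insensitive alternative that accepts several domains and returns the matching message ids, which lets you review how many messages would be touched before labeling anything. select_message_ids is a hypothetical helper and the sample data is made up.

```python
def select_message_ids(email_metadata, domains):
    """Return message ids whose sender domain is in `domains`, ignoring case.

    `email_metadata` is the list of dicts built by process_email_metadata(..).
    """
    wanted = {d.lower() for d in domains}
    return [email['message_id'] for email in email_metadata
            if email['domain'].lower() in wanted]


# Made-up sample data for illustration
sample = [
    {'message_id': 'a1', 'username': 'info', 'domain': 'example.com'},
    {'message_id': 'a2', 'username': 'news', 'domain': 'Example.COM'},
    {'message_id': 'a3', 'username': 'no-reply', 'domain': 'youtube.com'},
]
ids = select_message_ids(sample, ['example.com'])
print(f'{len(ids)} message(s) would be starred: {ids}')
```

Once you are happy with the selection, each returned id can be passed to label_emails(service, message_id).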