Mail Trends

I get a lot of email (especially at work). I'm trying an Inbox Zero-like approach to keep up with it. Though that's helping me stay on top of things, I had the nagging feeling that I was on too many mailing lists, and that some of them were probably not worth it from a signal-to-noise perspective.

Ideally, something like the Reader Trends or Search History Trends page would exist for Gmail. I thought I could perhaps build it myself, but the absence of an official Gmail API deterred me. However, it occurred to me that Gmail's recently added IMAP support could act as an API of sorts. It should be easy to fetch just the message headers and slice and dice them to extract the stats I was interested in.
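A minimal sketch of the headers-only idea, using Python's standard imaplib module. It assumes an already-authenticated connection is passed in; the function name fetch_headers and its max_messages and batch_size parameters are hypothetical and are not Mail Trends' actual code. The IMAP fetch item BODY.PEEK[HEADER] asks the server for just the header block of each message and, unlike BODY[HEADER], does not mark messages as read.

```python
import imaplib
from email.message import Message
from email.parser import BytesHeaderParser
from typing import List, Optional


def fetch_headers(conn: imaplib.IMAP4, mailbox: str = "INBOX",
                  max_messages: Optional[int] = None,
                  batch_size: int = 500) -> List[Message]:
    """Download only the header block of each message in `mailbox` (hypothetical helper)."""
    # readonly=True makes imaplib issue EXAMINE instead of SELECT, so no flags change.
    typ, _ = conn.select(mailbox, readonly=True)
    if typ != "OK":
        raise RuntimeError(f"could not select {mailbox!r}")
    typ, data = conn.search(None, "ALL")
    if typ != "OK":
        raise RuntimeError("SEARCH failed")
    ids = data[0].split()  # message sequence numbers, in mailbox order
    if max_messages is not None:
        # Keep only the last max_messages sequence numbers (0 keeps none).
        ids = ids[max(0, len(ids) - max_messages):]
    parser = BytesHeaderParser()
    headers: List[Message] = []
    # Fetch in batches so a large mailbox does not produce one enormous command.
    for start in range(0, len(ids), batch_size):
        message_set = ",".join(i.decode() for i in ids[start:start + batch_size])
        typ, data = conn.fetch(message_set, "(BODY.PEEK[HEADER])")
        if typ != "OK":
            raise RuntimeError("FETCH failed")
        for item in data:
            # Each message arrives as a (b'<seq> (BODY[HEADER] {<size>}', raw_headers)
            # tuple; the bare b')' separators between them are skipped.
            if isinstance(item, tuple):
                headers.append(parser.parsebytes(item[1]))
    return headers
```

Against Gmail, a connection could come from imaplib.IMAP4_SSL("imap.gmail.com") followed by conn.login(user, password). Batching is a design choice here, not a requirement; it just keeps each FETCH command line short.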

Thus was born Mail Trends, an IMAP-based email analysis project. It can generate a bunch of tables, graphs and distributions based on time of day, senders, recipients, mailing lists, etc. To get a feel for what it can output, see the results of running it on a piece of the Enron Email Dataset. To run it over your own email, see the getting started page. As a caveat, the program currently loads everything into memory, so my run on 200,000 messages used 1.6 GB of memory. You may want to use the --max_messages= flag to limit the dataset, at least for initial runs.
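As a rough illustration of the slicing involved, here is a sketch of two such stats, a time-of-day histogram and a top-senders table, computed from parsed header objects (email.message.Message instances). The function names are hypothetical, and Mail Trends' own code may bucket by a different time zone or normalize addresses differently. This version uses the hour as written in each Date header, i.e. the sender's local time.

```python
from collections import Counter
from email.message import Message
from email.utils import parseaddr, parsedate_to_datetime
from typing import Dict, Iterable, List, Tuple


def hour_of_day_distribution(headers: Iterable[Message]) -> Dict[int, int]:
    """Count messages per hour 0-23, using the hour as written in each Date header."""
    counts: Counter = Counter()
    for msg in headers:
        date = msg.get("Date")
        if not date:
            continue
        try:
            sent = parsedate_to_datetime(str(date))
        except (TypeError, ValueError):
            continue  # malformed Date headers are common in real mailboxes
        counts[sent.hour] += 1
    # Report every hour, including empty ones, so the result is easy to plot.
    return {hour: counts.get(hour, 0) for hour in range(24)}


def top_senders(headers: Iterable[Message], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent From addresses, compared case-insensitively."""
    counts: Counter = Counter()
    for msg in headers:
        _, addr = parseaddr(str(msg.get("From", "")))
        if addr:
            counts[addr.lower()] += 1
    return counts.most_common(n)
```

Both functions make a single pass over the headers, so they work equally well on a list or on a generator that never holds the whole mailbox in memory.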

The project is still in its early stages, so patches and suggestions are definitely welcome (my email address is in the footer). You can also subscribe to the feed of check-ins to see changes as they are made. The plan wiki page has a very brief outline of what I'm planning to work on next.

21 Comments

You are the man.
Awesome... Xobni for Gmail! You rock!
Doesn't work for me... I get a strange ImportError. Anyway, I think it's a Cheetah bug.
Very nice! I can't wait to get my email into this tool.
Wow, very nice work! I'm sad I don't know python.
Any chance of getting this to run on an mbox file???

That would be totally sweet!
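For the mbox request, Python's standard mailbox module can already read mbox files into the same kind of email.message.Message objects that the IMAP route produces. This is a sketch under that assumption, not a Mail Trends feature; the name mbox_headers and the max_messages parameter are hypothetical. Unlike the IMAP headers-only fetch, mailbox.mbox parses each whole message, body included.

```python
import mailbox
from email.message import Message
from typing import Iterator, Optional


def mbox_headers(path: str, max_messages: Optional[int] = None) -> Iterator[Message]:
    """Yield messages from a local mbox file one at a time (hypothetical helper)."""
    # create=False raises mailbox.NoSuchMailboxError instead of creating an empty file.
    box = mailbox.mbox(path, create=False)
    try:
        for count, msg in enumerate(box):
            if max_messages is not None and count >= max_messages:
                break
            yield msg  # an mboxMessage, which is a subclass of email.message.Message
    finally:
        box.close()
```

Because it is a generator, the messages can be fed straight into per-message statistics code without building a list first.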
> doesn't work for me... I get a strange ImportError. Anyway I think it's a cheetah's bug.

I also got an ImportError, but all I had to do was copy the util.py file from the templates folder to the trunk to make it find it.
Ivan and Caleb, thanks for pointing out the ImportError issue. I've modified the setup to:

1) not include Cheetah (you have to download it separately) since my pre-built version was causing trouble for some people.
2) fix the import for util.py.
@Google - Please buy or do this!
Nice! I have been thinking about this for some time.

I would like to see info on how often I respond to emails and, in reverse, how often people respond to mine. (Helpful for people on/with email lists, etc.)

I wish Google would do more with viewing of content, for example options to view in mindmaps, like they do at I feel Fine.

It would be cool to take the info you get from this, plus the time you spend on email, and insert it into a calendar.

I think this could be extremely powerful for cell phone usage as well.

Take [email + calendar + cell phone = a hell of a lot of info on what, why, and how you do life] and this could be the beginning of a super sweet CRM/social networking tool.

Could you imagine having statistical info on all your communications, easily available?
Were there really no threads of length 2 in the enron dataset, or is that a bug? The former seems highly unlikely.
I have threads of length 2 in other datasets I've run it on, so it might be a weird coincidence. Generally, the Enron dataset isn't that great, since it's all executives (who don't use email as much) and it only seems to include personal emails (almost no mailing lists). It also looks like Enron had pretty strict retention policies; a lot of people seemed to keep only mail they sent, and not much that they received.
Use Splunk (www.splunk.com). 1) It's free. 2) You could do this whole thing with 1-2 lines of configuration and produce lots of useful graphs. 3) There are no memory limit issues.
What is the typical size of an email message header? Assuming my Gmail account has 1 GB of messages, is it possible to estimate the total size of the headers that Mail Trends will download?

Great tool!
Looks great. Unfortunately, I don't have any Python/coding experience, so I tried to follow all the steps and haven't been able to get very far yet. I am familiar with working from the command prompt, so I thought I might be able to fumble my way through, but no luck yet.
I have downloaded and installed Python and I think I managed to import Cheetah into the lib (ran "setup.py install" in the Cheetah directory)

I'm still not clear on how to run main.py (not sure how I include all of the server & login info when running it from the command prompt)

Will keep trying but if anyone has any pointers I'd be really thankful.
Just plain awesome. Nice work, Mihai.
This is wonderful - it'd be great if it could work on generic IMAP accounts though!
Feature request: similar to --me=, would it be hard to have a --lists= that would allow you to specify list emails for lists missing a list-id? My huge mailing list is missing one, and it would be great to be able to move it from Recipients to Lists.
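The --lists= flag in this comment is a request, not an existing Mail Trends option. Here is a sketch of how such a fallback could classify messages, assuming lists are normally identified by their List-Id header (RFC 2919) and that the user supplies extra list addresses for lists that omit it. The function name list_id_for and the extra_list_addresses parameter are hypothetical.

```python
from email.message import Message
from email.utils import getaddresses
from typing import Iterable, Optional


def list_id_for(msg: Message, extra_list_addresses: Iterable[str] = ()) -> Optional[str]:
    """Return an identifier for the mailing list a message came through, or None."""
    list_id = msg.get("List-Id")
    if list_id:
        list_id = str(list_id)
        # List-Id values look like 'Some Description <list-name.example.com>';
        # the part inside the angle brackets is the stable identifier.
        start, end = list_id.rfind("<"), list_id.rfind(">")
        if 0 <= start < end:
            return list_id[start + 1:end].strip().lower()
        return list_id.strip().lower()
    # Fallback for lists without List-Id: treat the message as list mail if
    # one of the user-supplied list addresses appears in To or Cc.
    wanted = {addr.lower() for addr in extra_list_addresses}
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    for _, addr in recipients:
        if addr.lower() in wanted:
            return addr.lower()
    return None
```

Messages for which this returns None would stay in the Recipients view; anything else could be grouped under Lists by the returned identifier.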

I'm having too much fun with this. Did you think about the idea of using a relational database to lower memory requirements?
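On the relational-database question, one possible approach (not something the post says Mail Trends does) is to write header fields into SQLite with Python's standard sqlite3 module and let SQL do the aggregation. The table layout and the names store_headers and top_senders_sql are hypothetical. The memory saving only materializes if headers is itself lazy, such as a generator over a mailbox; a list of parsed messages is already in memory.

```python
import sqlite3
from email.message import Message
from email.utils import parseaddr
from typing import Iterable, List, Tuple


def store_headers(db: sqlite3.Connection, headers: Iterable[Message]) -> None:
    """Write one row per message so header objects need not be kept around."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages (sender TEXT, subject TEXT, date TEXT)"
    )
    # A generator expression hands executemany one row at a time.
    rows = (
        (parseaddr(str(msg.get("From", "")))[1].lower(),
         str(msg.get("Subject", "")),
         str(msg.get("Date", "")))
        for msg in headers
    )
    db.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    db.commit()


def top_senders_sql(db: sqlite3.Connection, n: int = 10) -> List[Tuple[str, int]]:
    """Let SQLite count messages per sender instead of building a Counter in Python."""
    return db.execute(
        "SELECT sender, COUNT(*) AS message_count FROM messages WHERE sender != '' "
        "GROUP BY sender ORDER BY message_count DESC, sender LIMIT ?",
        (n,),
    ).fetchall()
```

Passing a file path instead of ":memory:" to sqlite3.connect would keep the data on disk between runs, so later queries would not need to re-fetch anything.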
looks nice :-)
I'll try it as soon as possible!
And yes! It would be nice to run it on an mbox file :D
I know nothing of Python, and yet I consider myself to be somewhat of a geek.

It would be supreme if someone could "translate" this for people like me.

I am so utterly in need of / anxious to run this over my account!
