The first thing to note about the Enron data is that it is in maildir format; see http://en.wikipedia.org/wiki/Maildir for details. In summary, each account contains custom folders, and each folder may contain a set of files. Each email is stored in its own file.

The data set is located on entropy at /home/data/enron.
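
Since each email lives in its own file under an account/folder hierarchy, the data can be read with a plain directory walk. The following is a minimal sketch, not a definitive loader. It assumes that the first directory level under the root is the account name, that everything below it is the folder path, and that every file parses as a standard email message. The function names iter_messages and count_by_account are hypothetical helpers introduced here for illustration.

```python
import os
from email import message_from_binary_file
from email.policy import default


def iter_messages(root):
    """Yield (account, folder, path, message) for every file under root.

    Assumption: root/<account>/<folder...>/<file>, one email per file.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == ".":
            continue  # files directly under root belong to no account
        parts = rel.split(os.sep)
        account = parts[0]
        folder = os.sep.join(parts[1:])  # empty string if directly under the account
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Read as bytes so that odd encodings do not abort the walk.
            with open(path, "rb") as fh:
                msg = message_from_binary_file(fh, policy=default)
            yield account, folder, path, msg


def count_by_account(root):
    """Return a dict mapping each account name to its number of message files."""
    counts = {}
    for account, _folder, _path, _msg in iter_messages(root):
        counts[account] = counts.get(account, 0) + 1
    return counts
```

Calling count_by_account("/home/data/enron") on entropy should give a per-account message count that can be compared against the totals reported for the data set. Headers are available on each yielded message, for example msg["From"] or msg["Subject"].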

Here is a paper describing the data set: http://www.ceas.cc/papers-2004/168.pdf. Here is another paper: http://nyc.lti.cs.cmu.edu/yiming/Publications/klimt-ecml04.pdf.

There are 619,446 email messages among 158 users (from the above paper). The folks at CMU have cleaned up the data set, so the set that we have (since it came from CMU) has 200,399 messages among 158 users. See the paper for the finer details.

