nlp:enron [2015/04/23 21:46] (current)
ryancha created
Line 1: Line 1:
 +===Enron Data Information===
 +The first thing to note about the Enron data is that it is in maildir format. Please see http://en.wikipedia.org/wiki/Maildir for details. In summary, each account contains custom folders, and each folder may contain a set of files. Each file stores exactly one email.
  
 +The data set is located on entropy at /home/data/enron.
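
The layout described above (account, then folders, then one file per email) can be walked with a short script. This is only a minimal sketch: the function name `iter_enron_messages` is made up for illustration, and it assumes every file below an account directory is a plain RFC 822 message that Python's standard `email` parser can read. It does not rely on the `cur`/`new`/`tmp` subdirectories of a strict Maildir, since the description here does not mention them.

```python
import os
from email import policy
from email.parser import BytesParser


def iter_enron_messages(root):
    """Yield (user, folder, message) for every email file under root.

    Assumed layout (hypothetical names): root/<user>/<folder>/.../<file>,
    where each file holds exactly one email.
    """
    parser = BytesParser(policy=policy.default)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == ".":
            continue  # files directly under root belong to no account
        parts = rel.split(os.sep)
        # First path component is the account; the rest is the folder path.
        user, folder = parts[0], "/".join(parts[1:])
        for name in sorted(filenames):
            with open(os.path.join(dirpath, name), "rb") as fh:
                yield user, folder, parser.parse(fh)
```

On entropy, something like `for user, folder, msg in iter_enron_messages("/home/data/enron"): print(user, folder, msg["Subject"])` should list one subject per stored email, provided the assumptions above hold for our copy.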
 +
 +Here is a paper describing the data set: http://www.ceas.cc/papers-2004/168.pdf
 +Here is another paper: http://nyc.lti.cs.cmu.edu/yiming/Publications/klimt-ecml04.pdf
 +
 +There are 619,446 email messages among 158 users in the original data set (from the above paper). The folks at CMU have cleaned up the data set, so the set that we have (since it came from CMU) has 200,399 messages among 158 users. Take a look at the paper for the finer details.
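
To check a local copy against the figures quoted above, the files can simply be counted per account. The sketch below uses the hypothetical helper name `count_messages_per_user` and assumes that each top-level directory is one user and that every file beneath it is one message. The totals may not match the paper exactly if our copy differs from the version the paper describes.

```python
import os
from collections import Counter


def count_messages_per_user(root):
    """Return a Counter mapping each top-level account directory to its file count."""
    counts = Counter()
    for user in sorted(os.listdir(root)):
        user_dir = os.path.join(root, user)
        if not os.path.isdir(user_dir):
            continue  # ignore stray files next to the account directories
        # Adding zero still creates the key, so empty accounts are counted as users.
        counts[user] += 0
        for _dirpath, _dirnames, filenames in os.walk(user_dir):
            counts[user] += len(filenames)
    return counts
```

With `counts = count_messages_per_user("/home/data/enron")`, `len(counts)` gives the number of accounts and `sum(counts.values())` the number of messages, which can then be compared with the 158 users and 200,399 messages reported for the cleaned set.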
nlp/enron.txt · Last modified: 2015/04/23 21:46 by ryancha