Differences

This shows you the differences between two versions of the page.

cs-401r:20newsgroups [2014/09/05 17:34]
cs401rPML created from https://facwiki.cs.byu.edu/cs679/index.php/20_News_Groups
cs-401r:20newsgroups [2014/09/05 17:44] (current)
cs401rPML Updated download link.
Line 5:
 By default, the data is tokenized as follows: all header information is discarded and the remaining text is split into tokens that correspond to contiguous sequences of alphabetical characters. This conforms to the procedures followed by other researchers (TODO: fill in names here). By following the same procedures, we hope to be able to reproduce their results as closely as possible.
  
-The 20 Newsgroups data set can be downloaded [[http://nlp.cs.byu.edu/classes/cs601r/data/newsgroups/|here]]. This split in this version has the blind test data removed.
+The 20 Newsgroups data set can be downloaded [[http://nlp.cs.byu.edu/classes/cs679/data/newsgroups/|here]]. This split in this version has the blind test data removed.
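
The default tokenization described in the context paragraph of this diff (discard the header, then split the remaining text into runs of alphabetical characters) could be sketched roughly as follows. This is only an illustration, not the course's actual tokenizer. It assumes that each message separates its header from its body with the first blank line, that "alphabetical" means ASCII letters, and that case is left unchanged; the page confirms none of these details. The names `strip_header`, `tokenize`, and `TOKEN_RE` are hypothetical.

```python
import re

# Assumption: "alphabetical characters" means ASCII letters A-Z and a-z.
TOKEN_RE = re.compile(r"[A-Za-z]+")


def strip_header(message: str) -> str:
    """Return the message body, dropping everything before the first blank line.

    Assumption: the header ends at the first blank line. If no blank line
    exists, the whole message is treated as body text.
    """
    normalized = message.replace("\r\n", "\n")
    _header, separator, body = normalized.partition("\n\n")
    return body if separator else normalized


def tokenize(message: str) -> list:
    """Split the body into contiguous runs of alphabetical characters."""
    return TOKEN_RE.findall(strip_header(message))


# Hypothetical sample message, not taken from the data set.
sample = (
    "From: someone@example.com\n"
    "Subject: Re: test\n"
    "\n"
    "Hello, world! It's 1993.\n"
)

print(tokenize(sample))  # ['Hello', 'world', 'It', 's']
```

Digits and punctuation act purely as separators under this rule, so a contraction such as "It's" yields two tokens and numbers disappear entirely.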
cs-401r/20newsgroups.txt · Last modified: 2014/09/05 17:44 by cs401rPML
CC Attribution-Share Alike 4.0 International