cs-401r:reuters (revision of 2014/09/05 12:01 by cs401rPML, created from https://facwiki.cs.byu.edu/cs679/index.php/Reuters)

The Reuters data set consists of Reuters news articles from the 1990s. It is located in [[http://nlp.cs.byu.edu/classes/cs679/data/reuters|a password-protected directory]] on the NLP lab servers. It is the first data set in this course to come with multiple splits already provided. Hence, instead of using:

  <path to reuters data set>/indices

as your split parameter, you will use either:

  <path to reuters data set>/indices/reduced_set

or:

  <path to reuters data set>/indices/full_set

The reduced set is provided for convenience: it contains fewer documents, which can speed up development and debugging. Feel free to create your own, even smaller splits to make these tasks faster still.
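One way to build such a smaller split is to copy a random subset of each index file into a new split directory. The Python sketch below rests on an assumption this page does not confirm: that every file in a split directory (such as ''indices/reduced_set'') is a plain-text index listing one document per line. The function ''make_smaller_split'' and its parameters are hypothetical; check the actual format of the index files before relying on it.

```python
import random
from pathlib import Path


def make_smaller_split(src_split: Path, dst_split: Path,
                       fraction: float, seed: int = 0) -> None:
    """Hypothetical helper: write a smaller copy of a split directory.

    ASSUMPTION: every regular file in src_split is a plain-text index
    with one document entry per line. Verify this against the real data.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed makes the smaller split reproducible
    dst_split.mkdir(parents=True, exist_ok=True)
    for index_file in sorted(src_split.iterdir()):
        if not index_file.is_file():
            continue
        # Ignore blank lines; each remaining line is assumed to be one document.
        lines = [ln for ln in index_file.read_text().splitlines() if ln.strip()]
        # Keep at least one entry from every non-empty index file.
        keep = max(1, int(len(lines) * fraction)) if lines else 0
        # Sample positions, then sort them so the original order is preserved.
        chosen = sorted(rng.sample(range(len(lines)), keep))
        sample = [lines[i] for i in chosen]
        out = "\n".join(sample) + ("\n" if sample else "")
        (dst_split / index_file.name).write_text(out)
```

For example, ''make_smaller_split(Path("<path to reuters data set>/indices/reduced_set"), Path("my_tiny_set"), 0.1)'' would write a split roughly one tenth the size of the reduced set, which you could then pass as your split parameter. Sampling from each index file separately keeps the relative sizes of the files in the split intact.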