The Reuters data set consists of Reuters news articles from the 1990s. It is located in a password-protected directory on the NLP lab servers. It is the first data set provided with multiple splits out of the box. Hence, instead of using:

< path to reuters data set>/indices

as your split parameter, you will use either:

< path to reuters data set>/indices/reduced_set

or

< path to reuters data set>/indices/full_set

The reduced set is provided for convenience. It contains fewer documents, which may help as you develop and debug your program. Feel free to create other, even smaller splits to make developing and debugging faster still.
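
One way to switch between splits without editing paths by hand is a small helper that builds the split directory from the data set root. This is only a sketch: the function name `split_path` and the root directory used in the usage example are hypothetical, and the helper assumes nothing about the data set beyond the indices/reduced_set and indices/full_set layout described above.

```python
from pathlib import Path


def split_path(dataset_root, split="reduced_set"):
    """Build the directory to pass as the split parameter.

    dataset_root stands in for "<path to reuters data set>"; the real
    location is not given here, so any value you pass is your own.
    split is "reduced_set", "full_set", or the name of a smaller
    split you created yourself under the indices directory.
    """
    return Path(dataset_root) / "indices" / split


if __name__ == "__main__":
    # Hypothetical root directory, for illustration only.
    print(split_path("/path/to/reuters"))
    print(split_path("/path/to/reuters", "full_set"))
```

Defaulting to the reduced set keeps development runs fast. Pass "full_set" explicitly when you are ready to run on all of the documents.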

cs-401r/reuters.txt · Last modified: 2014/09/05 12:01
