CS598R Winter 2006: Special Projects - Text Classification and Text Clustering
Introduction
Dr. Ringger, Mark Gulbrandsen (Msg26), Daniel Walker, and Scott Chun are studying text classification and text clustering using Natural Language Processing techniques. The course covers Naive Bayes, Expectation Maximization, and Maximum Entropy, along with several topics each in feature selection, Support Vector Machines, and clustering. The experimental part of the course is a search for a novel use of these concepts. The expected outcome is a paper submitted to a substantial conference in the NLP field.
Goals
Develop mastery of state-of-the-art text classification and text clustering techniques
Brainstorm on novel directions in text clustering
Develop resources for a future course on the subject
Produce a publishable conference paper
Class Meetings
We meet on the following days and times in the NLP South Lab:
Monday at 2:00-3:00 pm – Reading Discussion, produce draft presentation, planning, brainstorming
Wednesday at 3:30-4:30 pm – Further Topic Discussion and Coding Preparation
Friday at 3:30-4:30 pm – Extreme programming, experimental results
Schedule
The file Schedule.xls (a Microsoft Excel file) outlines the schedule for this class.
Text and Readings
Deliverables
Spreadsheet of classification results
Spreadsheet of clustering results
Per-topic PowerPoint drafts
Paper on clustering, including survey and novel results
No exams
Data
Reuters newswire
20 Newsgroups
File system data
Enron email dataset
Grading
The grade for this course is calculated based on four performance areas:
Paper reading completion
Meeting attendance and participation
Coding sessions and assignments
Paper authoring, including creative efforts toward developing a novel application of the course content
Dr. Ringger will give individual feedback roughly monthly to each of the participants, so that they can gauge their performance.