CS598R Winter 2006: Special Projects - Text Classification and Text Clustering

Introduction

Dr. Ringger, Mark Gulbrandsen (Msg26), Daniel Walker, and Scott Chun are studying text classification and text clustering using Natural Language Processing techniques. The course covers Naive Bayes, Expectation Maximization, Maximum Entropy, and selected topics in feature selection, Support Vector Machines, and clustering. The experimental part of the course is a search for a novel use of these techniques. The expected outcome is a paper submitted to a substantial conference in the NLP field.

Goals

Develop mastery of state-of-the-art text classification and text clustering techniques
Brainstorm on novel directions in text clustering
Develop resources for a future course on the subject
Produce a publishable conference paper

Class Meetings

We meet in the NLP South Lab on the following days and times:

Monday at 2:00-3:00 pm – Reading discussion, drafting the presentation, planning, brainstorming
Wednesday at 3:30-4:30 pm – Further Topic Discussion and Coding Preparation
Friday at 3:30-4:30 pm – Extreme programming, experimental results

Schedule

The schedule for this class is outlined in Schedule.xls (a Microsoft Excel file).

Text and Readings

Foundations of Statistical Natural Language Processing (Hardcover) by Christopher D. Manning and Hinrich Schütze (www.amazon.com/gp/product/0262133601/qid=1136785005/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/103-0595473-0507838?n=507846&s=books&v=glance)
Other Papers as noted in Schedule.xls

Deliverables

Spreadsheet of classification results
Spreadsheet of clustering results
Per-topic PowerPoint drafts
Paper on clustering, including survey and novel results
No exams

Data

Reuters newswire
20 Newsgroups
File system data
Enron email dataset

Grading

The grade for this course is calculated based on four performance areas:

Paper reading completion
Meeting attendance and participation
Coding sessions and assignments
Paper authoring, including creative efforts toward developing a novel application of the course contents

Dr. Ringger will give each participant individual feedback roughly once a month so that they can gauge their performance.

nlp/cs598r-winter-2006-proposal.txt · Last modified: 2015/04/23 15:33 by ryancha
