[[Category:Special Projects]]
==CS598R Winter 2006: Special Projects - Text Classification and Text Clustering==

===Introduction===
Dr. Ringger, Mark Gulbrandsen ([[Msg26]]), Daniel Walker, and Scott Chun are studying text classification and text clustering using Natural Language Processing techniques. The course covers Naive Bayes, Expectation Maximization, Maximum Entropy, and several topics each in feature selection, Support Vector Machines, and clustering. The experimental part of the course seeks a novel application of these concepts. The expected outcome is a paper submitted to a major conference in the NLP field.

===Goals===
* Develop mastery of state-of-the-art text classification and text clustering techniques
* Brainstorm novel directions in text clustering
* Develop resources for a future course on the subject
* Produce a publishable conference paper

===Class Meetings===
We meet in the NLP South Lab at the following times:
* Monday, 2:00-3:00 pm: reading discussion, draft presentation, planning, and brainstorming
* Wednesday, 3:30-4:30 pm: further topic discussion and coding preparation
* Friday, 3:30-4:30 pm: extreme programming and experimental results

===Schedule===
The file [[media:Schedule.xls]] (Microsoft Excel format) contains the schedule for this class.

===Text and Readings===
* [http://www.amazon.com/gp/product/0262133601/qid=1136785005/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/103-0595473-0507838?n=507846&s=books&v=glance ''Foundations of Statistical Natural Language Processing''] by Christopher D. Manning and Hinrich Schütze
* Other papers as noted in [[media:Schedule.xls]]

===Deliverables===
* Spreadsheet of classification results
* Spreadsheet of clustering results
* Per-topic PowerPoint drafts
* Paper on clustering, including a survey and novel results
* No exams

===Data===
* Reuters newswire
* 20 Newsgroups
* File system data
* Enron email dataset

===Grading===
The grade for this course is based on four performance areas:
# Completion of paper readings
# Meeting attendance and participation
# Coding sessions and assignments
# Paper authoring, including creative efforts toward a novel application of the course content

Dr. Ringger will give each participant individual feedback roughly once a month so that participants can gauge their performance.