Special Projects

CS598R Winter 2006: Special Projects - Text Classification and Text Clustering

Introduction

Dr. Ringger, Mark Gulbrandsen (Msg26), Daniel Walker, and Scott Chun are studying text classification and text clustering using Natural Language Processing techniques. The course will cover Naive Bayes, Expectation Maximization, Maximum Entropy, and several topics each in feature selection, Support Vector Machines, and clustering. The experimental part of the course is a search for a novel use of these concepts. The expected outcome is a paper submitted to a substantial conference in the NLP field.

Goals

  • Develop mastery of state-of-the-art text classification and text clustering techniques
  • Brainstorm on novel directions in text clustering
  • Develop resources for a future course on the subject
  • Produce a publishable conference paper

Class Meetings

We meet on the following days and times in the NLP South Lab.

  • Monday at 2:00-3:00 pm – Reading discussion, drafting presentations, planning, brainstorming
  • Wednesday at 3:30-4:30 pm – Further topic discussion and coding preparation
  • Friday at 3:30-4:30 pm – Extreme programming, experimental results

Schedule

The file Schedule.xls (a Microsoft Excel formatted file) outlines the schedule for this class.

Text and Readings

Deliverables

  • Spreadsheet of classification results
  • Spreadsheet of clustering results
  • Per-topic PowerPoint drafts
  • Paper on clustering, including survey and novel results
  • No exams

Data

  • Reuters newswire
  • 20 Newsgroups
  • File system data
  • Enron email dataset

Grading

The grade for this course is calculated based on four performance areas:

  1. Paper reading completion
  2. Meeting attendance and participation
  3. Coding sessions and assignments
  4. Paper authoring, including creative efforts toward developing a novel application of the course contents.

Dr. Ringger will give each participant individual feedback roughly once a month so that they can gauge their performance.

nlp/cs598r-winter-2006-proposal.txt · Last modified: 2015/04/23 15:33 by ryancha
CC Attribution-Share Alike 4.0 International