Welcome to the '''Natural Language Processing Lab''' of the Brigham Young University Computer Science Department.

If you are looking for a private wiki where lab members can coordinate on unbaked projects, please use the [[nlp-private:start|Private NLPWiki]].

----

== Overview ==

Members of the Natural Language Processing lab are working on text mining problems involving the discovery of structure and patterns in large collections of documents with little or no human intervention. Projects include a topic browser based on hierarchical Bayesian topic models, error reduction in OCR of historical documents, and topic models for noisy data.

We are also working on learning to annotate lesser-studied languages to aid scholarship on documents written in those languages. Approaches to solving this problem include probabilistic models of language structure and cost-conscious active learning methods. In particular, we are using these methods to facilitate the annotation of ancient documents written in Syriac, a dying Semitic language in which many significant documents of the Christian Near East were written.

We are also interested in learning new and difficult tasks from both data and expert knowledge in harmonious ways using active learning, feature engineering, Bayesian models, and methods of advice-giving.

[[media:nlp:600px-NLPLab2009.jpg]]

== News ==

* [[Entropy|Entropy, the original NLP Lab fileserver]], has been retired. Farewell, keeper of bits.
* Project overview pages have been created for [[Machine-Assisted Annotation]], [[Historical Document Recognition]], and [[Text Mining]].
* Introducing the [[Topical Guide]]!
[[More News ...]]

== Projects ==

* [[machine-assisted-annotation|Active Learning for Annotation]]:
** CCASH: Cost-Conscious Annotation Supervised by Humans
** [[On-ramp|ALFA short course: on-ramp into the project]]
* [[Projects:Syriac|Syriac Corpus]]: Syriac morphological analysis using active learning for the construction of a labeled corpus of classical Syriac texts.
** Data-driven diacritization
** Data-driven morphological analysis
* Text Mining:
** Document clustering and cluster evaluation
** Topic modeling
** [[Topical Guide|Topic model visualization]]
* Processing Noisy OCR Data:
** Reducing error rates in Optical Character Recognition
** Recognizing names in noisy OCR data
** Topic modeling on noisy OCR data

=== Others ===

* [[Language_Identification|Spoken Language Identification]]
* [[Paraphrase|Sentential paraphrase]]
* [[MayaWiki|Robbie Haertel's MayaWiki]]
* [[PSST|Pedagogical Software and Speech Technologies (PSST)]]

== Technical Reports ==

* [http://nlp.cs.byu.edu/techreports/TR2/BYUNLP-TR2.pdf BYU NLP Lab Tech Report #2] = "Generating Paraphrases with Greater Syntactic Variation using Syntactic Phrases"
* [http://nlp.cs.byu.edu/techreports/TR1/BYUNLP-TR1.pdf BYU NLP Lab Tech Report #1] = "Improving Classification in Phone-Based Language Recognition with Maximum Entropy Models"

== Courses ==

* CS 679: [http://facwiki.cs.byu.edu/cs679/ Fall 2012 course on Text Mining]
* CS 479: [http://facwiki.cs.byu.edu/cs479/ Fall 2012 course on Natural Language Processing]

[[Older Courses ...]]

== People ==

=== Faculty ===

* [http://faculty.cs.byu.edu/~ringger/ Eric Ringger], Director, NLP, Machine Learning
* [http://linguistics.byu.edu/lonsdaled.php Deryle Lonsdale], Linguistics
* [http://faculty.cs.byu.edu/~kseppi/ Kevin Seppi], Machine Learning and Optimization

=== Students ===

==== PhD ====

* [http://www.rhaertel.me Robbie Haertel]
* [http://nlp.cs.byu.edu/~dan Dan Walker]
* [http://www.billlund.com Bill Lund]
* Paul Felt

==== MS ====

* Kevin Cook
* [http://joshhansen.net/ Josh Hansen]
* Hito Matsushita
* Kevin Black

[[Alumni]]

== Contact ==

* 3346 TMCB; Computer Science Department; Brigham Young University; Provo, Utah 84602
* [http://cs.byu.edu/department_info/index.php Map]
* Phone: 801-422-7615

== Resources ==

* [[Lab Meeting]]: weekly lab meeting.
** Paper reading list idea: http://spreadsheets.google.com/ccc?key=r8QAzUzlc7lMm9m66Uj_XTg
* [http://mail.cs.byu.edu/mailman/listinfo/nlp/ NLP Mailing List] [https://mail.cs.byu.edu/mailman/private/nlp/ (Archive)]
* [http://www.provo.org/ Provo, Utah] [http://www.byu.edu/ BYU] [http://www.cs.byu.edu/ Computer Science Department]
* [http://nlp.cs.byu.edu/subversion/ Subversion]
* [http://mail.cs.byu.edu/mailman/listinfo/nlp-svn/ NLP Lab Subversion Commit List]
* [http://nlp.cs.byu.edu/trac/ Trac]
* [http://mail.cs.byu.edu/mailman/listinfo/nlp-labadmin/ NLP Lab Administration List]
* [http://www.cs.rochester.edu/~tetreaul/conferences.html Upcoming NLP conferences and deadlines]

[[Older resources]]

=== Data ===

* [[Data | List of datasets available on the NLP Lab server]]
* [http://linguistics.byu.edu/corpora.php List of corpora available through Linguistics]
* [[Synthetic OCR Data | Synthetic OCR dataset produced by the lab]]