NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing

June 5, 2009, Boulder, Colorado, USA

http://nlp.cs.byu.edu/alnlp/

Labeled data are a prerequisite for many popular algorithms in natural language processing and machine learning. While large amounts of annotated data can be obtained for well-studied languages, domains, and problems, labeled data are rarely available for less common languages, domains, or problems. Unfortunately, obtaining human annotations for linguistic data is labor-intensive and typically the costliest part of acquiring an annotated corpus. Previous work has shown that active learning can reduce annotation costs without sacrificing quality. Yet although diverse work over the past decade has demonstrated the potential advantages of active learning for corpus annotation and NLP applications, active learning is still not widely used in ongoing data annotation tasks. Much of the machine learning literature on the topic has focused on active learning for classification problems, with less attention devoted to the kinds of problems encountered in NLP. This workshop attempts to bring together researchers interested in active learning for NLP.

'''The original call for papers is [[Workshop on Active Learning for NLP (CFP)|archived here]].'''

== Proceedings ==
[http://aclweb.org/anthology-new/signll.html#2009-2 Proceedings of the Workshop on the ACL Anthology]

== Program ==
{| class="wikitable"
|-
|8:30
|Welcome: Eric Ringger
|-
|9:00
|'''Invited talk''': ''Active Learning for NLP: Past, Present, and Future''
|-
|
|Burr Settles, University of Wisconsin
|-
|
|'''Session 1: Anaphora Resolution'''
|-
|10:00
|''Active Learning for Anaphora Resolution''
|-
|
|Caroline Gasperin
|-
|10:30
|Break
|-
|
|'''Session 2: Multiple Annotators and Cost Considerations'''
|-
|11:00
|''On Proper Unit Selection in Active Learning: Co-Selection Effects for Named Entity Recognition''
|-
|
|Katrin Tomanek, Florian Laws, Udo Hahn and Hinrich Schütze
|-
|11:30
|''Estimating Annotation Cost for Active Learning in a Multi-Annotator Environment''
|-
|
|Shilpa Arora, Eric H. Nyberg and Carolyn P. Rose
|-
|12:00
|''Data Quality from Crowd-sourcing: A Study of Annotation Selection Criteria for Sentiment Analysis''
|-
|
|Pei-Yun Hsueh, Prem Melville and Vikas Sindhwani
|-
|12:30
|Lunch
|-
|
|'''Session 3: Real Annotators and Experts'''
|-
|2:00
|''Evaluating Automation Strategies in Language Documentation''
|-
|
|Alexis Palmer, Jason Baldridge and Taesun Moon
|-
|2:30
|''A Web Survey on the Use of Active Learning to Support Annotation of Text Data''
|-
|
|Katrin Tomanek and Fredrik Olsson
|-
|3:00
|'''Invited talk''': ''Return on Investment for Active Learning''
|-
|
|Robbie Haertel, Brigham Young University
|-
|3:30
|Break
|-
|
|'''Session 4: New Methods'''
|-
|4:00
|''Active Dual Supervision: Reducing the Cost of Annotating Examples and Features''
|-
|
|Prem Melville and Vikas Sindhwani
|-
|4:30
|''Proactive Learning for Building Machine Translation Systems for Minority Languages''
|-
|
|Vamshi Ambati and Jaime Carbonell
|-
|5:00
|Discussion
|-
|5:30
|End of Workshop
|}

== Endorsed by the following ACL Special Interest Groups ==
* SIGNLL, Special Interest Group for Natural Language Learning
* SIGANN, Special Interest Group for Annotation

== Organizers and Contact ==
* Eric Ringger, Brigham Young University, USA
* Robbie Haertel, Brigham Young University, USA
* Katrin Tomanek, University of Jena, Germany

Please address any queries regarding the workshop to: [[al.nlp2009@googlemail.com]]

== Program Committee ==
* Shlomo Argamon (Illinois Institute of Technology, USA)
* Jason Baldridge (University of Texas at Austin, USA)
* Markus Becker (SPSS, UK)
* Ken Church (Microsoft Research, USA)
* Hal Daume (University of Utah, USA)
* Robbie Haertel (Brigham Young University, USA)
* Ben Hachey (University of Edinburgh, UK)
* Udo Hahn (University of Jena, Germany)
* Eric Horvitz (Microsoft Research, USA)
* Rebecca Hwa (University of Pittsburgh, USA)
* Ashish Kapoor (Microsoft Research, USA)
* Mark Liberman (University of Pennsylvania/LDC, USA)
* Prem Melville (IBM T.J. Watson Research Center, USA)
* Ray Mooney (University of Texas at Austin, USA)
* Miles Osborne (University of Edinburgh, UK)
* Eric Ringger (Brigham Young University, USA)
* Kevin Seppi (Brigham Young University, USA)
* Burr Settles (University of Wisconsin, USA)
* Victor Sheng (New York University, USA)
* Katrin Tomanek (University of Jena, Germany)
* Jingbo Zhu (Northeastern University, China)