NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing

June 5, 2009, Boulder, Colorado, USA http://nlp.cs.byu.edu/alnlp/

Labeled data is a prerequisite for many popular algorithms in natural language processing and machine learning. While it is possible to obtain large amounts of annotated data for well-studied languages in well-studied domains and well-studied problems, labeled data is rarely available for less common languages, domains, or problems. Unfortunately, obtaining human annotations for linguistic data is labor-intensive and is typically the costliest part of acquiring an annotated corpus.

Active learning has been shown to reduce annotation costs without sacrificing quality. While diverse work over the past decade has demonstrated the possible advantages of active learning for corpus annotation and NLP applications, active learning is not widely used in many ongoing data annotation tasks. Much of the machine learning literature on the topic has focused on active learning for classification problems, with less attention devoted to the kinds of problems encountered in NLP. This workshop attempts to bring together researchers interested in active learning for NLP.

The original call for papers is archived here.

Proceedings

Proceedings of the Workshop on the ACL Anthology

Program

8:30 Welcome: Eric Ringger
9:00 Invited talk: Active Learning for NLP: Past, Present, and Future
Burr Settles, University of Wisconsin
Session 1: Anaphora Resolution
10:00 Active Learning for Anaphora Resolution
Caroline Gasperin
10:30 Break
Session 2: Multiple Annotators and Cost Considerations
11:00 On Proper Unit Selection in Active Learning: Co-Selection Effects for Named Entity Recognition
Katrin Tomanek, Florian Laws, Udo Hahn and Hinrich Schütze
11:30 Estimating Annotation Cost for Active Learning in a Multi-Annotator Environment
Shilpa Arora, Eric H. Nyberg and Carolyn P. Rosé
12:00 Data Quality from Crowd-sourcing: A Study of Annotation Selection Criteria for Sentiment Analysis
Pei-Yun Hsueh, Prem Melville and Vikas Sindhwani
12:30 Lunch
Session 3: Real Annotators and Experts
2:00 Evaluating Automation Strategies in Language Documentation
Alexis Palmer, Jason Baldridge and Taesun Moon
2:30 A Web Survey on the Use of Active Learning to Support Annotation of Text Data
Katrin Tomanek and Fredrik Olsson
3:00 Invited talk: Return on Investment for Active Learning
Robbie Haertel, Brigham Young University
3:30 Break
Session 4: New Methods
4:00 Active Dual Supervision: Reducing the Cost of Annotating Examples and Features
Prem Melville and Vikas Sindhwani
4:30 Proactive Learning for Building Machine Translation Systems for Minority Languages
Vamshi Ambati and Jaime Carbonell
5:00 Discussion
5:30 End of Workshop

Endorsed by the following ACL Special Interest Groups

Organizers and Contact

Please address any queries regarding the workshop to: al.nlp2009@googlemail.com

Program Committee