== Publications ==
  
^ Early Gains Matter: A Case for Preferring Generative over Discriminative Crowdsourcing Models   ^^
|  | Paul Felt, Eric Ringger, Kevin Seppi, Kevin Black, Robbie Haertel   |
| :::                             | '''To appear in NAACL 2015'''                                        |
| :::                             | Crowdsourcing models aggregate multiple fallible human judgments. Previous work largely takes a discriminative modeling approach. This paper demonstrates that a data-aware crowdsourcing model incorporating a generative multinomial data model enjoys a strong competitive advantage over its discriminative log-linear counterpart in the typical crowdsourcing setting. |
  
^ [http://www.lrec-conf.org/proceedings/lrec2014/pdf/1153_Paper.pdf| MOMRESP: A Bayesian Model for Multi-Annotator Document Labeling]   ^^
| :::                             | We introduce MOMRESP, a model that improves upon item response models to incorporate information from both natural data clusters as well as annotations from multiple annotators to infer ground-truth labels for the document classification task. We implement this model and show that MOMRESP can use unlabeled data to improve estimates of the ground-truth labels over a majority vote baseline dramatically in situations where both annotations are scarce and annotation quality is low as well as in situations where annotators disagree consistently. |
  
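The majority-vote baseline mentioned in the MOMRESP abstract above is simple to state. A minimal sketch of that baseline (not the MOMRESP model itself; the labels below are invented for illustration):

```python
from collections import Counter

def majority_vote(annotations):
    """Aggregate several annotators' labels for one item by majority vote.

    `annotations` is a list of labels, one per annotator. Ties are broken
    by first-seen order (Counter preserves insertion order).
    """
    return Counter(annotations).most_common(1)[0][0]

# Three annotators label a document; two say "sports", one says "politics".
labels = ["sports", "politics", "sports"]
print(majority_vote(labels))  # -> sports
```

Models like MOMRESP aim to beat this baseline precisely where it is weakest: few annotations per item, noisy annotators, or consistent disagreement.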
^ [http://www.lrec-conf.org/proceedings/lrec2014/pdf/1203_Paper.pdf| Evaluating Lemmatization Models for Machine-Assisted Corpus-Dictionary Linkage]   ^^
| [[media:nlp:150px-lemmatization.png]] | Kevin Black, Eric Ringger, Paul Felt, Kevin Seppi, Kristian Heal, Deryle Lonsdale   |
| :::                             | '''LREC 2014'''                                        |
| :::                             | In this work we adapt the discriminative string transducer DirecTL+ to perform lemmatization for classical Syriac, a low-resource language. We compare the accuracy of DirecTL+ with the Morfette discriminative lemmatizer. DirecTL+ achieves 96.92% overall accuracy, an improvement of 0.86% over Morfette, but at the cost of a longer time to train the model. Error analysis on the models provides guidance on how to apply these models in a machine-assistance setting for corpus-dictionary linkage. |
  
  
^ [http://www.lrec-conf.org/proceedings/lrec2014/pdf/147_Paper.pdf| Using Transfer Learning to Assist Exploratory Corpus Annotation]   ^^
| [[media:nlp:150px-eca.png]] | Paul Felt, Eric Ringger, Kevin Seppi, Kristian Heal   |
| :::                             | '''LREC 2014'''                                        |
| :::                             | We describe an under-studied problem in language resource management: that of providing automatic assistance to annotators working in exploratory settings. When no satisfactory tagset already exists, such as in under-resourced or undocumented languages, it must be developed iteratively while annotating data. This process naturally gives rise to a sequence of datasets, each annotated differently. We argue that this problem is best regarded as a transfer learning problem with multiple source tasks. Using part-of-speech tagging data with simulated exploratory tagsets, we demonstrate that even simple transfer learning techniques can significantly improve the quality of pre-annotations in an exploratory annotation setting. |
  
  
^ [http://nlp.cs.byu.edu/public/lre2013.pdf| Evaluating machine-assisted annotation in under-resourced settings]   ^^
| [[media:nlp:150px-evaluating-maa.png]] | Paul Felt, Eric Ringger, Kevin Seppi, Deryle Lonsdale, Kristian Heal, Robbie Haertel   |
| :::                             | '''LRE Journal, 2013'''                                        |
| :::                             | Machine assistance is vital to managing the cost of corpus annotation projects. Identifying effective forms of machine assistance through principled evaluation is particularly important and challenging in under-resourced domains and highly heterogeneous corpora, as the quality of machine assistance varies. We perform a fine-grained evaluation of two machine-assistance techniques in the context of an under-resourced corpus annotation project. This evaluation requires a carefully controlled user study crafted to test a number of specific hypotheses. We show that human annotators performing morphological analysis of text in a Semitic language perform their task significantly more accurately and quickly when even mediocre pre-annotations are provided. When pre-annotations are at least 70% accurate, annotator speed and accuracy show statistically significant relative improvements of 25–35% and 5–7%, respectively. However, controlled user studies are too costly to be suitable for under-resourced corpus annotation projects. Thus, we also present an alternative analysis methodology that models the data as a combination of latent variables in a Bayesian framework. We show that modeling the effects of interesting confounding factors can generate useful insights. In particular, correction propagation appears to be most effective for our task when implemented with minimal user involvement. More importantly, by explicitly accounting for confounding variables, this approach has the potential to yield fine-grained evaluations using data collected in a natural environment outside of costly controlled user studies. |
  
  
^ [http://contentdm.lib.byu.edu/cdm/singleitem/collection/ETD/id/3267/rec/2| Improving the Effectiveness of Machine-Assisted Annotation]   ^^
| [[media:nlp:150px-machine-assisted-annotation.png]] | Paul Felt   |
| :::                             | '''June 2012'''                                        |
| :::                             | '''Master's Thesis'''. Advised by Eric Ringger.         |
| :::                             | This thesis contributes to the field of annotated corpus development by providing tools and methodologies for empirically evaluating the effectiveness of machine assistance techniques. This allows developers of annotated corpora to improve annotator efficiency by choosing to employ only machine assistance techniques that make a measurable, positive difference. We validate our tools and methodologies using a concrete example. First we present CCASH, a platform for machine-assisted online linguistic annotation capable of recording detailed annotator performance statistics. We employ CCASH to collect data detailing the performance of annotators engaged in Syriac morphological analysis in the presence of two machine assistance techniques: pre-annotation and correction propagation. We present a Bayesian analysis of the data that yields actionable insights. Pre-annotation is shown to increase annotator accuracy when pre-annotations are at least 60% accurate, and annotator speed when pre-annotations are at least 80% accurate. Correction propagation's effect on accuracy is minor. |
  
  
^ [http://www.lrec-conf.org/proceedings/lrec2012/pdf/511_Paper.pdf| First Results in a Study Evaluating Pre-labeling and Correction Propagation for Machine-Assisted Syriac Morphological Analysis]   ^^
| [[media:nlp:120px-boxplot.png]] | Paul Felt, Eric K. Ringger, Kevin D. Seppi, Robbie Haertel, Kristian Heal, Deryle Lonsdale   |
| :::                             | '''LREC 2012'''                                        |
| :::                             | We investigate how good machine assistance needs to be in order to actually help human annotators (in terms of time and cost) for the task of Syriac morphological disambiguation. |
  
  
^ [http://aclweb.org/anthology-new/D/D10/D10-1079.pdf| A Probabilistic Morphological Analyzer for Syriac]   ^^
| [[media:nlp:130px-syromorph.png]] | Peter McClanahan, George Busby, Robbie Haertel, Kristian Heal, Deryle Lonsdale, Kevin Seppi, Eric Ringger   |
| :::                             | '''EMNLP 2010'''                                        |
| :::                             | We design a hierarchical probabilistic model to perform morphological analysis of an under-resourced Semitic language. This model achieves 86.7% accuracy, a 29.7% reduction in error rate over reasonable baselines. |
  
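The 29.7% figure in the abstract above is a relative reduction in error rate, not an absolute accuracy gain. A quick sanity check of that arithmetic (the implied baseline accuracy is inferred here, not a number quoted from the paper):

```python
def relative_error_reduction(baseline_acc, model_acc):
    """Relative reduction in error rate when accuracy improves."""
    baseline_err = 1.0 - baseline_acc
    model_err = 1.0 - model_acc
    return (baseline_err - model_err) / baseline_err

# 86.7% accuracy with a 29.7% relative error reduction implies a baseline
# error rate of (1 - 0.867) / (1 - 0.297), i.e. roughly 18.9%.
baseline_acc = 1.0 - (1.0 - 0.867) / (1.0 - 0.297)
print(round(baseline_acc, 3))                                   # -> 0.811
print(round(relative_error_reduction(baseline_acc, 0.867), 3))  # -> 0.297
```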
  
^ [http://contentdm.lib.byu.edu/cdm/singleitem/collection/ETD/id/2226/rec/1| A Probabilistic Morphological Analyzer for Syriac]   ^^
| [[media:nlp:120px-syromorph-thesis.png]] | Peter McClanahan   |
| :::                             | December 2010. '''Master's Thesis'''. Advised by Eric Ringger.                                   |
| :::                             | We show that a carefully crafted probabilistic morphological analyzer significantly outperforms a reasonable baseline for Syriac. Syriac is an under-resourced Semitic language for which there are no available language tools such as morphological analyzers. We introduce and connect novel data-driven models for segmentation, dictionary linkage, and morphological tagging in a joint pipeline to create a probabilistic morphological analyzer requiring only labeled data. |
  
  
^ [http://www.aclweb.org/anthology/W/W10/W10-0105.pdf| Parallel Active Learning: Eliminating Wait Time with Minimal Staleness]   ^^
| [[media:nlp:120px-parallel-framework.png]] | Robbie A. Haertel, Paul Felt, Eric K. Ringger and Kevin D. Seppi   |
| :::                             | '''[http://pages.cs.wisc.edu/~bsettles/active-learning/alnlp2010/#schedule NAACL 2010 Workshop on Active Learning for NLP]'''     |
| :::                             | We design a parallel active learning (AL) architecture in which humans never wait for instances to be scored, and instances are selected using the most current scores possible. Experiments show that our architecture outperforms traditional batch AL in a practical setting. |
  
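The core idea of the parallel architecture above can be sketched in a few lines: a background thread continually re-scores the unlabeled pool while the annotator always takes the currently best-scored instance, so neither side waits on the other. This is a hedged illustration of the idea, not the paper's implementation; random scores stand in for a real model's utility estimates:

```python
import random
import threading
import time

pool = {i: random.random() for i in range(100)}  # instance id -> utility score
lock = threading.Lock()
done = threading.Event()

def scorer():
    # Stand-in for model-based scoring; a real system would retrain on
    # newly labeled data and refresh scores as the model improves.
    while not done.is_set():
        with lock:
            for i in pool:
                pool[i] = random.random()
        time.sleep(0.01)

def next_instance():
    # Annotator-side selection: take the best score available right now,
    # however stale, rather than blocking until a scoring pass finishes.
    with lock:
        best = max(pool, key=pool.get)
        pool.pop(best)
        return best

threading.Thread(target=scorer, daemon=True).start()
annotated = [next_instance() for _ in range(5)]
done.set()
print(len(annotated))  # -> 5
```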
  
  
^ [http://www.aclweb.org/anthology-new/N/N10/N10-1076.pdf| Automatic Diacritization for Low-Resource Languages Using a Hybrid Word and Consonant CMM]   ^^
| [[media:nlp:130px-diacritization.png]] | Robbie A. Haertel, Peter McClanahan, and Eric K. Ringger   |
| :::                             | '''NAACL 2010'''                                        |
| :::                             | We describe a hybrid word- and consonant-level conditional Markov model that restores Semitic diacritization with a word error rate of 10.5%, a 30% improvement over a strong baseline. This result is the state of the art, to the best of our knowledge. Read to the end of the paper to see the model also restore vowels in English! |
  
  
  
^ [http://www.lrec-conf.org/proceedings/lrec2010/summaries/360.html| CCASH: A Web Application Framework for Efficient, Distributed Language Resource Development]   ^^
| [[media:nlp:120px-ccash.png]] | Paul Felt, Owen Merkling, Marc Carmen, Eric Ringger, Warren Lemmon, Kevin Seppi and Robbie Haertel   |
| :::                             | '''LREC 2010'''                                        |
| :::                             | We present CCASH, a web-annotation framework implemented using the Google Web Toolkit. The framework accommodates machine-learned pre-annotation and is instrumented to facilitate careful evaluation of machine assistance and of human annotators. |
  
  
  
^ [http://www.lrec-conf.org/proceedings/lrec2010/summaries/451.html| Tag Dictionaries Accelerate Manual Annotation]   ^^
| [[media:nlp:120px-tagdictionaries.png]] | Marc Carmen, Paul Felt, Robbie Haertel, Deryle Lonsdale, Peter McClanahan, Owen Merkling, Eric Ringger and Kevin Seppi   |
| :::                             | '''LREC 2010'''                                        |
| :::                             | We show that even simple tag memorization can significantly increase annotation speed and accuracy. This is great news for corpus developers who don't have time to build a fancy model. |
  
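"Tag memorization" of the kind described above fits in a dozen lines: remember every tag a word type has received, and suggest only those tags for future occurrences of that word. A minimal sketch (the words, tags, and `suggest` helper are invented for illustration):

```python
from collections import defaultdict

tag_dict = defaultdict(set)

# Build the dictionary from already-annotated (word, tag) pairs.
annotated = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
             ("the", "DET"), ("run", "NOUN")]
for word, tag in annotated:
    tag_dict[word].add(tag)

def suggest(word, all_tags):
    """Constrain the annotator's choices to previously seen tags,
    falling back to the full tagset for unseen words."""
    return sorted(tag_dict[word]) if word in tag_dict else sorted(all_tags)

print(suggest("the", {"DET", "NOUN", "VERB"}))    # -> ['DET']
print(suggest("walks", {"DET", "NOUN", "VERB"}))  # -> ['DET', 'NOUN', 'VERB']
```

Narrowing the choice set is what speeds annotators up; the fallback keeps unseen words annotatable.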
  
  
^ [http://facwiki.cs.byu.edu/nlp/index.php/Workshop_on_Active_Learning_for_NLP| NAACL HLT 2009 Workshop on Active Learning for NLP]   ^
| '''Organized by: Eric Ringger, Robbie Haertel, Katrin Tomanek'''                                        |
  
  
^ [http://www.lrec-conf.org/proceedings/lrec2008/summaries/832.html| Assessing the Costs of Machine-Assisted Corpus Annotation through a User Study]   ^^
| [[media:nlp:120px-assessingcosts.png]] | Eric Ringger, Marc Carmen, Robbie Haertel, Kevin Seppi, Deryle Lonsdale, Peter McClanahan, James Carroll, Noel Ellison   |
| :::                             | '''LREC 2008'''                                        |
| :::                             | We develop a realistic model of annotation cost using data collected in a controlled user study. |
  
  
  
^ [http://www.cs.iastate.edu/~oksayakh/csl/accepted_papers/haertel.pdf| Return on Investment for Active Learning]   ^^
| [[media:nlp:120px-roi.png]] | Robbie A. Haertel, Kevin D. Seppi, Eric K. Ringger, James L. Carroll   |
| :::                             | '''NIPS 2008 Workshop on Cost-Sensitive Learning'''                                        |
| :::                             | We propose return on investment (ROI) as a natural heuristic for incorporating cost into active learning, and demonstrate that it has the potential to dramatically reduce annotation cost in practice. |
  
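The ROI heuristic mentioned above is the standard return-on-investment ratio applied to instance selection. A minimal sketch, assuming per-instance benefit and cost estimates are already available and expressed on comparable scales (the candidate names and numbers below are invented):

```python
def roi(benefit, cost):
    """Return on investment: net benefit per unit of annotation cost."""
    return (benefit - cost) / cost

# Candidate instances as (estimated benefit, estimated annotation cost).
candidates = {
    "sent_a": (10.0, 2.0),  # ROI = 4.0
    "sent_b": (12.0, 6.0),  # ROI = 1.0
    "sent_c": (9.0, 1.0),   # ROI = 8.0
}
best = max(candidates, key=lambda k: roi(*candidates[k]))
print(best)  # -> sent_c
```

Selecting by ROI rather than raw benefit favors cheap-but-useful instances, which is how cost awareness enters the active learning loop.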
  
  
  
^ [http://aclweb.org/anthology-new/P/P08/P08-2017.pdf| Assessing the Costs of Sampling Methods in Active Learning for Annotation]   ^^
| [[media:nlp:120px-alcosts.png]] | Robbie Haertel, Eric Ringger, Kevin Seppi, James Carroll, Peter McClanahan   |
| :::                             | '''ACL 2008'''                                        |
| :::                             | We show that in many practical settings like sequence tagging, correctly comparing AL algorithms requires modeling annotation costs. |
  
  
  
^ [http://aclweb.org/anthology-new/W/W07/W07-1516.pdf| Active Learning for Part-of-Speech Tagging: Accelerating Corpus Annotation]   ^^
| [[media:nlp:accelerating.png]] | Eric Ringger, Peter McClanahan, Robbie Haertel, George Busby, Marc Carmen, James Carroll, Kevin Seppi, Deryle Lonsdale   |
| :::                             | '''ACL 2007 Linguistic Annotation Workshop (LAW)'''                                        |
| :::                             | We use active learning (AL) to decide which portions of an automatically annotated corpus should be manually corrected. We experiment with various AL criteria and demonstrate improved final corpus quality on both prose and poetry. |
  
  
  
^ [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.158.1648| Modeling the Annotation Process for Ancient Corpus Creation]   ^^
| [[media:nlp:120px-ancientannotationprocess.png]] | James L. Carroll, Robbie Haertel, Peter McClanahan, Eric Ringger, Kevin Seppi   |
| :::                             | '''ECAL 2007'''                                        |
| :::                             | We introduce a decision-theoretic model of the annotation process that captures complex interactions among the machine learner, the active learning technique, the annotation cost, human annotation accuracy, the annotator user interface, etc. |
  
== Questions? ==
  
Please contact [http://faculty.cs.byu.edu/~ringger/ Eric Ringger] or [http://faculty.cs.byu.edu/~kseppi/ Kevin Seppi], or visit the Natural Language Processing research lab in room 3346 TMCB.
nlp/machine-assisted-annotation.1432244930.txt.gz · Last modified: 2015/05/21 21:48 by plf1