nlp-private:cost-models-from-the-user-study-data [CS Wiki]

Dependent Variable

Time: the time in seconds that the subject spent on the current case.

Sentence at a Time

Batch Oracular Model

Length
Number Needing Correction
Conditional Entropy
Accuracy on Test Set

Descriptive Oracular Model

Length: The number of tokens in the sentence. When annotating a single word it is the length of the sentence in which the word appears.

Subject Accuracy: The percentage of tokens correctly tagged by the subject. When annotating a single word this is either 0% or 100%.

Location: Index of the current case in the session

Tagger Accuracy: The percentage of words correctly tagged by the automatic tagger in the sentence. When annotating a single word this is either 0% or 100%.

Number Needing Correction: the number of words in the case needing correction
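As a rough illustration of the definitions above for whole-sentence annotation, the sketch below derives Length, Tagger Accuracy, Subject Accuracy, and Number Needing Correction from three aligned tag sequences. The function name, the input format, and the assumption that a word "needs correction" exactly when the tagger's candidate tag differs from the gold tag are illustrative choices, not something this page specifies.

```python
def descriptive_features(gold_tags, tagger_tags, subject_tags):
    """Compute per-case features from three aligned tag sequences.

    All three arguments are hypothetical inputs: lists of tag strings,
    one entry per token of the sentence.
    """
    length = len(gold_tags)
    if length == 0:
        raise ValueError("a case must contain at least one token")
    if not (len(tagger_tags) == len(subject_tags) == length):
        raise ValueError("tag sequences must be aligned token by token")

    tagger_correct = sum(g == t for g, t in zip(gold_tags, tagger_tags))
    subject_correct = sum(g == s for g, s in zip(gold_tags, subject_tags))

    return {
        "length": length,
        # Percentages, matching the definitions on this page.
        "tagger_accuracy": 100.0 * tagger_correct / length,
        "subject_accuracy": 100.0 * subject_correct / length,
        # Assumption: a word needs correction when the tagger got it wrong.
        "number_needing_correction": length - tagger_correct,
    }


features = descriptive_features(
    gold_tags=["DT", "NN", "VBZ", "JJ"],
    tagger_tags=["DT", "NN", "VBZ", "NN"],
    subject_tags=["DT", "NN", "VBZ", "JJ"],
)
```

These features are "oracular" in the sense that they need the gold tags, which are only available after the fact.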

Percent Done: percentage of the cases assigned to the current subject already encountered

Conditional Entropy:
- For whole sentence annotation, an estimate of the total tag sequence entropy given the words in the current sentence.
- For single word annotation, the entropy of the tag distribution for the current word.
- Probably useless as a predictor because the sentences were selected for high entropy.
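The two entropy variants can be sketched as follows. The single-word case is the Shannon entropy of the tagger's distribution over tags for that word. For the whole sentence, this page does not say how the estimate was obtained; the sketch assumes the simplest option, summing the per-word entropies, which is an upper bound on the true sequence entropy. The function names and the dictionary input format are hypothetical.

```python
import math


def tag_entropy(tag_probs):
    """Shannon entropy, in bits, of one word's tag distribution.

    tag_probs maps each tag to its probability given the sentence.
    """
    return sum(p * math.log2(1.0 / p) for p in tag_probs.values() if p > 0)


def sentence_entropy_estimate(per_word_tag_probs):
    """Crude whole-sentence estimate: the sum of per-word entropies.

    Assumption: treats the words' tags as independent, so this is an
    upper bound on the entropy of the full tag sequence.
    """
    return sum(tag_entropy(dist) for dist in per_word_tag_probs)


word_entropy = tag_entropy({"NN": 0.5, "VB": 0.5})
sentence_entropy = sentence_entropy_estimate([
    {"DT": 1.0},
    {"NN": 0.5, "VB": 0.5},
    {"NN": 0.25, "VB": 0.25, "JJ": 0.25, "RB": 0.25},
])
```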

From Tagger: The accuracy, measured on the test set, of the tagger that provides the candidate tags

Native English Speaker: a 0/1 indicator of whether the subject is a native English speaker

Previously Participated in Study: a 0/1 indicator of whether the subject was part of a previous (similar) tagging exercise

Self Evaluation Tagging Proficiency: a 0/1/2 indicator of the subject's self-evaluation of tagging proficiency.

Self Evaluation of Performance in Study: a 0/1/2 indicator of the subject's self-evaluation of tagging accuracy in this study.

Annotation-Time Model

Possible additional feature: a running average of the time spent on previous cases, normalized by length.
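One way such a running average might be maintained is sketched below. It assumes "normalized by length" means seconds per token, averaged per case; pooling total seconds over total tokens would be an equally plausible reading. The class name and the sample timings are hypothetical.

```python
class RunningTimePerToken:
    """Running average of seconds per token over previously seen cases."""

    def __init__(self):
        self.total = 0.0  # sum of per-case (seconds / length)
        self.count = 0    # number of cases seen so far

    def current(self):
        """Average so far, or None before any case has been seen."""
        return self.total / self.count if self.count else None

    def update(self, seconds, length):
        """Record one finished case."""
        if length <= 0:
            raise ValueError("length must be positive")
        self.total += seconds / length
        self.count += 1


tracker = RunningTimePerToken()
features = []
# Hypothetical (seconds, length) pairs for three consecutive cases.
for seconds, length in [(20.0, 10), (30.0, 10), (12.0, 4)]:
    # Read the feature before updating, so it uses previous cases only.
    features.append(tracker.current())
    tracker.update(seconds, length)
```

Reading the average before the update keeps the feature available at annotation time, since the time for the current case is the quantity being predicted.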

Length: The number of tokens in the sentence. When annotating a single word it is the length of the sentence in which the word appears.

Location: Index of the current case in the session

Tagger Accuracy: The percentage of words correctly tagged by the automatic tagger in the sentence. When annotating a single word this is either 0% or 100%.
- Approximated by running average

Number Needing Correction: the number of words in the case needing correction
- Approximated by (1 - accuracy) * length, with accuracy expressed as a fraction rather than a percentage
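The two approximations above can be combined as in the following sketch. It assumes the running average is taken over the tagger accuracies observed on previous cases, and that accuracy is a fraction in [0, 1] when plugged into the formula. The function names and sample values are hypothetical.

```python
def running_average(values):
    """Mean of the values observed so far, or None if there are none."""
    return sum(values) / len(values) if values else None


def estimate_number_needing_correction(accuracy_fraction, length):
    """Apply (1 - accuracy) * length with accuracy as a fraction in [0, 1]."""
    if not 0.0 <= accuracy_fraction <= 1.0:
        raise ValueError("accuracy must be a fraction, not a percentage")
    return (1.0 - accuracy_fraction) * length


# Hypothetical tagger accuracies observed on three earlier cases.
previous_accuracies = [0.90, 0.80, 1.00]
estimated_accuracy = running_average(previous_accuracies)

# Expected number of words needing correction in a 20-token sentence.
estimated_corrections = estimate_number_needing_correction(estimated_accuracy, 20)
```

Unlike the oracular versions, both estimates use only information available before the current case is annotated.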

Percent Done: percentage of the cases assigned to the current subject already encountered

Conditional Entropy:
- For whole sentence annotation, an estimate of the total tag sequence entropy given the words in the current sentence.
- For single word annotation, the entropy of the tag distribution for the current word.
- Probably useless as a predictor because the sentences were selected for high entropy.

Native English Speaker: a 0/1 indicator of whether the subject is a native English speaker

Previously Participated in Study: a 0/1 indicator of whether the subject was part of a previous (similar) tagging exercise

Self Evaluation Tagging Proficiency: a 0/1/2 indicator of the subject's self-evaluation of tagging proficiency.

Word at a Time

Batch Oracular Model

Descriptive Oracular Model

Annotation-Time Model

nlp-private/cost-models-from-the-user-study-data.txt · Last modified: 2015/04/23 14:39 by ryancha

CC Attribution-Share Alike 4.0 International