When a token doesn't appear in the dictionary, explore splitting the token with a space to see if the resulting pieces appear in the dictionary. Example: “artillaryfire” → “artillery fire”. Note that Aspell does this, but it isn't clear what to do with the second half of the token.
When a token has a dash, split the token at the dash and see if the pieces appear in the dictionary. Example: “anti-tank”.
When a dash appears at the end of a word, explore merging the words without the dash.
When a token ends with a dash, remove the dash and see if the token is in the dictionary. Also remove the dash and merge the token with the next word to see if the merged form is in the dictionary.
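The split and dash heuristics above can be sketched roughly as follows. This is only an illustration, not the project's code: the DICTIONARY set is a tiny stand-in for the real word list, and the function names (in_dictionary, space_splits, dash_pieces, trailing_dash_candidates) are hypothetical. Note that splitting alone does not repair a misspelled half, so “artillaryfire” yields no split here, while the correctly spelled “artilleryfire” does.

```python
# Hypothetical stand-in for the real dictionary (e.g. a word list exported from Aspell).
DICTIONARY = {"artillery", "fire", "anti", "tank", "counter", "attack", "counterattack"}


def in_dictionary(word):
    """Case-insensitive dictionary lookup."""
    return word.lower() in DICTIONARY


def space_splits(token):
    """Return every (left, right) split where both halves are dictionary words."""
    return [
        (token[:i], token[i:])
        for i in range(1, len(token))
        if in_dictionary(token[:i]) and in_dictionary(token[i:])
    ]


def dash_pieces(token):
    """Split at dashes; return the pieces if all are dictionary words, else None."""
    pieces = [p for p in token.split("-") if p]
    if len(pieces) > 1 and all(in_dictionary(p) for p in pieces):
        return pieces
    return None


def trailing_dash_candidates(token, next_token):
    """For a token ending in a dash, try the bare token and the merge with the next token."""
    if not token.endswith("-"):
        return []
    stem = token.rstrip("-")
    if not stem:
        return []
    candidates = []
    if in_dictionary(stem):
        candidates.append(stem)
    if next_token is not None and in_dictionary(stem + next_token):
        candidates.append(stem + next_token)
    return candidates
```

When both trailing-dash candidates are found (for example “counter” and “counterattack”), the sketch returns both and leaves the choice to a later step.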
Separate punctuation from tokens. Currently Sclite requires punctuation to appear correctly in the hypothesis file. By separating punctuation from tokens in both the reference and hypothesis files, a token and its punctuation are scored independently, so one can be recognized as correct even when the other is incorrect.
What do we do with “work's”?
Make sure that we only consider punctuation in an appropriate spot, e.g. wo?rk would not be addressed by this.
In the meantime, make sure that the dictionary search drops punctuation, since punctuation will never appear in the dictionary.
Ultimately punctuation needs to be restored in the text that is saved for the patron.
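One possible way to handle the punctuation items above is to peel punctuation only from the edges of a token, look up the remaining core in the dictionary, and reattach the punctuation afterwards. The sketch below assumes ASCII punctuation only (Python's string.punctuation) and uses hypothetical function names. It deliberately leaves internal punctuation alone, so “wo?rk” is untouched and the “work's” question stays open.

```python
import string

# Assumption: only ASCII punctuation is considered; curly quotes etc. are not covered.
PUNCT = set(string.punctuation)


def split_punctuation(token):
    """Split a token into (leading punctuation, core, trailing punctuation)."""
    start = 0
    while start < len(token) and token[start] in PUNCT:
        start += 1
    end = len(token)
    while end > start and token[end - 1] in PUNCT:
        end -= 1
    return token[:start], token[start:end], token[end:]


def restore_punctuation(lead, core, trail):
    """Reattach the original punctuation around a (possibly corrected) core."""
    return lead + core + trail


# Usage: look up only the core, then restore punctuation for the saved text.
lead, core, trail = split_punctuation("artillary,")
corrected = "artillery"  # hypothetical output of the spelling-correction step
restored = restore_punctuation(lead, corrected, trail)
```

Keeping the three parts separate means the same split can feed both the scoring files and the final text that is saved with punctuation restored.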
In the commit class, consider “levels of evidence”
Train on weights for voting of alternatives
Expand Spell Checking using Aspell
When Aspell splits a token and the same sequence of the next token column is empty, add it. Dr. Ringger considers this to be arbitrary, not principled.