

Below are some emails discussing the per-template feature weight analysis.


From: Robbie Haertel<br/> Sent: Friday, November 07, 2008 8:01 AM<br/> To: Eric Ringger; Peter McClanahan<br/> Subject: Char model template ranking<br/>

I wrote a script to rank templates by the sum of the absolute values (or squares) of the weights of the features they produce. Here are the results for the character model (using squared weights):

UVC-2-1-0 13667.672520080343<br/> UVC-3-2-1 13660.474430102808<br/> UVC+0+1+2 12399.267224576308<br/> UVC+1+2+3 9606.828816105764<br/> UVC-1-0 2559.46629499261<br/> UVC+0+1 2114.3135426928425<br/> UVC-2-1 2104.486720839376<br/> UVC+1+2 1475.8852669479863<br/> PREV_VOWELS 1400.7812718725113<br/> …

I believe, per Peter's note, that UVC means unvoweled consonant, and the plus and minus signs indicate which characters are included: two to the left (-2), the current character (-0), etc.
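The ranking described above can be sketched roughly as follows. This is a minimal reconstruction, not Robbie's actual script; in particular, the convention that a feature name is prefixed with its template name (separated by a colon) is a hypothetical one chosen for illustration.

```python
from collections import defaultdict

def rank_templates(weights, template_of, score="square"):
    """Rank templates by the summed square (or absolute value) of the
    weights of the features each template produces.

    weights     -- dict mapping feature name -> learned weight
    template_of -- function mapping a feature name to its template
    """
    totals = defaultdict(float)
    for feat, w in weights.items():
        totals[template_of(feat)] += w * w if score == "square" else abs(w)
    # Highest-scoring templates first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy weights, with feature names prefixed by template name.
weights = {"UVC-1-0:ab": 2.0, "UVC-1-0:cd": -3.0, "PREV_VOWELS:x": 1.5}
ranking = rank_templates(weights, lambda f: f.split(":")[0])
```

With these toy weights, `UVC-1-0` scores 2.0² + (-3.0)² = 13.0 and `PREV_VOWELS` scores 1.5² = 2.25, so `UVC-1-0` ranks first.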

EOS means end of sentence (which should really be EOW here, for end of word).

The PREV_VOWEL_* features all have the same weight, which helped me find a bug; retraining right now.

Fact: the model is only 5 MB, so we can add plenty of new features.

Suggestion: it looks like we might get some mileage out of adding larger n-grams (the ones centered at zero seem to work slightly better).

Robbie


Currently the FEC allows you to view per-feature weights. What Robbie has implemented is a way to assess per-template weights, which is critical for template-level feature engineering.

Josh, would you put this on the to-do list, for you or someone else to take up when ready?

Thanks, –Eric


From: Robbie Haertel<br/> Sent: Saturday, November 08, 2008 9:16 PM<br/> To: Eric Ringger<br/> Cc: Peter McClanahan<br/> Subject: Re: Char model template ranking<br/>

Here are the most up-to-date results. I have three methods for ranking templates: sum, avg, and max, which correspond to the sum of the weights of all features produced by a template, their average, and their maximum. The problem with sum is that templates that produce many features are unfairly favored; average has the problem that a single really good feature can be hidden; max leaves out the cumulative effect of lower-weighted features. For the word model, I've summed across all models, but to put them on the same playing field I've normalized each model by dividing its weights by the difference between the maximum and minimum weights for that model.
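The three aggregation methods and the per-model min-max normalization can be sketched as below. This is an illustrative reconstruction under assumptions: feature names are hypothetically prefixed with their template name, and aggregation is applied to absolute weights (the email does not say whether raw or absolute values were used).

```python
from collections import defaultdict

def aggregate(weights, template_of, method="sum"):
    """Score each template by the sum, avg, or max of the absolute
    weights of the features it produces, highest first."""
    groups = defaultdict(list)
    for feat, w in weights.items():
        groups[template_of(feat)].append(abs(w))
    agg = {"sum": sum,
           "avg": lambda ws: sum(ws) / len(ws),
           "max": max}[method]
    return sorted(((t, agg(ws)) for t, ws in groups.items()),
                  key=lambda kv: kv[1], reverse=True)

def normalize(weights):
    """Rescale one model's weights by (max - min) so several models
    can be summed on the same footing."""
    span = max(weights.values()) - min(weights.values())
    return {f: w / span for f, w in weights.items()}

# Hypothetical toy weights, feature names prefixed by template name.
toy = {"A:x": 1.0, "A:y": -3.0, "B:z": 2.0}
by_sum = aggregate(toy, lambda f: f.split(":")[0], "sum")
by_max = aggregate(toy, lambda f: f.split(":")[0], "max")
```

On the toy weights, template `A` wins under both sum (1 + 3 = 4) and max (3), illustrating how the two methods can agree even when they weigh feature counts differently.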

Of course, the features are not necessarily self-explanatory.

Character:

Sum<br/> UVC+0+1+2 56.90642282021061<br/> PARTIALLY_DIACRITICIZED_N-GRAM_4 48.21648326339715<br/> UVC+1+2+3 46.71085783636655<br/> UVC-2-1-0 41.28993978036493<br/> UVC-3-2-1 28.411102871841752<br/> PARTIALLY_DIACRITICIZED_N-GRAM_3 13.139955676041424<br/> UVC-1-0 9.059466014618154<br/> UVC+0+1 8.834905814140908<br/> UVC+1+2 5.10312275907008<br/> PREV_3DIACRITICS 4.651042786597986<br/> UVC-2-1 3.729255127717895<br/> PARTIALLY_DIACRITICIZED_N-GRAM_2 2.656147659768024<br/> …

By average<br/> PREV_DIACRITIC_1 0.003508388902034368<br/> UVC+1 0.0033205114630904674<br/> PREV_DIACRITIC_3 0.002422712014570924<br/> UVC-4 0.0023483482910579027<br/> UVC 0.0018694564189856001<br/> EOW_0 0.0018616896244633382<br/> PREV_DIACRITIC_2 0.0018021069851990807<br/> UVC-2 0.0015103571822412162<br/> EOW_1 0.0014282984886792042<br/> …

By max<br/> PREV_DIACRITIC_1 0.003508388902034368<br/> UVC+1 0.0033205114630904674<br/> PREV_DIACRITIC_3 0.002422712014570924<br/> UVC-4 0.0023483482910579027<br/> UVC 0.0018694564189856001<br/> EOW_0 0.0018616896244633382<br/> PREV_DIACRITIC_2 0.0018021069851990807<br/> UVC-2 0.0015103571822412162<br/> …

Word:

Sum<br/> PREV_VOWELED_SUFFIX_1 5234.069297836419<br/> PREV_VOWELED_PREFIX_1 4604.480946580072<br/> PREV_VOWELED_SUFFIX_2 4286.125609120276<br/> PREV_VOWELED_PREFIX_2 4035.467960754094<br/> PREV_VOWELING_1 1731.6868118682146<br/> PREV1 1715.622510182117<br/> BEFORESUFFIX_1_1 1701.4053417460768<br/> BEFOREPREFIX_1_1 1687.5277231724885<br/> SUFFIX_AGREEMENT_PATTERN_1 1622.8327423461471<br/> PREFIX_AGREEMENT_PATTERN2_1 1589.946248931253<br/> …

By average<br/> SUFFIX_1 0.10446400188291545<br/> PREFIX_1 0.10446400188291545<br/> SUFFIX_3 0.10314193165181945<br/> PREFIX_3 0.10314193165181945<br/> SUFFIX_2 0.10290156189167299<br/> PREFIX_2 0.10290156189167299<br/> PREFIX_AGREEMENT_PATTERN2_0 0.03840109781621068<br/> SUFFIX_AGREEMENT_PATTERN_0 0.026196459437768865<br/> PREFIX_AGREEMENT_PATTERN2_1 0.021202393003390538<br/> BEFOREPREFIX_1_1 0.019702370353790247<br/> AFTERPREFIX_1_1 0.019369258361621268<br/> BEFORESUFFIX_1_1 0.018596016544938705<br/> AFTERSUFFIX_1_1 0.017126024961369425<br/> …

By max<br/> SUFFIX_1 0.10446400188291545<br/> PREFIX_1 0.10446400188291545<br/> SUFFIX_3 0.10314193165181945<br/> PREFIX_3 0.10314193165181945<br/> SUFFIX_2 0.10290156189167299<br/> PREFIX_2 0.10290156189167299<br/> PREFIX_AGREEMENT_PATTERN2_0 0.03840109781621068<br/> SUFFIX_AGREEMENT_PATTERN_0 0.026196459437768865<br/> PREFIX_AGREEMENT_PATTERN2_1 0.021202393003390538<br/> BEFOREPREFIX_1_1 0.019702370353790247<br/> AFTERPREFIX_1_1 0.019369258361621268<br/> BEFORESUFFIX_1_1 0.018596016544938705<br/> AFTERSUFFIX_1_1 0.017126024961369425<br/> …

Robbie
