##### Differences

This shows you the differences between two versions of the page.

nlp-private:experiments [2015/04/23 15:05] ryancha
nlp-private:experiments [2015/04/23 15:23] (current) ryancha

Previous revisions: 2015/04/23 15:23 ryancha; 2015/04/23 15:05 ryancha; 2015/04/23 15:03 ryancha; 2015/04/23 14:55 ryancha (created)

Line 379: Line 379:
As seen below, there is not a significant change. It is interesting to note that the fixed version actually does better for part of the curve. I haven't thought much about why this is, but there should be a logical explanation.
- [[media:nlp:QBUBugFix.png|thumbnail|none]]
+ [[media:nlp:QBUBugFix.png]]
-
+
- [[media:nlp:QBUBugFix.xlsx]]
+
Here's also a quick version of time comparisons.
- [[media:nlp:QBUBugFixTime.png|thumbnail|none]]
+ [[media:nlp:QBUBugFixTime.png]]
== QBUE ==

Line 406: Line 404:
First notice, especially from the 4th graph, that QBU and QBUE are very similar. Further experiments will follow to prove the difference is statistically insignificant.
+ [{{media:nlp:overall_accuray.PNG|0%-100%}}]
+ [{{media:nlp:overall_accuray90-96.PNG|90%-96%}}]
- [[media:nlp:overall accuray.PNG|0%-100%]]
- [[media:nlp:overall accuray90-96.PNG|90%-96%]]
+ [{{media:nlp:difference_from_baseline_y.PNG|Y-Axis}}]
- [[media:nlp:difference from baseline y.PNG|Y-Axis]]
+ [{{media:nlp:difference_from_baseline.PNG|X-Axis}}]
- [[media:nlp:difference from baseline.PNG|X-Axis]]
+
Difference from QBU: Y-Axis: sentences trained on X-Axis: Difference in Accuracy from the QBU's Accuracy
- [[media:nlp:difference from qbu.PNG|thumbnail|none]]
+
+ [[media:nlp:difference_from_qbu.PNG|thumbnail|none]]
Difference from QBUV: Y-Axis: sentences trained on X-Axis: Difference in Accuracy from the QBUV's Accuracy
- [[media:nlp:difference from qbuv.PNG|thumbnail|none]]
+
+ [[media:nlp:difference_from_qbuv.PNG|thumbnail|none]]
== QBU Derivative ==

Line 430: Line 431:
=== Results ===
Here is a graph showing the derivatives and the 10-period moving average of the lines in order to remove noise.
- [[media:nlp:Derivative.png|thumbnail|none]]
+
+ [[media:nlp:Derivative.png|thumbnail|none]] No image to be found!
== Switch-over Point ==

Line 470: Line 472:
Also note that for both the baseline and QBU the slopes of the respective curves are approximately equal at around 125 sentences (this is by eyeball). This possibly indicates that we are okay to apply count cutoffs around this point, though the exact point is probably dataset dependent (including total amount of data, highest possible accuracy, etc.).
- <center>
+ [{{media:nlp:CutoffExperiment.png|Accuracy per Iteration}}]
-
+ [{{media:nlp:CutoffExperimentBenefit.png|Advantage}}]
- media:nlp:CutoffExperiment.png|Accuracy per Iteration
+
- media:nlp:CutoffExperimentBenefit.png|Advantage
+
-
+
-
+
On the first graph, the y-axis is accuracy, and the x-axis is the number of iterations (also the number of sentences trained on). On the second graph the axes are reversed (hence the graph is mirrored about the y=x line). This simplifies visual estimation of the benefit over baseline.

Line 488: Line 487:
We next ran QBU for three iterations after which we switched to the random baseline (code not versioned); the process was repeated for a switchover point of 5:
-
-
I wish to further note that while it is possible to compute the derivative of the QBU curve in a real situation, the derivative of the random curve will not usually be available, hence this could not be used as a stopping criterion in a real-world task.
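As an illustration of the derivative discussed in these notes, here is a minimal Python sketch of how the slope of an accuracy curve and a 10-period moving average of that slope might be computed. It is not the code used for the experiments (that code is not shown on this page); the function names `finite_differences` and `moving_average` and the sample accuracy values are hypothetical.

```python
def finite_differences(accuracies):
    """Approximate the slope of a learning curve between consecutive iterations.

    `accuracies` is assumed to hold one accuracy value per iteration.
    """
    return [later - earlier for earlier, later in zip(accuracies, accuracies[1:])]


def moving_average(values, window=10):
    """Trailing moving average; the first window-1 points produce no output."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Hypothetical accuracy values, one per iteration (not experimental data).
curve = [50.0, 60.0, 66.0, 70.0, 72.0, 73.0]
slopes = finite_differences(curve)
smoothed = moving_average(slopes, window=2)
```

A trailing window is used here only because it can be computed as new iterations arrive; a centered window would smooth equally well on a completed run.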

Line 568: Line 555:
=== Data ===

- [[media:nlp:WordsAndTags.png|thumbnail|none|Number of words associated with each tag]]
+ [[media:nlp:WordsAndTags.png|Number of words associated with each tag]]
- [[media:nlp:TagsAndWords.png|thumbnail|none|Number of tags associated with each word]]
+ [[media:nlp:TagsAndWords.png|Number of tags associated with each word]]
Excel 2007 Data

Line 592: Line 579:
=== Results ===
Surprisingly, our POS Tagger did much better on the full PTB set than I would have guessed. Averaged final values were: 96.6858 (QBUV), 96.6871 (QBU), 96.6865 (LS), and 96.6830 (Baseline) with an average over all 20 runs equaling 96.6856. Our previous high using the first 25% of the PTB ended around 95.7 percent.
- <center><gallery caption="Comparing 100 percent and 25 percent">
+ [{{media:nlp:25_percent.png|Figure 1}}]
- media:nlp:25_percent.png|Figure 1
+ [{{media:nlp:100_percent.png|Figure 2}}]
- media:nlp:100_percent.png|Figure 2
+ [{{media:nlp:100-25-compare-all.png|Figure 3}}]
- media:nlp:100-25-compare-all.png|Figure 3
+
-
+
'''Figure 1''' shows the Baseline, LS, QBU, and QBUV on the first 25 percent of the PTB. It is worth noticing that there is no distinguishable difference (at this resolution) between LS, QBU, and QBUV. All three, however, are superior to the random baseline.

Line 604: Line 589:
'''Figure 3''' shows three major groups: 1) the baselines, 2) algorithms at 25% of the PTB and 3) algorithms at 100% of the PTB. From this graph it is easy to see the advantage of having more data -- the accuracies grow quicker.
- <center><gallery caption="Comparing advantages of 100 percent and 25 percent">
+ [{{media:nlp:100-25-qbu-comparison.png|Figure 4}}]
- media:nlp:100-25-qbu-comparison.png|Figure 4
+ [{{media:nlp:100-25-qbuv-comparison.png|Figure 5}}]
- media:nlp:100-25-qbuv-comparison.png|Figure 5
+ [{{media:nlp:100-25-ls-comparison.png|Figure 6}}]
- media:nlp:100-25-ls-comparison.png|Figure 6
+ [{{media:nlp:100-25-qbu-wc-comparison.png|Figure 7}}]
- media:nlp:100-25-qbu-wc-comparison.png|Figure 7
+ [{{media:nlp:100-25-qbuv-wc-comparison.png|Figure 8}}]
- media:nlp:100-25-qbuv-wc-comparison.png|Figure 8
+ [{{media:nlp:100-25-ls-wc-comparison.png|Figure 9}}]
- media:nlp:100-25-ls-wc-comparison.png|Figure 9
+
-
+
'''Figures 3-5''' show that with more data, QBU, QBUV, and LS all tend to pick longer sentences, getting more 'bang for their buck.' Thus, 100% of the PTB has a distinct advantage, because there are more long sentences.

Line 659: Line 642:
=== Table ===
-
+ <html>
<th><font face="courier" size="3">Subtag

Line 884: Line 867:
<td><font face="courier" size="3">0.9339
-
+
== Fast Maxent ==