Project Goals

  • Have another team use the Feature Engineering Console (FEC) for feature engineering work.
  • Built-in scripting system: integrate dynamic scripting so that features can be defined at runtime, giving a more usable and quicker feature-definition experience.
  • Public Source Code Release?

Brainstorming

These are some paths the project could take:

  • Use Jaxe to create a rich .def.xml file editor?

Outstanding Issues

  • Generalization:
    • <s>Formalize the distinction between identification systems (multiple decisions possible; one classifier run per truth-hypothesis combination) and classification systems (one decision possible; one classifier run for a given truth and all possible hypotheses, making a decision among the hypotheses)</s> - DONE by means of Edu.byu.nlp.experimentation.
  • <s>Thread safety: why can't we run the pnp experiment within FEC?</s> - I believe the issue is resolved. I had added too many “synchronized” blocks, causing deadlocks.
  • <s>Accuracy of DET curves: why don't they look _exactly_ like the gnuplot-produced ones?</s> - DONE in r360. The points are all plotted correctly now. The only remaining issue is the tick marks, which, while accurate, are not placed at the traditional 0.5, 1, 5, 10, 20, etc.
  • Refactoring: dumb viewer that delegates to a smart controller
  • <s>ArrayList supersedes Vector</s> - This has essentially been accomplished.
  • Make documentation more thorough
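On the DET tick-mark issue: DET plots conventionally put both axes on a normal-deviate (probit) scale, so a tick labeled p% is drawn at the standard normal quantile of p/100. The sketch below computes those positions. The tick list is an assumption modeled on common DET plots (adjust it to match the gnuplot output), and the inverse normal CDF is a simple bisection over a standard erf approximation rather than any FEC code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: place DET-curve ticks at traditional percentages on a probit axis.
// Assumes the plot maps a miss or false-alarm probability p to probit(p).
final class DetTicks {
    // Assumed tick positions, in percent.
    static final double[] TRADITIONAL_PERCENTS = {0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 40};

    // Standard normal CDF using the Abramowitz & Stegun 7.1.26 erf approximation
    // (absolute error below about 1.5e-7).
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    // Inverse CDF (probit) by bisection; precise enough for axis placement.
    static double probit(double p) {
        if (p <= 0.0 || p >= 1.0) throw new IllegalArgumentException("p must be in (0, 1)");
        double lo = -10.0, hi = 10.0;
        for (int i = 0; i < 100; i++) {
            double mid = 0.5 * (lo + hi);
            if (normalCdf(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    // Pairs of {percent label, axis coordinate}.
    static List<double[]> ticks() {
        List<double[]> out = new ArrayList<>();
        for (double pct : TRADITIONAL_PERCENTS) {
            out.add(new double[] {pct, probit(pct / 100.0)});
        }
        return out;
    }
}
```

For example, probit(0.01) is about -2.33, so the 1% tick sits roughly 2.33 standard deviations to the left of the 50% point.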

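The refactoring item (a dumb viewer that delegates to a smart controller) might look roughly like the following sketch. All names here (ResultsView, ResultsController, RecordingView) are hypothetical, and the language-pair screen is only an illustrative use case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "dumb" view: it only displays what it is told; no experiment logic.
interface ResultsView {
    void showPairs(List<String> pairs);
    void showError(String message);
}

// Hypothetical "smart" controller: owns the logic and decides what the view shows.
final class ResultsController {
    private final ResultsView view;
    private final List<String> languages;

    ResultsController(ResultsView view, List<String> languages) {
        this.view = view;
        this.languages = languages;
    }

    // Called by the view when the user asks for the (truth, hypothesis) pair list.
    void onShowPairsRequested() {
        if (languages.size() < 2) {
            view.showError("Need at least two languages to form a pair");
            return;
        }
        List<String> pairs = new ArrayList<>();
        for (String truth : languages)
            for (String hyp : languages)
                if (!truth.equals(hyp)) pairs.add(truth + " v " + hyp);
        view.showPairs(pairs);
    }
}

// A recording view lets the controller be tested without any Swing code.
final class RecordingView implements ResultsView {
    List<String> lastPairs = new ArrayList<>();
    String lastError;

    public void showPairs(List<String> pairs) { lastPairs = pairs; }
    public void showError(String message) { lastError = message; }
}
```

Because the view is passive, the controller's behavior can be checked with a fake view like RecordingView instead of driving the real GUI.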
Notable SVN Commits

Initial development of the Feature Engineering Console occurred within a branch in the Spoken Language ID SVN repository. Eventually the feature-eng-console branch was variously merged back into NIST/HEAD or moved into the newly created FEC repository, which was cloned from the NIST repository at r371. The history of the feature-eng-console branch is as follows:

Robbie's Ideas

Josh,

I thought of some new features, some of them fairly necessary (who knows how we overlooked them).

1) We NEED to be able to see a list of files for misses and false alarms (and perhaps hits). I propose that when you click on a language pair, the next screen brings up a list of all files involving the pair as either the truth or the hypothesis, separated into misses, false alarms, and hits. Next to each file there should probably be some information, such as its duration and possibly the scores assigned by the two languages along with their thresholds. There should probably also be an audio button so that you can hear the file. If you click on the file itself, you are brought to another screen that shows the scores for each of the languages for that file (NOT just the pair). You could also calculate the entropy of this distribution and/or other metrics to help diagnose how confused the model was. You will also need to include the threshold used by each language (and possibly the “single” threshold as well).
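One way to compute the per-file entropy mentioned in 1) is sketched below. It assumes each language's score for a file is a log-likelihood-like value that can be turned into a distribution with a softmax. That conversion is an assumption, since the scores' actual scale is not specified here.

```java
import java.util.Map;

// Sketch: how confused was the model on one file? Assumes per-language scores
// are log-likelihood-like and converts them to probabilities with a softmax.
final class ScoreEntropy {
    // Softmax with max-subtraction for numerical stability.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double[] p = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            p[i] = Math.exp(scores[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }

    // Shannon entropy in bits: 0 means fully confident, log2(n) means uniform.
    static double entropyBits(double[] p) {
        double h = 0.0;
        for (double pi : p) if (pi > 0.0) h -= pi * Math.log(pi) / Math.log(2.0);
        return h;
    }

    // Entropy divided by log2(n), so files scored against different numbers
    // of languages compare on a [0, 1] scale.
    static double normalizedEntropy(Map<String, Double> scoresByLanguage) {
        double[] scores = scoresByLanguage.values().stream()
                .mapToDouble(Double::doubleValue).toArray();
        if (scores.length < 2) return 0.0;
        return entropyBits(softmax(scores)) / (Math.log(scores.length) / Math.log(2.0));
    }
}
```

With four equal scores the entropy is 2 bits (normalized 1.0), while one dominant score drives it toward 0.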

2) History. The “1st” plot (the baseline vs. the current test) should show all previous tests, although the user should have the option of deleting, or not adding, certain “garbage” runs. This probably holds true for all DET curves. Similarly, we should have a mechanism for tracking the history of tabular data. This will be a lot harder to visualize, but at the very least you could “scroll” through the history of tables. You could also plot the data in one cell over time, which is particularly useful for the “overall” cost, EER, and min cost. It should be possible to add or remove “lines” from the histories at any time.

Based on my previous message, it should be possible to keep this data separately for the n-language tests and for any *chosen* 1 v 1 test. Suppose, for instance, that I decide based on the n-language test to work on Mandarin v. English. I run the 1 v 1 test using the current features as the baseline (it may also be useful to include the original baseline). Each time I add new features, the DET curve is added to this plot (and the tabular data is saved similarly). By the time I'm done, I'll have several curves on my 1 v 1 history plot. Then I'll take the feature set I ended on (or rather, choose one based on the results) and re-run the n-language test. At that point I should have 3 lines for the n-language data: the baseline, the 1st test, and the 2nd test based on the 1 v 1 subtests. Suppose I continue the process with several other language pairs. Each pair should have its own plot. If I ever come back to a language pair, the new data is appended to its accruing history.
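The history described in 2) could be kept as one append-only list of runs per test, keyed by either the n-language test or a 1 v 1 pair. The sketch below assumes each run is identified by its feature-file name and reports scalar metrics by name. The key strings, class, and record are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of run history keyed by test, e.g. "n-language" or "Mandarin v English"
// (hypothetical keys). Each run is named by its feature file.
final class RunHistory {
    record Run(String featureFile, Map<String, Double> metrics) {}

    private final Map<String, List<Run>> byTest = new LinkedHashMap<>();

    // Coming back to a pair later simply appends to its existing history.
    void add(String testKey, Run run) {
        byTest.computeIfAbsent(testKey, k -> new ArrayList<>()).add(run);
    }

    // Lets the user drop "garbage" runs at any time.
    boolean remove(String testKey, String featureFile) {
        List<Run> runs = byTest.get(testKey);
        return runs != null && runs.removeIf(r -> r.featureFile().equals(featureFile));
    }

    // One table cell over time (e.g. overall EER), in the order runs were added.
    List<Double> series(String testKey, String metric) {
        List<Double> out = new ArrayList<>();
        for (Run r : byTest.getOrDefault(testKey, List.of())) {
            Double v = r.metrics().get(metric);
            if (v != null) out.add(v);
        }
        return out;
    }
}
```

Keeping the raw runs (rather than pre-rendered plots) means both the DET-curve history and the per-cell plots can be regenerated after runs are removed.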

3) One option when running tests would be to make a single language choice per file. This should not change any of the GUI internals, but sooner or later we can add this functionality to the pipeline. One thing it would affect in the GUI is that another metric becomes possible: classification accuracy, along with a classification confusion matrix. This is not possible with our current type of model.
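With a single decision per file, accuracy and a confusion matrix follow directly. The sketch below assumes languages are indexed 0..n-1 and that the single decision is the highest-scoring language. Both are illustrative assumptions, not how the pipeline currently decides.

```java
// Sketch: confusion matrix and accuracy for single-decision classification.
// Languages are assumed to be indexed 0..n-1.
final class Confusion {
    final int[][] counts; // counts[truth][decision]

    Confusion(int numLanguages) { counts = new int[numLanguages][numLanguages]; }

    void add(int truth, int decision) { counts[truth][decision]++; }

    // Fraction of files on the diagonal (decision == truth).
    double accuracy() {
        long correct = 0, total = 0;
        for (int t = 0; t < counts.length; t++)
            for (int d = 0; d < counts.length; d++) {
                total += counts[t][d];
                if (t == d) correct += counts[t][d];
            }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Assumed decision rule: pick the highest-scoring language for a file.
    static int argmax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) if (scores[i] > scores[best]) best = i;
        return best;
    }
}
```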

4) Another option would be to let the user choose the type of model; this may actually be the same thing as 3. For instance, we might use 1 v 1 MaxEnt models or 1 v rest SVMs (initial choices would be MaxEnt vs. SVMs, and 1 v 1, 1 v rest, or multiclass). As far as I can tell, this shouldn't affect the GUI beyond what is explained in 3.
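The choices in 4) form two independent axes (learner and decomposition), which a wizard could expose as two menus. The sketch below, with hypothetical names, also records how many classifiers each decomposition trains for n languages: n(n-1)/2 for 1 v 1, n for 1 v rest, and 1 for multiclass.

```java
// Sketch of the model choices in item 4; all names are hypothetical.
enum Learner { MAXENT, SVM }

enum Decomposition {
    ONE_VS_ONE, ONE_VS_REST, MULTICLASS;

    // Number of classifiers trained for the given number of languages.
    int classifiersFor(int numLanguages) {
        return switch (this) {
            case ONE_VS_ONE -> numLanguages * (numLanguages - 1) / 2; // one per unordered pair
            case ONE_VS_REST -> numLanguages;                          // one per language
            case MULTICLASS -> 1;                                      // a single joint model
        };
    }
}

// One selection from the "Start new test" wizard.
record ModelChoice(Learner learner, Decomposition decomposition) {}
```

For 12 languages, that is 66 classifiers for 1 v 1, 12 for 1 v rest, and 1 for multiclass, which matters for run time.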

5) The ability to choose a predefined dataset, to use a randomized (probably stratified) subset of an existing dataset (to minimize the time required to run), and to create a new dataset (given the desired size and the percentage of training vs. evaluation data). This should probably happen in the “Start new test” wizard, as should #3 and #4.
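A stratified subset, as suggested in 5), keeps the same fraction of files from each language so the subset preserves the language mix. The sketch below assumes the dataset is available as a map from file ID to language and takes a seed so the subset is reproducible. Both the input shape and the seed parameter are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Sketch of stratified random subsampling by language. The fileId -> language
// map and the seed parameter are assumptions for illustration.
final class StratifiedSubset {
    static List<String> sample(Map<String, String> languageByFile, double fraction, long seed) {
        // Group files by language in a stable (sorted) order so a given seed
        // always reproduces the same subset.
        Map<String, List<String>> byLanguage = new TreeMap<>();
        for (Map.Entry<String, String> e : new TreeMap<>(languageByFile).entrySet())
            byLanguage.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());

        Random rng = new Random(seed);
        List<String> subset = new ArrayList<>();
        for (List<String> files : byLanguage.values()) {
            Collections.shuffle(files, rng);
            // Round to the nearest count, but keep at least one file per language.
            int k = Math.max(1, (int) Math.round(fraction * files.size()));
            subset.addAll(files.subList(0, Math.min(k, files.size())));
        }
        return subset;
    }
}
```

The same per-language grouping could be reused to split a new dataset into training and evaluation portions at the requested percentage.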

6) We probably want to make the system as “pluggable” as possible, at least where metrics are concerned, so that we can add new ones with very little difficulty.
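A pluggable metric system, as proposed in 6), could be as small as an interface plus a registry. Adding a metric then means writing one class and registering it. The Trial record, Metric interface, and registry below are hypothetical, not existing FEC classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trial result: whether the file was a target, and its score.
record Trial(boolean target, double score) {}

// Hypothetical plug-in contract: a named metric computed from trials.
interface Metric {
    String name();
    double compute(List<Trial> trials, double threshold);
}

final class MetricRegistry {
    private final Map<String, Metric> metrics = new LinkedHashMap<>();

    void register(Metric m) { metrics.put(m.name(), m); }

    // Runs every registered metric; callers never need to know which exist.
    Map<String, Double> computeAll(List<Trial> trials, double threshold) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Metric m : metrics.values()) out.put(m.name(), m.compute(trials, threshold));
        return out;
    }
}

// Example plug-in: fraction of target trials scored below the threshold.
final class MissRate implements Metric {
    public String name() { return "missRate"; }
    public double compute(List<Trial> trials, double threshold) {
        long targets = trials.stream().filter(Trial::target).count();
        long misses = trials.stream().filter(t -> t.target() && t.score() < threshold).count();
        return targets == 0 ? 0.0 : (double) misses / targets;
    }
}

// Example plug-in: fraction of non-target trials scored at or above the threshold.
final class FalseAlarmRate implements Metric {
    public String name() { return "falseAlarmRate"; }
    public double compute(List<Trial> trials, double threshold) {
        long nonTargets = trials.stream().filter(t -> !t.target()).count();
        long alarms = trials.stream().filter(t -> !t.target() && t.score() >= threshold).count();
        return nonTargets == 0 ? 0.0 : (double) alarms / nonTargets;
    }
}
```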

7) I think we should add provisions to the XML files to either (1) allow Java to “insert” features (that aren't saved permanently, just held in memory or in temp files) or, preferably, (2) have an “include” system that allows one feature file to be a complete superset of one or more other files by including them. This would save a whole bunch of copying and pasting.
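The “include” system in option (2) amounts to a recursive merge with duplicate removal and cycle detection. The sketch below models each feature file as a list of included file names plus its own features. The FeatureFile record is hypothetical, and parsing the actual .def.xml format is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of a feature file: includes plus its own features.
record FeatureFile(List<String> includes, List<String> features) {}

final class IncludeResolver {
    private final Map<String, FeatureFile> files;

    IncludeResolver(Map<String, FeatureFile> files) { this.files = files; }

    // Included files' features come first, then the file's own; duplicates are
    // dropped (first occurrence wins). Include cycles are rejected.
    List<String> resolve(String name) {
        LinkedHashSet<String> out = new LinkedHashSet<>();
        visit(name, new HashSet<>(), out);
        return new ArrayList<>(out);
    }

    private void visit(String name, Set<String> inProgress, LinkedHashSet<String> out) {
        if (!inProgress.add(name)) throw new IllegalStateException("Include cycle at " + name);
        FeatureFile f = files.get(name);
        if (f == null) throw new IllegalArgumentException("Unknown feature file: " + name);
        for (String inc : f.includes()) visit(inc, inProgress, out);
        out.addAll(f.features());
        inProgress.remove(name); // allows diamond-shaped includes
    }
}
```

Tracking only the files currently being expanded (rather than every file ever seen) lets two files share a common include without being mistaken for a cycle.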

8) With the addition of history, we will probably also need controls that let us view it efficiently. For instance, I will probably want to see a graph of time (iteration number, which is just the XML filename) vs. cost. A table might show the relative increase/decrease in cost, EER, etc. between two histories, and so on.

9) Scriptable generation of graphs. This *might* be a separate program based on the same Java objects. The idea is that if we want to publish something (or students want to write a report), whoever is writing can easily specify which XML files should be included and which graphs/metrics to “export”. This could easily be just a selection box in a GUI.
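The relative-change table in 8) reduces to (new - old) / old per metric. The sketch below assumes each run's metrics are available as a name-to-value map. Metrics missing from the newer run, or with a zero baseline, are skipped.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: relative change per metric from an older run to a newer one.
// For cost-like metrics (cost, EER), negative values are improvements.
final class RelativeChange {
    static Map<String, Double> between(Map<String, Double> older, Map<String, Double> newer) {
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, Double> e : older.entrySet()) {
            Double now = newer.get(e.getKey());
            double before = e.getValue();
            // Skip metrics absent from the newer run or with a zero baseline.
            if (now != null && before != 0.0) out.put(e.getKey(), (now - before) / before);
        }
        return out;
    }
}
```

For example, an EER that drops from 0.20 to 0.15 is a relative change of -0.25, i.e. a 25% reduction.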

10) Graphs should be savable in vector graphics formats (i.e. “exportable”).
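SVG is one vector format that needs no extra library: a curve can be written as a single polyline element. The sketch below assumes the points are already in plot coordinates (note that SVG's y-axis points downward, so the caller flips y) and omits axes and labels. The resulting string could be saved with java.nio.file.Files.writeString.

```java
// Sketch of exporting one curve as SVG using plain string building.
// Points are assumed to be in plot coordinates already; axes are omitted.
final class SvgExport {
    static String polyline(double[] xs, double[] ys, int width, int height, String color) {
        if (xs.length != ys.length) throw new IllegalArgumentException("xs and ys differ in length");
        StringBuilder pts = new StringBuilder();
        for (int i = 0; i < xs.length; i++) {
            if (i > 0) pts.append(' ');
            pts.append(xs[i]).append(',').append(ys[i]);
        }
        return "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"" + width
                + "\" height=\"" + height + "\">"
                + "<polyline fill=\"none\" stroke=\"" + color + "\" points=\"" + pts + "\"/>"
                + "</svg>";
    }
}
```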

11) Customizable graphing behavior (color vs. dashed or dotted lines, changes in scale, etc.).
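For the color vs. dashed/dotted choice in 11), one option is a small style enum that renders to SVG stroke attributes, so black-and-white output can still distinguish curves. The style names and dash patterns below are hypothetical defaults.

```java
// Sketch of user-selectable line styles rendered as SVG stroke attributes.
// Style names and dash patterns are hypothetical defaults.
enum LineStyle {
    SOLID(""), DASHED("6,4"), DOTTED("1,3");

    private final String dashArray;

    LineStyle(String dashArray) { this.dashArray = dashArray; }

    // An empty dash array means a solid line, so no stroke-dasharray is emitted.
    String svgAttributes(String color) {
        String attrs = "stroke=\"" + color + "\"";
        return dashArray.isEmpty() ? attrs : attrs + " stroke-dasharray=\"" + dashArray + "\"";
    }
}
```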

What do you think about these ideas?

Robbie

nlp-private/fec-development.txt · Last modified: 2015/04/22 15:04 by ryancha
CC Attribution-Share Alike 4.0 International