nlp-private:feature-definition-xml-file-roadmap [2015/04/22 21:19] (current)
ryancha created
 +Resulting from a discussion between Josh and Dr. Ringger on 18 December 2007.
 +
 +Objective: to modularize the definition file mechanism (reducing redundancy/​improving reuse) and to increase readability/​usability of the definition format.
 +
 +Steps
 +# <​s>​Create a regression test from an experiment using a complex .def.xml file</​s><​br/>​Create a regression test for every experiment -- DONE in [http://​nlp.cs.byu.edu/​trac/​NIST/​changeset/​394 r394]
 +# In [http://​nlp.cs.byu.edu/​trac/​NIST/​browser/​HEAD/​Language-ID/​src/​edu/​byu/​langid/​features/​FeatureDefinitionFileParser.java FeatureDefinitionFileParser] migrate from the custom XML parser built by Nathan ([http://​nlp.cs.byu.edu/​trac/​NIST/​browser/​HEAD/​Language-ID/​src/​edu/​byu/​langid/​domparser/​SimpleDOMParser.java SimpleDOMParser]) to the standard (Xerces) parser provided by Java.
 +#* '''Motivation:''' Allow inclusions at the XML level using [http://www.ibm.com/developerworks/xml/library/x-tipgentity.html#N100CE external entities].
 +#* '''Implication:''' Literal <, >, and & characters in the definition files have to be replaced with the standard XML entity references:
 +#​*:<​code><</​code>​ becomes <​code>&​amp;​lt;</​code>​
 +#​*:<​code>></​code>​ becomes <​code>&​amp;​gt;</​code>​
 +#*:<code>&</code> becomes <code>&amp;amp;</code>
 +# Check for regressions.
 +# Flatten the XML files using the new inclusions mechanism -- DONE in [http://​nlp.cs.byu.edu/​trac/​NIST/​changeset/​399 r399]
 +#​*Quantizations are now defined in xml files in Language-ID/​config/​quantizations/​
 +#*Feature templates are now defined in xml files in Language-ID/​config/​features/​
 +#*The .def.xml files in Language-ID/​config/​feature_sets/​ now use XML external entity inclusions to refer to the quantization and feature template files.
 +#* The .def.xml syntax now requires the type of each feature to be specified. <code><feature></code> tags are no longer valid and must be replaced by <code><file_feature></code>, <code><slice_feature></code>, or <code><count_feature></code> tags.
 +#** '''NEW''' Based on this new syntax, we could remove the requirement for the <code><file_features></code>, <code><slice_features></code>, <code><count_features></code>, and <code><quantizations></code> groups.
 +# Check for regressions.
 +# '''NEW''' Allow one .def.xml file to "extend" another, in the sense of object-oriented inheritance. When two defs share a lot of common content, the more complex one can then extend the simpler one, and so on. Implement this as an option in code, and then restructure the defs accordingly.
 +# Check for regressions.
 +# StatNLP Integration:​
 +## Move the feature definition mechanism into StatNLP.
 +## Integrate this mechanism into the PNP experiment.
 +## Apply StatNLP'​s ExperimentHarness system to SpokenLID experiments.
 +# Check for regressions.
 +# Develop a [http://​en.wikipedia.org/​wiki/​Domain-specific_programming_language Domain Specific Language] to describe the feature definitions.
 +## Define relevant data structures in Java (FeatureSet?,​ FeatureDefinition?​) or ensure that they already exist (edu.byu.langid.features.Quantization and edu.byu.langid.features.Quantization.Quantile)
 +## Use a scripting language supported by Java's script engines framework to implement the DSL in terms of the data structures.
 +## Switch from static .def.xml files to DSL-based definitions. For now the DSL scripts will generate corresponding XML files until we retool the batch feature extractor (FeatureFileBatchConverter) to use DSL-based definitions directly.
 +#* [http://​builder.rubyforge.org/​ Ruby easy XML output library]
 +#* [http://​onestepback.org/​articles/​lingo/​index.html A presentation on Ruby for DSL implementation]
 +# Check for regressions.
 +# GUI Integration in [[Feature Engineering Console]]
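
The external entity inclusion described in steps 2 and 4 can be sketched with the XML parser that ships with the JDK. This is a minimal illustration, not the project's actual parser code: the class name, the <code>feature_set</code> root element, the file names, and the <code>name</code> and <code>condition</code> attributes are all assumptions made up for the example. It writes a small feature template fragment and a .def.xml file that pulls it in through an external entity, then parses the result. It also shows the escaped <code>&amp;lt;</code>, <code>&amp;gt;</code>, and <code>&amp;amp;</code> forms being turned back into the literal characters by the parser.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class EntityInclusionSketch {

    /**
     * Writes a fragment file and a .def.xml file that includes it through an
     * external entity, then parses the .def.xml file with the JDK parser.
     * All file, element, and attribute names here are hypothetical.
     */
    public static Document parseWithInclusion() throws Exception {
        // Temporary directory standing in for a config directory (not cleaned up here).
        Path dir = Files.createTempDirectory("defxml");

        // Hypothetical feature template fragment. The <, >, and & characters
        // inside the attribute value are written as XML entity references.
        String fragment =
            "<file_feature name=\"example\" condition=\"a &lt; b &amp;&amp; c &gt; d\"/>\n";
        Files.write(dir.resolve("example_feature.xml"),
                    fragment.getBytes(StandardCharsets.UTF_8));

        // Hypothetical definition file: declares an external entity that points
        // at the fragment (relative to this file) and references it in the body.
        String def =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE feature_set [\n"
          + "  <!ENTITY example_feature SYSTEM \"example_feature.xml\">\n"
          + "]>\n"
          + "<feature_set>\n"
          + "  &example_feature;\n"
          + "</feature_set>\n";
        Path defFile = dir.resolve("example.def.xml");
        Files.write(defFile, def.getBytes(StandardCharsets.UTF_8));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Expanding entity references is the default; set explicitly for clarity.
        factory.setExpandEntityReferences(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parsing from a File gives the parser a base URI, so the relative
        // system identifier of the entity resolves next to the .def.xml file.
        return builder.parse(defFile.toFile());
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWithInclusion();
        NodeList features = doc.getElementsByTagName("file_feature");
        Element feature = (Element) features.item(0);
        System.out.println("included file_feature elements: " + features.getLength());
        System.out.println("condition after parsing: " + feature.getAttribute("condition"));
    }
}
```

The sketch relies on the default JDK parser settings, which permit file-based external entities. A deployment that enables secure processing or restricts external DTD access would need to allow these inclusions explicitly.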
 +[[Category:​Spoken Language ID]]
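
The DSL step proposes defining Java data structures and having DSL scripts generate the corresponding XML files for now. As a rough sketch of that idea, the following uses a plain-Java fluent builder in place of a script-engine DSL. The classes <code>FeatureSet</code> and <code>FeatureDefinition</code> shown here are hypothetical stand-ins (the roadmap only names them as candidates), and the <code>feature_set</code> root element and <code>name</code> attribute are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FeatureSetSketch {

    /** Feature kinds, named after the typed tags listed in the roadmap. */
    public enum Kind { FILE_FEATURE, SLICE_FEATURE, COUNT_FEATURE }

    /** Hypothetical stand-in for a feature definition: a kind plus a name. */
    public static final class FeatureDefinition {
        final Kind kind;
        final String name;

        FeatureDefinition(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        /** Renders the definition as a typed tag, e.g. a file_feature element. */
        String toXml() {
            String tag = kind.name().toLowerCase(Locale.ROOT);
            return "<" + tag + " name=\"" + escape(name) + "\"/>";
        }
    }

    /** Hypothetical stand-in for a feature set, built up with chained calls. */
    public static final class FeatureSet {
        private final List<FeatureDefinition> features = new ArrayList<>();

        public FeatureSet file(String name)  { return add(Kind.FILE_FEATURE, name); }
        public FeatureSet slice(String name) { return add(Kind.SLICE_FEATURE, name); }
        public FeatureSet count(String name) { return add(Kind.COUNT_FEATURE, name); }

        private FeatureSet add(Kind kind, String name) {
            features.add(new FeatureDefinition(kind, name));
            return this;
        }

        /** Generates the XML text that a static definition file would contain. */
        public String toXml() {
            StringBuilder sb = new StringBuilder("<feature_set>\n");
            for (FeatureDefinition f : features) {
                sb.append("  ").append(f.toXml()).append('\n');
            }
            return sb.append("</feature_set>\n").toString();
        }
    }

    /** Escapes the characters that are special inside an XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String xml = new FeatureSet()
            .file("example_file_feature")
            .slice("example_slice_feature")
            .count("example_count_feature")
            .toXml();
        System.out.print(xml);
    }
}
```

A script-engine DSL, as the roadmap suggests, could expose the same builder calls to scripts. The generated XML would then feed the existing batch feature extractor until it is retooled to consume the definitions directly.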
  
nlp-private/feature-definition-xml-file-roadmap.txt · Last modified: 2015/04/22 21:19 by ryancha