The Spoken Language ID project seeks to….

Introductory Tasks

Running An Experiment in Ten Shell Commands Or Less

  1. Ensure that your system meets the current system requirements. On an Ubuntu Linux system, this can be done using a command such as this:
    sudo apt-get install sun-java6-jdk ant perl ruby gnuplot gcc cmake make pdl subversion
  2. Check out the current Language ID repository:
    svn co http://nlp.cs.byu.edu/subversion/NIST/HEAD
  3. Enter the HEAD directory created by the previous command:
    cd HEAD
  4. Make sure that the template from which the main configuration file is generated matches your setup by editing Language-ID/config/language-id.conf.cmake. Do this using your favorite text editor, for example:
    vim Language-ID/config/language-id.conf.cmake
    Pay particular attention to PRAAT_EXE, PERL_EXE, WAV_DATA_ORIGINAL_LOCATION, and SEG_DATA_ORIGINAL_LOCATION (or their *_WIN counterparts if running on Windows), and check that they contain the correct paths. (LABTOOLS seems to be unused at the moment.)

  5. Generate the build system with default settings using cmake:
    cmake .
    This also generates the configuration file Language-ID/config/language-id.conf based on the settings you provided in Language-ID/config/language-id.conf.cmake.

  6. Build DETware:
    make
  7. Build Language-ID:
    cd Language-ID
    ant
  8. Run the experiment:
    cd ..
    make detcurve

    The detcurve target creates the experiments directory, in which all experiment data is stored. It then copies seg files and wav files into experiments/data and begins extracting features from these data files by running the seg2xml3.pl script.
    Once these preliminary data have been copied and analyzed, a directory specific to the current experiment is created. Because the default experiment is fourgramall, a directory called experiments/fourgramall is created to house results specific to that experiment. The ling files are generated and placed in experiments/fourgramall/data. Language models are trained and placed in experiments/fourgramall/models. Other results, including metrics and plot data, are placed in experiments/fourgramall/results.
  9. Explore the DET curves, avgcost.txt, avgeer.txt, etc. in the results directory.
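
On systems where the apt-get command in step 1 does not apply, one way to check the prerequisites is to look for each tool on the PATH. The sketch below is only an illustration: the function name missing_tools and the mapping from package names to command names (for example, subversion to svn, sun-java6-jdk to java) are assumptions and are not part of the repository.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
# Returns 0 when nothing is missing, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Command names ASSUMED to correspond to the packages listed in step 1.
# pdl is a Perl module rather than a command, so it is not checked here.
REQUIRED_TOOLS="java ant perl ruby gnuplot gcc cmake make svn"

# Usage (word splitting of the unquoted variable is intentional):
#   missing_tools $REQUIRED_TOOLS
```

Any tool reported as missing needs to be installed before continuing with step 2.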

Running a Specific Experiment

The above example runs only the default 'fourgramall' experiment. Running a specific experiment requires almost exactly the same process. For example, to run the 'fivegram' experiment, substitute the following cmake command:

    cmake -D FEATURE_SET_NAME_FORCE=fivegram .

Then proceed to build the detcurve target as before:

    make detcurve

This time, results will be stored in experiments/fivegram rather than experiments/fourgramall as previously.
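
After a run finishes, a quick check that the metric files were produced can save a confusing look at an empty plot. The following is a minimal sketch, assuming that avgcost.txt and avgeer.txt are written directly into experiments/<name>/results; the function name check_results is hypothetical, and the location may differ when a custom result name is configured.

```shell
#!/bin/sh
# Report which of the expected metric files exist for an experiment.
#   $1 = experiment name (e.g. fourgramall or fivegram)
#   $2 = experiments root directory (defaults to ./experiments)
# Returns 0 if every file exists and is non-empty, 1 otherwise.
check_results() {
    name=$1
    root=${2:-experiments}
    dir="$root/$name/results"
    rc=0
    # ASSUMPTION: these two files sit directly in the results directory.
    for f in avgcost.txt avgeer.txt; do
        if [ -s "$dir/$f" ]; then
            echo "ok: $dir/$f"
        else
            echo "missing or empty: $dir/$f"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, running check_results fivegram from the HEAD directory prints one line per file.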

Other parameters can be set, such as the normalization option for resultbuilder.pl and the result name that determines where plot output is stored:

    cmake -D FEATURE_SET_NAME_FORCE=fivegram -D RBLDR_NORM_FORCE=1 -D RESULT_NAME_FORCE=nist_norm1 .

This prepares the system for running fivegram with a normalization of 1. It also sets the result name to nist_norm1 to differentiate this run from our previous one. We can now run the experiment:

    make detcurve

The parameters we set in our most recent run of cmake are the exact parameters used by the regression test (see Regression Tests below).
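
When switching between several configurations, it can help to assemble the cmake line from its parts and review it before running it. The sketch below only prints the command; the function name cmake_command is hypothetical, and only the three -D options shown above are used.

```shell
#!/bin/sh
# Print the cmake command for a given experiment configuration.
#   $1 = feature set name (required), e.g. fivegram
#   $2 = normalization option for resultbuilder.pl (optional)
#   $3 = result name (optional)
cmake_command() {
    if [ -z "${1:-}" ]; then
        echo "usage: cmake_command FEATURE_SET [NORM] [RESULT_NAME]" >&2
        return 2
    fi
    cmd="cmake -D FEATURE_SET_NAME_FORCE=$1"
    if [ -n "${2:-}" ]; then
        cmd="$cmd -D RBLDR_NORM_FORCE=$2"
    fi
    if [ -n "${3:-}" ]; then
        cmd="$cmd -D RESULT_NAME_FORCE=$3"
    fi
    # The trailing "." tells cmake to configure the current directory.
    echo "$cmd ."
}
```

The printed line can be pasted into the shell from the HEAD directory and followed by make detcurve.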

Running All Experiments

If you wish to build all defined experiments, simply run:

    Language-ID/scripts/runall.rb

Regression Tests

Regression tests have been created to help verify the integrity of any changes we make to the system. The regression tests can be run by invoking

    Language-ID/scripts/run_all_tests.rb

from within the HEAD directory. This attempts to compare all currently built experiments against the baseline data stored in Language-ID/regression to confirm that the output has not changed significantly.
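
Because the script path is relative, running it from the wrong directory is an easy mistake. A small wrapper such as the following can guard against that; it is a sketch only, the function name run_regression is hypothetical, and it assumes the script is executable as checked out.

```shell
#!/bin/sh
# Run the regression tests, refusing to start outside the HEAD directory.
run_regression() {
    script="Language-ID/scripts/run_all_tests.rb"
    if [ ! -f "$script" ]; then
        echo "error: $script not found; run this from the HEAD directory" >&2
        return 1
    fi
    # ASSUMPTION: the script has its executable bit set and a valid shebang.
    "./$script"
}
```

The wrapper returns the exit status of the script itself, so it can be chained with other commands.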

Papers

The following papers provide vital background information on the problem that the Spoken Language ID project is tackling:

Spoken Language ID

nlp-private/introduction-to-language-id.txt · Last modified: 2015/04/22 21:09 by ryancha
CC Attribution-Share Alike 4.0 International