The Spoken Language ID project seeks to….

### Running An Experiment in Ten Shell Commands Or Less

1. Ensure that your system meets the current system requirements. On an Ubuntu Linux system, this can be done using a command such as this:<br/>
sudo apt-get install sun-java6-jdk ant perl ruby gnuplot gcc cmake make pdl subversion
2. Check out the current Language ID repository:<br/>
svn co http://nlp.cs.byu.edu/subversion/NIST/HEAD
3. Enter the HEAD directory created by the previous command:<br/>
cd HEAD
4. Make sure that the template from which the main configuration file is generated reflects your setup by editing `Language-ID/config/language-id.conf.cmake`. Do this using your favorite text editor, for example:<br/>
vim Language-ID/config/language-id.conf.cmake

<br/>Pay particular attention to `PRAAT_EXE`, `PERL_EXE`, `WAV_DATA_ORIGINAL_LOCATION`, and `SEG_DATA_ORIGINAL_LOCATION` (or their `*_WIN` counterparts if running on Windows) to be sure that these contain the correct paths. (`LABTOOLS` seems to be unused at the moment.)

5. Generate the build system with default settings using cmake:<br/>
cmake .

<br/>This also generates the configuration file `Language-ID/config/language-id.conf` based on the settings you provided in `Language-ID/config/language-id.conf.cmake`.

6. Build DETware:<br/>
make
7. Build Language-ID:<br/>
cd Language-ID<br/>ant
8. Run the experiment:<br/>
cd ..<br/>make detcurve

<br/>The `detcurve` target will create the `experiments` directory, in which all experiment data will be stored. It will then copy seg files and wav files into `experiments/data` and begin extracting features from these data files by running the seg2xml3.pl script.<br/>Once these preliminary data have been copied and analyzed, a directory specific to the current experiment will be created. Because the default experiment is `fourgramall`, a directory called `experiments/fourgramall` will be created to house results specific to that experiment. The ling files will be generated and placed in `experiments/fourgramall/data`. Language models will be trained and placed in `experiments/fourgramall/models`. Other results, including metrics and plot data, will be placed in `experiments/fourgramall/results`.
9. Explore the DET curves, avgcost.txt, avgeer.txt, etc. in the `results` directory.
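
The checks implied by steps 1 and 4 can be scripted before starting a long run. The following is a minimal sketch and is not part of the repository: the function names `missing_tools` and `missing_settings` are hypothetical, the executable names passed to `missing_tools` are assumptions about what the packages in step 1 install, and `missing_settings` only tests whether each variable name appears somewhere in the file, because the template's exact syntax is not described on this page.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Language ID repository.

# Print the name of each command that cannot be found on PATH.
# Usage: missing_tools NAME...
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Print each variable name that does not appear anywhere in FILE.
# This is a plain substring search: it does not check that the
# value assigned to the variable is a valid path.
# Usage: missing_settings FILE NAME...
missing_settings() {
    conf_file=$1
    shift
    for name in "$@"; do
        grep -q -F -e "$name" "$conf_file" 2>/dev/null || echo "$name"
    done
}
```

From the `HEAD` directory, one might call `missing_tools java ant perl ruby gnuplot gcc cmake make svn` and `missing_settings Language-ID/config/language-id.conf.cmake PRAAT_EXE PERL_EXE WAV_DATA_ORIGINAL_LOCATION SEG_DATA_ORIGINAL_LOCATION`. Empty output from both means nothing obvious is missing; the paths themselves still need to be checked by hand.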

### Running a Specific Experiment

The above example runs only the default 'fourgramall' experiment. Running a specific experiment requires almost exactly the same process. For example, to run the 'fivegram' experiment, substitute the following cmake command:

cmake -D FEATURE_SET_NAME_FORCE=fivegram .

Then proceed to build the `detcurve` target as before:

make detcurve

This time, results will be stored in `experiments/fivegram` rather than `experiments/fourgramall` as previously.
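
Because each experiment follows the same configure-then-build pattern, a chosen subset of experiments can be run in a loop. This is a sketch under assumptions, not a script from the repository: the function name `run_experiments` and the `DRY_RUN` variable are hypothetical, it is meant to be run from the `HEAD` directory, and `fourgramall` and `fivegram` are the only experiment names taken from this page.

```shell
#!/bin/sh
# Hypothetical helper; not part of the Language ID repository.
# For each feature set name given, reconfigure with cmake and then
# build the detcurve target. If DRY_RUN is set to 1, the commands
# are printed instead of executed. Stops at the first failure.
# Usage: run_experiments NAME...
run_experiments() {
    for feature_set in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "cmake -D FEATURE_SET_NAME_FORCE=$feature_set ."
            echo "make detcurve"
        else
            cmake -D "FEATURE_SET_NAME_FORCE=$feature_set" . || return 1
            make detcurve || return 1
        fi
    done
}
```

Setting `DRY_RUN=1` and then calling `run_experiments fourgramall fivegram` prints the four commands that would be run, which is a cheap way to confirm the sequence before committing to a full run.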

Other parameters can also be set, such as the normalization option for resultbuilder.pl and the result name that determines where plot output is stored:

cmake -D FEATURE_SET_NAME_FORCE=fivegram -D RBLDR_NORM_FORCE=1 -D RESULT_NAME_FORCE=nist_norm1 .

This prepares the system for running `fivegram` with a normalization of 1. It also sets the result name to `nist_norm1` to differentiate this run from our previous one. We can now run the experiment:

make detcurve

The parameters we set in our most recent run of cmake are exactly the parameters used by the regression test, which is described under Regression Tests below.

### Running All Experiments

If you wish to build all defined experiments, simply run `Language-ID/scripts/runall.rb`.

### Regression Tests

Regression tests have been created to help verify the integrity of any changes we make to the system. The regression tests can be run by invoking `Language-ID/scripts/run_all_tests.rb` from within the `HEAD` directory. This will attempt to compare all currently built experiments to the baseline data stored in `Language-ID/regression` to check that the output has not changed significantly.
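
Since the tests must be started from within `HEAD`, a small wrapper can refuse to run from the wrong place. This is only a sketch: the function name `run_regression_tests` is hypothetical, and launching the script through the `ruby` interpreter (rather than executing it directly) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper; not part of the Language ID repository.
# Run the regression tests only if the test script is reachable at
# its documented path relative to the current directory, which is
# the case when the current directory is HEAD.
run_regression_tests() {
    script=Language-ID/scripts/run_all_tests.rb
    if [ ! -f "$script" ]; then
        echo "error: $script not found; run this from the HEAD directory" >&2
        return 1
    fi
    # Assumption: the script is started via the ruby interpreter.
    ruby "$script"
}
```

Called from any other directory, the function prints an error and returns status 1 without starting any tests.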

## Papers

The following papers provide vital background information on the problem that the Spoken Language ID project is tackling: