Top-Level Goals

  • Scenario #1:
    • Starting point: True phonetic (or broad-class phone) transcripts and audio files
    • Focus: Feature engineering on transcripts and audio files
  • Scenario #2:
    • Starting point: Hypothesis phonetic transcripts (from another research group) and audio files
    • Focus: Feature engineering on transcripts and audio files

Other Directions for Research

  • Scenario #3:
    • Starting point: audio files
    • Focus: Reliable broad-class phone recognition for languages having broad-class phone LMs (trained by supervised learning)
  • Scenario #4:
    • Starting point: audio files
    • Focus: Reliable broad-class phone recognition for all languages, where broad-class phone LMs are trained by semi-supervised learning


Possible Paper

  • What is the latest, best-performing system that uses phonetic transcription as an intermediate representation?
  • Contact the builders of the baseline system to ask if this line of research would be interesting to them (see below)
  • Implement this technique as a strong baseline on true transcripts
  • Beat it using feature engineering, ensembles, maxent, etc.
  • Part 2: grab real data from somebody else

Action List


  • Get things working under Cygwin: this will wait until Robbie's new detware implementation gets committed.
    • Actually, the current blocker seems to be Cygwin's interaction with praat, which is not a Cygwin app
  • Reengineer - can praat be run by the make system, and the rest be run on the output of praat?
  • Add comments to CMakeLists.txt
  • Reproduce Pedro's and Bruce's results – Using Feature Engineering Console
    • Re-run the best of Pedro's and Bruce's experiments with an eye specifically on the impact on (for example) Mandarin performance.
    • Try comparing Bruce's results to a simpler baseline (try trigram, 5-gram) rather than to Pedro's results.
  • <s>Reconcile Pedro's SE_*.def.xml files with other files in the repository. Keep only the unique ones.</s> - presumably completely DONE in r378
    • <s>Purge the duplicates.</s> - DONE in r245 and r252
  • Feature engineering on both pitch and F0.
    • <s>with quantization (may need different quantiles for each approach)</s> - DONE in r382
    • <s>with linear regression</s> - DONE in r375 and MERGED TO HEAD in r382
    • with _quadratic_ regression - Quadratic regression for pitch checked in in r402
  • <s>Parallelize to take advantage of multiple processors/processor cores.</s> - DONE in r388, though a bug remains and it isn't currently enabled.
  • Revamp the feature definition system according to Feature Definition XML File Roadmap
    • Domain Specific Language for SpokenLID Feature Definitions
  • Robbie: provide Eric with final normalization proof paper for tech report.
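
A minimal sketch of the quadratic-regression and quantization items in the pitch/F0 feature-engineering entry above. This is not the code checked in at r382 or r402; the function names (fit_quadratic, quantile_edges, quantize) and all sample values are hypothetical. It fits f0 ≈ a·t² + b·t + c to one segment's pitch track by least squares, then maps a value to a quantile bin so it can serve as a discrete feature.

```python
from bisect import bisect_right


def fit_quadratic(times, f0):
    """Least-squares fit of f0 ~ a*t^2 + b*t + c; returns (a, b, c)."""
    n = len(times)
    if n < 3 or n != len(f0):
        raise ValueError("need at least three (time, f0) pairs")
    # Power sums that make up the 3x3 normal equations.
    s = [sum(t ** k for t in times) for k in range(5)]
    r = [sum(y * t ** k for t, y in zip(times, f0)) for k in range(3)]
    # Augmented matrix; unknowns are ordered (a, b, c).
    m = [
        [s[4], s[3], s[2], r[2]],
        [s[3], s[2], s[1], r[1]],
        [s[2], s[1], s[0], r[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("time stamps are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def quantile_edges(values, bins):
    """Bin boundaries that split the training values into equal-count bins."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // bins] for k in range(1, bins)]


def quantize(value, edges):
    """Index of the bin that value falls into (0 .. len(edges))."""
    return bisect_right(edges, value)


# Hypothetical pitch track: five frames of one segment (seconds, Hz).
a, b, c = fit_quadratic([0, 1, 2, 3, 4], [100, 99, 102, 109, 120])
```

Because the quantile edges are computed from whatever values are passed in, each approach (raw pitch, regression slope, quadratic curvature) can get its own set of quantiles, as the quantization item suggests may be necessary.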

Feature Engineering Console Module

Tasks related to the Language-ID implementation of the Edu.byu.nlp.experimentation API, located at Language-ID-Experimentation-Module

  • Can we enable multiple simultaneous jobs in cmake? What sort of locking will this require? (slx2 and ling files, etc.)
  • Show accurate durations for the wav files. This will be accomplished somewhere in or around the SLIDTrial class's getOtherInfo method. The durations are taken from the relevant .result file, so perhaps they're hardcoded there.
  • Wider legend colors
  • Tag or label on the chart to identify which language is which
  • <s>Outcome isn't showing up in trial list</s> - FIXED
  • Check fivegram for regression?
  • Check title of file feature weights?
  • Outline view of features and experiments?
  • 'done' or 'status' file to show where an experiment run terminated

Feature Engineering with True Transcripts

  • All:


  • Robbie:
    • Log results of the full set of n-gram LM & maxent experiments on new .slx2 file set on the wiki.
    • MaxEnt Optimization: Start with feature weights from prior iteration
    • Replace NIST Detware with Own DET-curve software
    • In new DET-curve software automatically calculate the following:
      • the aggregate DCF
      • aggregate EER
      • aggregate Operating point
      • per-language statistics (DCF, EER, Op Pt)
    • Reconcile: EER in .csv result file and plot-*/global/eer.txt
    • Understand the relationship with plot-*/avgeer.txt
    • Sweep out one threshold per binary classifier. Equivalent to normalization?
    • Optimize theta sweep.
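
A rough sketch of the statistics the replacement DET-curve software is meant to report, for a single binary detector. It is not NIST DETware and not the planned implementation; the function names and the sample scores are hypothetical, and the default cost parameters (C_miss = C_fa = 1, P_target = 0.5) are an assumption that should be checked against the evaluation plan in use. A trial is accepted when its score is at or above the threshold.

```python
def det_points(target_scores, nontarget_scores):
    """(threshold, p_miss, p_fa) for every candidate threshold theta."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))  # the reject-everything end of the curve
    points = []
    for theta in thresholds:
        p_miss = sum(s < theta for s in target_scores) / len(target_scores)
        p_fa = sum(s >= theta for s in nontarget_scores) / len(nontarget_scores)
        points.append((theta, p_miss, p_fa))
    return points


def equal_error_rate(points):
    """Approximate EER: average of p_miss and p_fa where they are closest."""
    _, p_miss, p_fa = min(points, key=lambda p: abs(p[1] - p[2]))
    return (p_miss + p_fa) / 2


def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """Detection cost function; the default parameters are assumptions."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)


def min_dcf_operating_point(points, **cost):
    """Threshold and cost of the point on the curve with the lowest DCF."""
    theta, p_miss, p_fa = min(
        points, key=lambda p: detection_cost(p[1], p[2], **cost)
    )
    return theta, detection_cost(p_miss, p_fa, **cost)


# Hypothetical scores for one language's detector.
points = det_points([0.9, 0.8, 0.7, 0.4], [0.6, 0.5, 0.3, 0.2])
eer = equal_error_rate(points)
theta, dcf = min_dcf_operating_point(points)
```

Running this once per language gives the per-language figures. One possible way to get aggregate figures is to pool the trials of all detectors before calling det_points, which implicitly shares a single theta across languages. Since the operating point here is chosen from the swept points themselves, it cannot fall off the curve, which may be a useful cross-check for the off-curve operating points noted elsewhere in this list.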


  • Eric:
    • investigate whether output from operating point selection on the training set is overwriting or being over-written during test time.
    • is the training sweep .csv file over-writing the test sweep .csv file?
  • Feature Selection:
    • Count-cut-offs with MaxEnt
    • Mutual information based feature selector
    • Berger's feature selection procedure with learning in the loop
  • In pitch and formant change features, Choose something other than min. > max (i.e., min. - max. = 0)
  • Investigate using continuous-valued features in MaxEnt. Would obviate the need for quantization in the feature set.
    • See Franz Och's (short, obfuscated) implementation of MaxEnt, available online.
  • Cite Chen & Maison - refer to text lang ID prior work.
  • Re-run n-gram experiments with Kneser-Ney instead of Katz-style Good-Turing
  • Re-examine n-gram vs. n-gram-all models for MaxEnt in wake of “null” bug removal
  • Implement multi-class classifier and track accuracy (in addition to the aggregate DCF, DET, etc. for the binary one-v-rest classifiers) – the decision is not blind to evidence for other languages.
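
A small sketch of the mutual-information-based feature selector listed under Feature Selection above. It assumes binary (present/absent) features and scores each one by its mutual information with the language label; the function names and the toy feature names are hypothetical, and nothing here reflects how the existing MaxEnt code represents features.

```python
import math
from collections import Counter


def mutual_information(feature_present, labels):
    """Mutual information (bits) between a binary feature and the labels."""
    n = len(labels)
    joint = Counter(zip(feature_present, labels))
    f_counts = Counter(feature_present)
    c_counts = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        # p(f,c) * log2( p(f,c) / (p(f) * p(c)) ), written with raw counts.
        mi += (count / n) * math.log2(count * n / (f_counts[f] * c_counts[c]))
    return mi


def select_features(instances, labels, k):
    """Return the k feature names with the highest mutual information.

    instances is a list of sets, each holding the names of the features
    that fired for one utterance.
    """
    vocabulary = set().union(*instances)
    scored = {
        name: mutual_information([name in inst for inst in instances], labels)
        for name in vocabulary
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scored, key=lambda name: (-scored[name], name))[:k]


# Toy data: "ab" and "cd" separate the languages, "x" carries no information.
instances = [{"ab", "x"}, {"ab"}, {"cd", "x"}, {"cd"}]
labels = ["en", "en", "zh", "zh"]
best = select_features(instances, labels, 2)
```

Unlike a count cut-off, this ranking looks at the labels, so a frequent feature that fires equally often for every language scores zero. Unlike Berger's procedure, it does not retrain the model in the loop.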


  • Statistical significance
  • Incorporate training / test split in Makefile (with option to hold fixed or to re-split)
  • Cross-validation: general
  • Contact Audrey Le to inquire about multi-dimensional (one theta per lang.) DETware.
  • Also ask Audrey about the normal deviate scale
  • Use answer key files, rather than reading the answer out of the filename
  • Debug: operating point is sometimes off the DET curve (e.g., Pedro's English v. Spanish curves)
  • Robbie: incorporate Richard Arthur's confusability matrix code into Spoken LID and 401R Codebase. Adapt for Maxent feature weights.

Speech Reco. with languages with broad-class phone LMs (trained by supervised learning)

  • Re: quality of the segment labels and the segment endpoints. 8/7/06: We noticed a trend in endpoint position discrepancy with the truth. Most egregiously, the final segment always had erroneous start-/end-time stamps. Debug this code, and take another look at a couple of utterances in order to see where we stand.


  • Phone reco. in Makefile
  • Optimization of SR parameters
  • NIST datasets: See Singer; OGI_TS, CallFriend
  • Verify that content of NIST 2005 dev. dataset <math>\supseteq</math> 2003 <math>\supseteq</math> 1996
  • Inventory of resources at our disposal that we can use to train phone LMs. This can be done either by (in order of preference): (a) using the OGI-generated phoneme annotations directly; (b) directly using phoneme-level (or “close”) transcriptions and converting them to phoneme classes; (c) using straight text, using text-to-speech tools (aka phonemicizers) to convert the text to phonemic form, and then converting the phonemes to phoneme classes.

SR with all languages, where LMs are trained by semi-supervised learning

  • Semi-supervised LM training


  • 4: Idea from discussion with Hal: hierarchical linguistic similarity – leverage similarity to pool data and try to get better lang id. rates. Combine with error-correcting code-style multi-class classification.
  • 5: Add Farsi phoneme-aligned data

Low Priority

  • Split makefile: MaxEnt v. n-gram
  • Subversion repository organization
  • Globally rename SEGLOLA to derivative of Broad-class phones (BRDCLSPHONES???)
    • Move scripts down a level
    • Can Eclipse still checkout projects how we want??
    • 3 separate repositories (one per Eclipse project)?
  • Try other pitch tracker from Mark Lieberman
  • Try ESPS (currently stored as tar ball in entropy:/home/tools )
  • Consider Hal's maxent toolkit. Use for multi-class classification.


  • The project can't be run from Cygwin in a home directory (it has to be run from a mapped drive on entropy). Here's the error:
Processing /home/pep6/workspace/experiments/data/seg/en013num.seg ..
CMD1: c:/Program\ Files/Praat/praatcon.exe Language-ID/scripts/getpitch3.praat /home/pep6/workspace/
Error: Cannot open file "C:\cygwin\home\pep6\workspace\Language-ID\scripts\/home/pep6/workspace/expe
No object was put into the list.
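
In the error above, praat appears to treat the Cygwin-style argument /home/pep6/... as relative to the script's directory, which is what one would expect from a program that is not a Cygwin app. One possible workaround is to translate paths to native Windows form before passing them to praatcon.exe. Cygwin ships the cygpath utility (cygpath -w) for exactly this; the sketch below only mimics its simplest cases in Python. The function name is hypothetical, and the default root C:\cygwin is taken from the error message and may differ on other installations.

```python
import posixpath


def to_windows_path(posix_path, cygwin_root="C:\\cygwin"):
    """Translate an absolute Cygwin path into a native Windows path.

    /cygdrive/<letter>/... becomes a drive-letter path; everything else is
    assumed to live under cygwin_root (an assumption: custom mount points
    are not handled, use cygpath for those).
    """
    if not posix_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    parts = [p for p in posixpath.normpath(posix_path).split("/") if p]
    if len(parts) >= 2 and parts[0] == "cygdrive" and len(parts[1]) == 1:
        return parts[1].upper() + ":\\" + "\\".join(parts[2:])
    return "\\".join([cygwin_root.rstrip("\\")] + parts)


# The .seg file from the log above, as praat would need to see it.
seg = to_windows_path("/home/pep6/workspace/experiments/data/seg/en013num.seg")
```

If this turns out to be the cause, the conversion would belong wherever the make system builds the praat command line, so that both the script path and its arguments reach praat in the same form.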

Paper Ideas

  • Berger's feature selection loop: comparing feature selection with feature template selection. Need to aggregate usefulness of indiv. features to characterize the usefulness of the template from which they were instantiated.


  • Bruce: Quadratic regression on pitch tracks

Spoken Language ID

nlp-private/spoken-language-id.txt · Last modified: 2015/04/22 21:16 by ryancha