Conferences
Top-Level Goals
Other Directions for Research
Scenario #4:
Starting point: audio files
Focus: Reliable broad-class phone recognition for all languages, where broad-class phone LMs are trained by semi-supervised learning
Resources
Possible Paper
What is the latest system that performs best using phonetic transcription as an intermediate representation?
Contact the builders of the baseline system to ask if this line of research would be interesting to them (see below)
Implement this technique as a strong baseline on true transcripts
Beat it using feature engineering, ensembles, maxent, etc.
Part 2: grab real data from somebody else
Action List
General
Get things working under Cygwin: this will wait until Robbie's new detware implementation gets committed.
Reengineer seg2xml3.pl: can Praat be run by the make system, with seg2xml3.pl run on Praat's output?
Add comments to the CMakeLists files
Reproduce Pedro's and Bruce's results using the Feature Engineering Console: re-run the best experiments from each, with an eye specifically on the impact on (for example) Mandarin performance.
Try comparing Bruce's results to a simpler baseline (e.g., trigram, 5-gram) rather than to Pedro's results.
<s>Reconcile Pedro's SE_*.def.xml files with other files in the repository. Keep only the unique ones.</s> - presumably completely DONE in r378
<s>Purge the duplicates.</s> - DONE in r245 and r252
Feature engineering on both pitch and F0.
<s>with quantization (may need different quantiles for each approach)</s> - DONE in r382
<s>with linear regression</s> - DONE in r375 and MERGED TO HEAD in r382
with _quadratic_ regression - quadratic regression for pitch checked in in r402
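As a reference for the quadratic-regression pitch features, here is a minimal sketch (not the checked-in r402 code) that fits F0(t) ≈ at² + bt + c over a segment with NumPy; the function name and the synthetic contour are illustrative only:

```python
import numpy as np

def quadratic_pitch_features(times, f0):
    """Fit F0(t) ~ a*t^2 + b*t + c over one segment; return (a, b, c).
    These coefficients can then be quantized or used directly as features."""
    # polyfit returns coefficients highest order first
    a, b, c = np.polyfit(np.asarray(times), np.asarray(f0), deg=2)
    return a, b, c

# Illustrative example: a synthetic rise-fall contour peaking at 120 Hz
t = np.linspace(0.0, 1.0, 50)
f0 = -40.0 * (t - 0.5) ** 2 + 120.0
a, b, c = quadratic_pitch_features(t, f0)
```

The curvature coefficient `a` distinguishes rise-fall from fall-rise contours, which a linear fit cannot capture.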
<s>Parallelize seg2xml3.pl to take advantage of multiple processors/processor cores.</s> - DONE in r388, though a bug remains and it isn't currently enabled.
Robbie: provide Eric with final normalization proof paper for tech report.
Feature Engineering Console Module
Tasks related to the Language-ID implementation of the Edu.byu.nlp.experimentation API, located at Language-ID-Experimentation-Module
Can we enable multiple simultaneous jobs in cmake? What sort of locking will this require? (slx2 and ling files, etc.)
Show accurate durations for the wav files. This will be accomplished somewhere in or around the SLIDTrial class's getOtherInfo method. The durations are taken from the relevant .result file, so perhaps they're hardcoded in resultbuilder.pl?
Wider legend colors
Tag or label on the chart to identify which language is which
<s>Outcome isn't showing up in trial list</s> - FIXED
Check fivegram for regression?
Check title of file feature weights?
Outline view of features and experiments?
'done' or 'status' file to show where an experiment run terminated
Feature Engineering with True Transcripts
<br>
Robbie:
Log results of the full set of n-gram LM & maxent experiments on new .slx2 file set on the wiki.
MaxEnt Optimization: Start with feature weights from prior iteration
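A minimal sketch of the warm-start idea, assuming a plain NumPy multinomial-logistic (MaxEnt) trainer rather than our actual toolkit; passing `w_init` lets a new run begin from the prior iteration's weights instead of zeros:

```python
import numpy as np

def train_maxent(X, y, n_classes, w_init=None, lr=0.1, iters=200):
    """Multinomial logistic regression by batch gradient ascent.
    w_init: optional (n_classes, n_features) weights from a prior run."""
    n, d = X.shape
    W = np.zeros((n_classes, d)) if w_init is None else w_init.copy()
    Y = np.eye(n_classes)[y]                      # one-hot targets
    for _ in range(iters):
        logits = X @ W.T
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        W += lr * (Y - P).T @ X / n               # ascent on log-likelihood
    return W
```

A follow-up iteration can then call `train_maxent(X, y, k, w_init=W_prev, iters=...)` and should need far fewer iterations to reconverge.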
Replace NIST DETware with our own DET-curve software
In the new DET-curve software, automatically calculate the following:
the aggregate DCF
the aggregate EER
the aggregate operating point
per-language statistics (DCF, EER, operating point)
Reconcile: EER in .csv result file and plot-*/global/eer.txt
Understand the relationship with plot-*/avgeer.txt
Sweep out one threshold per binary classifier. Equivalent to normalization?
Optimize theta sweep.
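A sketch of what the replacement DET software needs to compute per threshold sweep. The cost parameters below are placeholders, not the NIST evaluation values, and the function name is illustrative:

```python
import numpy as np

def sweep_det(tgt_scores, non_scores, c_miss=1.0, c_fa=1.0, p_tgt=0.5):
    """Sweep one decision threshold over the pooled scores of target and
    non-target trials; return (EER, minimum DCF).  DCF here is
    C_miss*P_miss*P_tgt + C_fa*P_fa*(1 - P_tgt)."""
    thresholds = np.unique(np.concatenate([tgt_scores, non_scores]))
    best_dcf, best_gap, eer = np.inf, np.inf, None
    for th in thresholds:
        p_miss = np.mean(tgt_scores < th)        # targets rejected
        p_fa = np.mean(non_scores >= th)         # non-targets accepted
        dcf = c_miss * p_miss * p_tgt + c_fa * p_fa * (1 - p_tgt)
        best_dcf = min(best_dcf, dcf)
        if abs(p_miss - p_fa) < best_gap:        # EER: where the rates cross
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2
    return eer, best_dcf
```

Per-language statistics fall out by calling this once per language's trial subset; the aggregate versions pool all trials.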
<br>
Eric:
Investigate whether output from operating point selection on the training set is overwriting, or being overwritten by, output produced at test time.
Is the training sweep .csv file overwriting the test sweep .csv file?
Feature Selection:
Count-cut-offs with MaxEnt
Mutual information based feature selector
Berger's feature selection procedure with learning in the loop
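The mutual-information selector could look roughly like this sketch (discrete feature values are assumed, and all names are illustrative):

```python
import math
from collections import Counter

def mutual_information(feature_vals, labels):
    """MI (in nats) between a discrete feature and the class label."""
    n = len(labels)
    joint = Counter(zip(feature_vals, labels))
    pf = Counter(feature_vals)
    pl = Counter(labels)
    mi = 0.0
    for (f, l), c in joint.items():
        p_fl = c / n
        # p(f,l) * log( p(f,l) / (p(f) p(l)) ), counts rewritten over n
        mi += p_fl * math.log(p_fl * n * n / (pf[f] * pl[l]))
    return mi

def select_top_k(features, labels, k):
    """features: dict name -> list of values aligned with labels."""
    ranked = sorted(features,
                    key=lambda f: mutual_information(features[f], labels),
                    reverse=True)
    return ranked[:k]
```

Unlike count cut-offs, this ranks features by how informative they are about the language label rather than by raw frequency.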
Cite Chen & Maison - refer to text lang ID prior work.
Re-run n-gram experiments with Kneser-Ney instead of Katz-style Good-Turing
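For reference, interpolated Kneser-Ney for a bigram LM can be sketched as follows (a toy estimator with a single fixed discount, not the toolkit's implementation):

```python
from collections import Counter, defaultdict

def kneser_ney_bigram(tokens, discount=0.75):
    """Interpolated Kneser-Ney over a token sequence; returns p(w | v)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    hist = Counter(tokens[:-1])                  # history counts c(v)
    followers = defaultdict(set)                 # distinct types after v
    preceders = defaultdict(set)                 # distinct types before w
    for v, w in bigrams:
        followers[v].add(w)
        preceders[w].add(v)
    n_types = len(bigrams)
    def p(w, v):
        cont = len(preceders[w]) / n_types       # continuation probability
        if hist[v] == 0:
            return cont                          # unseen history: back off
        lam = discount * len(followers[v]) / hist[v]
        return max(bigrams[(v, w)] - discount, 0) / hist[v] + lam * cont
    return p
```

The continuation probability rewards words that follow many different histories, which is the key difference from Katz-style Good-Turing backoff.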
Re-examine n-gram vs. n-gram-all models for MaxEnt in wake of “null” bug removal
Implement multi-class classifier and track accuracy (in addition to the aggregate DCF, DET, etc. for the binary one-v-rest classifiers) – the decision is not blind to evidence for other languages.
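A sketch of the multi-class decision rule: rather than thresholding each one-v-rest detector blindly, take the argmax over every language's score, so the decision sees the evidence for all languages at once (names illustrative):

```python
def multiclass_decision(scores_by_lang):
    """scores_by_lang: dict language -> detection score from that
    language's one-v-rest classifier.  Pick the highest-scoring language."""
    return max(scores_by_lang, key=scores_by_lang.get)

def multiclass_accuracy(trials):
    """trials: list of (scores_by_lang, true_language) pairs."""
    correct = sum(multiclass_decision(s) == t for s, t in trials)
    return correct / len(trials)
```

This accuracy number would be tracked alongside the per-detector DCF/DET statistics, since the two can disagree about which system is better.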
<br>
Contact Audrey Le (audrey.le@nist.gov): inquire about multi-dimensional (one theta per language) DETware.
Also ask Audrey about the normal deviate scale
Use answer key files, rather than reading the answer out of the filename
Debug: operating point is sometimes off the DET curve (e.g., Pedro's English v. Spanish curves)
Robbie: incorporate Richard Arthur's confusability matrix code into Spoken LID and 401R Codebase. Adapt for Maxent feature weights.
Speech Reco. for languages with broad-class phone LMs (trained by supervised learning)
Re: quality of the segment labels and the segment endpoints. 8/7/06: We noticed a trend in endpoint position discrepancy with the truth. Most egregiously, the final segment always had erroneous start-/end-time stamps. Debug this code, and take another look at a couple of utterances in order to see where we stand.
<br>
Phone reco. in Makefile
Optimization of SR parameters
NIST datasets: See Singer; OGI_TS, CallFriend
Verify that content of NIST 2005 dev. dataset <math>\supseteq</math> 2003 <math>\supseteq</math> 1996
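The superset check can be automated once utterance-ID inventories exist for each year; a sketch, assuming IDs are comparable across releases:

```python
def verify_nesting(datasets):
    """datasets: list of (name, set_of_utterance_ids), ordered newest to
    oldest.  Check each release contains the next (2005 ⊇ 2003 ⊇ 1996);
    return a list of (newer, older, missing_ids) for any violations."""
    problems = []
    for (new_name, new_ids), (old_name, old_ids) in zip(datasets, datasets[1:]):
        missing = old_ids - new_ids
        if missing:
            problems.append((new_name, old_name, sorted(missing)))
    return problems
```

An empty return value confirms the nesting; otherwise the missing IDs point at exactly which older utterances were dropped.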
Inventory the resources at our disposal that we can use to train phone LMs. This can be done either by (in order of preference): (a) using the OGI-generated phoneme annotations directly; (b) using phoneme-level (or “close”) transcriptions and converting them to phoneme classes; (c) using straight text, applying text-to-speech tools (aka phonemicizers) to convert the text to phonemic form, and then converting the phonemes to phoneme classes.
The original OGI corpus (http://cslu.cse.ogi.edu/corpora/mlts/), which is annotated with broad-class phones for the following languages: English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, Vietnamese.
The expanded OGI corpus (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005S26), which has orthographic transcriptions for only some 19,758 utterances. Languages include: Arabic, Chinese, Czech, English, Farsi, German, Hindi, Hungarian, Italian, Japanese, Korean, Malay, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Tamil, Vietnamese.
SR with all languages, where LMs are trained by semi-supervised learning
Other
4: Idea from a discussion with Hal: hierarchical linguistic similarity – leverage similarity between languages to pool data and try to get better language-ID rates. Combine with error-correcting-code-style multi-class classification.
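A sketch of the error-correcting-code-style decoding step: each language gets a codeword, one binary classifier is trained per bit, and decoding picks the nearest codeword by Hamming distance (the toy codebook below is illustrative):

```python
def ecoc_decode(bit_predictions, codebook):
    """bit_predictions: list of 0/1 outputs, one per binary classifier.
    codebook: dict language -> codeword (bit list of the same length).
    Returns the language whose codeword is closest in Hamming distance,
    so a few flipped classifier outputs can still be corrected."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(codebook, key=lambda lang: hamming(codebook[lang], bit_predictions))
```

With well-separated codewords this tolerates individual classifier errors, and similar languages could deliberately share code bits to pool their training data.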
5: Add Farsi phoneme-aligned data
Low Priority
Split makefile: MaxEnt v. n-gram
-
Subversion repository organization
Globally rename SEGLOLA to derivative of Broad-class phones (BRDCLSPHONES???)
Move scripts down a level
Can Eclipse still check out projects the way we want?
3 separate repositories (one per Eclipse project)?
Try the other pitch tracker from Mark Liberman
Try ESPS (currently stored as tar ball in entropy:/home/tools )
Consider Hal's maxent toolkit. Use for multi-class classification.
Bugs
Processing /home/pep6/workspace/experiments/data/seg/en013num.seg ..
CMD1: c:/Program\ Files/Praat/praatcon.exe Language-ID/scripts/getpitch3.praat /home/pep6/workspace/experiments/data/wav/en013num.wav
Error: Cannot open file "C:\cygwin\home\pep6\workspace\Language-ID\scripts\/home/pep6/workspace/experiments/data/wav/en013num.wav".
No object was put into the list.
The mixed path in the error suggests the Windows Praat build resolves the POSIX-style argument relative to its script directory; converting the .wav path to a Windows path (e.g., with cygpath -w) before invoking praatcon.exe should avoid this.
Paper Ideas
Done