The overviews/tutorials
The MIT approach
(very readable paper; uses a single language-independent front-end phoneme recognizer, as we do, but their own (SUMMIT); four stages: preprocessor (14 MFCCs + 14 delta MFCCs, F0 and delta F0), then their phonetic recognizer (87 phones), then the language identifier (phonetic acoustic model + phonetic language model, trigram), then the prosodic model (F0 and delta F0 + segment duration model); for utterances of 10 sec. or longer the language model is more important, while for utterances under 10 sec. phonotactics are more valuable)
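The preprocessor's delta features (delta MFCCs, delta F0) are time derivatives of the static features. The exact formula the MIT system used is not recorded in these notes, so this is a minimal sketch assuming the common regression-based definition, d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2), with N = 2 and the first/last frames repeated at the edges. `delta_features` and `append_deltas` are hypothetical helper names.

```python
from typing import List


def delta_features(frames: List[List[float]], window: int = 2) -> List[List[float]]:
    """Regression-based delta coefficients for a sequence of feature frames.

    frames: T frames, each a list of D coefficients (e.g. 14 MFCCs, or [F0]).
    window: N, frames used on each side of t (N = 2 is an assumption).
    Edge frames are repeated so every frame gets a delta.
    """
    if not frames:
        return []
    num_frames, dim = len(frames), len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                later = frames[min(t + n, num_frames - 1)][d]   # clamp at the end
                earlier = frames[max(t - n, 0)][d]              # clamp at the start
                acc += n * (later - earlier)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def append_deltas(frames: List[List[float]], window: int = 2) -> List[List[float]]:
    """Concatenate each static frame with its deltas (D -> 2D dimensions)."""
    return [static + delta for static, delta in zip(frames, delta_features(frames, window))]
```

Applied to 14 MFCCs per frame this yields the 14 + 14 = 28-dimensional vector described above; the same function works unchanged on a one-column F0 track.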
The OGI approach
(uses the CSLU toolkit; English front-end recognizer (SEGLOLA in 1994, phonemes in 1996); features are 12th-order LPC (normalized energy) and delta cepstra (delta energy); results are scored via forward/backward + a duration model, then passed to a final classifier (neural net); good discussion of other features (silence, filled pauses, etc.))
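For the LPC-based features, here is a minimal sketch assuming the standard autocorrelation method with the Levinson-Durbin recursion (order 12), followed by the usual recursion that converts LPC coefficients into cepstra. Whether the OGI front end computed its features exactly this way is an assumption; per-frame windowing and pre-emphasis are left out, and `lpc` and `lpc_to_cepstrum` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(frame: Sequence[float], order: int = 12) -> Tuple[List[float], float]:
    """Levinson-Durbin recursion.

    Returns predictor coefficients a[1..order] and the final residual energy,
    using the convention x[n] ~ sum_{k=1}^{order} a[k] * x[n - k].
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)          # a[0] unused
    err = r[0]
    if err <= 0.0:                   # silent frame: nothing to predict
        return [0.0] * order, 0.0
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:               # perfectly predictable: higher orders stay 0
            break
    return a[1:], err


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum_k a[k] z^-k).

    a[0] holds a_1; the gain term c[0] is not computed here.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):   # keeps n - k within 1..p
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

Cepstra computed this way for successive frames can then be differenced over time to obtain the delta cepstra mentioned above.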
Berkling, K. M. and Barnard, E. (1994). Language identification of six languages based on a common set of broad phonemes. Proc. of the 1994 International Conference on Spoken Language Processing, Vol. 4, pp. 1891-1894, Yokohama, Japan.
http://citeseer.ist.psu.edu/berkling94language.html
Y. Yan, “Development of an approach to language identification based on language-dependent phone recognition,” Oregon Graduate Institute of Science and Technology, Dissertation, October 1995.
http://citeseer.ist.psu.edu/yan95development.html
Y. Yan and E. Barnard. A comparison of neural net and linear classifier as the pattern recognizer in automatic language identification. In International Conference on Neural Networks and Signal Processing (ICNNSP95), Nanjing, P.R. China, December 1995. To appear.
http://citeseer.ist.psu.edu/yan95comparison.html
Y. Yan and E. Barnard, “An Approach to Automatic Language Identification Based on Language-dependent Phone Recognition,” In ICASSP '95, Detroit, Michigan, 1995.
Y. Yan, E. Barnard, “An Approach to Language Identification with Enhanced Language Model,” Eurospeech '95, Madrid, pp. 1351-1354.
Y. Yan and E. Barnard. Recent improvements to a phonotactic approach to language identification. In the Fifteenth Annual Speech Research Symposium XV, pages 212–219, Baltimore, Maryland, June, 1995.
Y. Yan and E. Barnard, “Experiments for an approach to language identification with conversational telephone speech,” in ICASSP '96 Proceedings, May 1996, vol. 2, pp. 789–792.
http://citeseer.ist.psu.edu/55489.html
Chengyi Zheng and Yonghong Yan, “Run Time Information Fusion in Speech Recognition.” Affiliations: Computer Science & Engineering Department, OGI School of Science & Engineering, Oregon Health & Science University, 20000 NW Walker Rd., Beaverton, OR 97006, USA (both authors); Institute of Acoustics, Chinese Academy of Science, Beijing 100080, P.R. China (Yonghong Yan). Contact: chengyi@cse.ogi.edu, yan@cse.ogi.edu
http://hccl.ioa.ac.cn/pdf/Run-Time-Fusion_ICSLP2002.pdf
The INESC/IST approach
(uses the SpeechDat corpus and covers 6 European languages; a three-stage process very similar to ours: a language-dependent front-end phoneme recognizer (in their case Portuguese), stream-specific language modeling (phone bigrams), and a maxent classifier)
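The phone-bigram language-modeling stage can be sketched as follows, assuming phone strings already decoded by the front-end recognizer and add-one smoothing (the smoothing INESC/IST actually used is not recorded in these notes). As a plainly named simplification, the back end here picks the language with the highest bigram log-likelihood instead of training a maxent classifier on the per-language scores. The class name, function name, and toy phone labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


class PhoneBigramLM:
    """Add-one smoothed phone bigram model for a single language."""

    def __init__(self, sequences: List[Sequence[str]], inventory: Sequence[str]):
        # Possible successors: every phone in the inventory plus the end marker.
        self.num_successors = len(set(inventory)) + 1
        self.bigrams: Counter = Counter()
        self.history: Counter = Counter()
        for seq in sequences:
            padded = ["<s>", *seq, "</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history[prev] += 1

    def log_prob(self, seq: Sequence[str]) -> float:
        """Natural-log probability of a phone string, including start/end transitions."""
        padded = ["<s>", *seq, "</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.history[prev] + self.num_successors
            total += math.log(num / den)
        return total


def identify(models: Dict[str, PhoneBigramLM],
             phones: Sequence[str]) -> Tuple[str, Dict[str, float]]:
    """Score a decoded phone string with every language's model; return the best label."""
    scores = {lang: lm.log_prob(phones) for lang, lm in models.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

For example, `identify(models, decoded_phones)` returns the best-scoring language label together with the per-language log-likelihoods; that score vector is also what a maxent back end, as in the INESC/IST system, would take as input.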
C. M. Ribeiro and I. M. Trancoso, “Phonetic vocoding with speaker adaptation,” in Proc. EUROSPEECH-97, 1997, pp.1291–1294.
D. Caseiro, I. Trancoso: Spoken language identification using the SpeechDat corpus, Proc. ICSLP 98, pp. 3197-3200.
Spoken Language Identification Using the SpeechDat Corpus. Diamantino Caseiro, Isabel Trancoso, INESC/IST, INESC, Rua Alves Redol, n9,…
LREBIB.rtf doc. found on Eric's PC
General/overall
@misc{ muthusamyy-automatic,
author = "Yeshwant Muthusamy",
title = "Automatic Language Identification: A Review/Tutorial",
url = "citeseer.ist.psu.edu/4779.html" }
(DWL: corrected and entered)
Prosody/suprasegmentals
@misc{ cummins99language,
author = "F. Cummins and F. Gers and J. Schmidhuber",
title = "Language identification from prosody without explicit features",
text = "Fred Cummins, Felix Gers, and Jurgen Schmidhuber. Language identification
from prosody without explicit features. In Proceedings of EUROSPEECH99,
1999. To appear.",
year = "1999",
url = "citeseer.ist.psu.edu/cummins99language.html" }
@misc{ fu95survey,
author = "S. Fu and C. Lee and O. Clubb",
title = "A Survey on Chinese Speech Recognition",
text = "Stephen W. K. Fu, C. H. Lee, Orville L. Clubb, A Survey on Chinese Speech Recognition, 23 November, 1995.", year = "1995",
url = "citeseer.ist.psu.edu/fu96survey.html" ,
note = "prosody, suprasegmentals"
}
@misc{ wang-lexical,
author = "Chao Wang and Stephanie Seneff",
title = "Lexical Stress Modeling for Improved Speech Recognition of Spontaneous Telephone Speech in the JUPITER Domain",
url = "citeseer.ist.psu.edu/453327.html",
note = "prosody, suprasegmentals"
}
Imoto, Kazunori / Dantsuji, Masatake / Kawahara, Tatsuya (2000): “Modelling of the perception of English sentence stress for computer-assisted language learning”, In ICSLP-2000, vol.3, 175-178.
Jenkin, K. L. & Scordilis, M. S. (1996), 'Development and comparison of three syllable stress classifiers', in Proceedings of the International Conference on Spoken Language Processing, Philadelphia, USA, pp. 733–736.
Thubthong, N., Kijsirikul, B. and Pusittrakul, A. 2002. A method for isolated Thai tone recognition using combination of neural networks. Computational Intelligence. 18(3):313-335.
How Much Prosody Can You Learn from Twenty Utterances?
Author: Keller, Eric; Zellner Keller, Brigitte
Journal: Linguistik online
ISSN: 1615-3014
Year: 2003
Volume: 17
Issue: 5
Pages/rec. No.: 57-79
Key words: learning; teaching; computational linguistics
report_number = IDSIA-07-99
date = 1999, March 03
author = Fred Cummins, Felix Gers, Juergen Schmidhuber
title = Comparing Prosody Across Many Languages
INESC/IST
D. Caseiro and I. Trancoso, “Identification of spoken european languages,” in Proc. of the IX European Signal Processing Conference (EUSIPCO-98), September 1998.
@misc{ isabel-spoken,
author = "Diamantino Caseiro and Isabel Trancoso",
title = "Spoken Language Identification Using the SpeechDat Corpus",
url = "citeseer.ist.psu.edu/359897.html" }
@misc{ ribeiro-phonetic,
author = "Carlos M. Ribeiro and Isabel M. Trancoso and Diamantino A. Caseiro",
title = "Phonetic Vocoder Assessment",
url = "citeseer.ist.psu.edu/501710.html" }
@misc{ caseiro-language,
author = "Diamantino Caseiro and Isabel Trancoso",
title = "Language Identification Using Minimum Linguistic Information",
url = "citeseer.ist.psu.edu/137877.html" }
OGI
Zheng, Chengyi / Yan, Yonghong (2002): “Run time information fusion in speech recognition”, In ICSLP-2002, 1077-1080.
Y. Yan and E. Barnard, “Experiments for an approach to language identification with conversational telephone speech,” in ICASSP '96 Proceedings, May 1996, vol. 2, pp. 789–792.
http://citeseer.ist.psu.edu/55489.html
MIT
Timothy J. Hazen and Victor W. Zue, “Recent improvements in an approach to segment-based automatic language identification,” In Proceedings of the International Conference on Spoken Language Processing, pp. 1883-1886, Yokohama, September, 1994.
Corpora
@misc{ schultz02globalphone,
author = "T. Schultz",
title = "Globalphone: a Multilingual Speech and Text Database Developed at Karlsruhe
University",
text = "Tanja Schultz, Globalphone: a Multilingual Speech and Text Database Developed at Karlsruhe University, in Proceedings of the ICSLP, Denver, Colorado, USA, September 2002.",
year = "2002",
url = "citeseer.ist.psu.edu/schultz02globalphone.html" }
Singapore SDP
Rong Tong, Bin Ma, Donglai Zhu, Haizhou Li and Eng Siong Chng, “Integrating Acoustic, Prosodic and Phonotactic features for Spoken language identification”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), May 14-19, 2006, Toulouse, France
Santhosh C. Kumar, V.P. Mohandas and Haizhou Li, “Multilingual Speech Recognition: A Unified Approach”, InterSpeech 2005 - Eurospeech - 9th European Conference on Speech Communication and Technology, September 4-8, 2005, Lisboa, Portugal.
Bin Ma, Haizhou Li and Chin-Hui Lee, “An Acoustic Segment Modeling Approach to Automatic Language Identification”, InterSpeech 2005 - Eurospeech - 9th European Conference on Speech Communication and Technology, September 4-8, 2005, Lisboa, Portugal.
Boon Pang Lim, Haizhou Li, and Bin Ma, “Using Local and Global Phonotactical Features in Chinese Dialect Identification”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005), March 2005, Philadelphia, PA, USA.
Boon Pang Lim, Haizhou Li, and Yu Chen, “Language Identification through Large Vocabulary Continuous Speech Recognition” in proc. International Symposium on Chinese Spoken Language Processing (ISCSLP2004), Hong Kong, Dec 2004.
Bin Ma, Cuntai Guan, Haizhou Li and Chin-Hui Lee, “Multilingual Speech Recognition with Language Identification”, International Conference on Spoken Language Processing (ICSLP), Denver, Colorado, Sept. 16-20, 2002
Ensemble Methods