nlp-private:scratch-bibliography [CS Wiki]

The overviews/tutorials

The MIT approach

Timothy J. Hazen and Victor W. Zue. Segment-based automatic language identification. Journal of the Acoustical Society of America, 101(4):2323–2331, 1997. http://citeseer.ist.psu.edu/hazen97segmentbased.html

(Very readable paper. Uses one language-independent front-end phoneme recognizer, as we do, but their own (SUMMIT). Four stages: a preprocessor (14 MFCCs + 14 delta MFCCs, F0 and delta F0); their phonetic recognizer (87 phones); a language identifier (phonetic acoustic model + trigram phonetic language model); and a prosodic model (F0 and delta F0 + segment duration model). For utterances of 10 sec. or more the language model is more important; for utterances under 10 sec. phonotactics are more valuable.)
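The trigram phonetic language model mentioned in the note above can be illustrated with a minimal sketch. This is not the SUMMIT implementation: the add-alpha smoothing, the function names (`train_trigram`, `log_likelihood`, `identify`) and the toy phone strings are all assumptions made for illustration. It trains one trigram model per language over decoded phone sequences and picks the language whose model gives the highest log-likelihood.

```python
import math
from collections import Counter

def train_trigram(phone_seqs):
    """Count trigrams and their two-phone contexts over padded phone sequences."""
    tri, ctx = Counter(), Counter()
    for seq in phone_seqs:
        padded = ["<s>", "<s>"] + list(seq) + ["</s>"]
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx

def log_likelihood(model, seq, vocab_size, alpha=1.0):
    """Add-alpha smoothed trigram log-probability (smoothing choice is an assumption)."""
    tri, ctx = model
    padded = ["<s>", "<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        num = tri[tuple(padded[i - 2:i + 1])] + alpha
        den = ctx[tuple(padded[i - 2:i])] + alpha * vocab_size
        total += math.log(num / den)
    return total

def identify(models, seq, vocab_size):
    """Return the language whose trigram model scores the phone sequence highest."""
    return max(models, key=lambda lang: log_likelihood(models[lang], seq, vocab_size))

# Hypothetical toy data: invented phone strings for two made-up "languages".
training = {
    "lang_a": [s.split() for s in ("k a t a", "t a k a", "k a k a")],
    "lang_b": [s.split() for s in ("s t r i", "s t r o", "s p r i")],
}
models = {lang: train_trigram(seqs) for lang, seqs in training.items()}
# Shared vocabulary size (all phones seen in training plus the end marker).
vocab_size = len({p for seqs in training.values() for s in seqs for p in s}) + 1
```

With these toy models, `identify(models, "t a k a".split(), vocab_size)` selects `lang_a`. A real system would score the phone recognizer's output and combine this score with the acoustic and prosodic models.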

The OGI approach

(Uses the CSLU toolkit; English front-end recognizer (SEGLOLA in 1994, phonemes in 1996); features are 12th-order LPC (normalized energy) and delta cepstra (delta energy); results are scored via forward/backward + duration model; then a final classifier (neural net); good discussion of other features (silence, filled pauses, etc.).)
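Both the MIT and OGI front ends append delta (time-derivative) coefficients to the static features. The notes do not say how the deltas were computed, so the following is only a sketch of the commonly used regression formula, d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2), with edge frames replicated. The window width of 2 and the function name `delta` are assumptions.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors."""
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][d]   # replicate the last frame at the end
                behind = frames[max(t - n, 0)][d]     # replicate the first frame at the start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

# Hypothetical one-dimensional "cepstral" track that rises by 1.0 per frame.
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta(static)
# Static and delta coefficients are concatenated frame by frame.
augmented = [c + d for c, d in zip(static, deltas)]
```

For the linear ramp above, the delta at the middle frame equals the slope (1.0), while the replicated edges pull the first and last values toward zero.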

Berkling, K. M. and Barnard, E. (1994). Language identification of six languages based on a common set of broad phonemes. Proc. of the 1994 International Conference on Spoken Language Processing, Vol. 4, pp. 1891-1894, Yokohama, Japan. http://citeseer.ist.psu.edu/berkling94language.html

Y. Yan, “Development of an approach to language identification based on language-dependent phone recognition,” Oregon Graduate Institute of Science and Technology, Dissertation, October 1995. http://citeseer.ist.psu.edu/yan95development.html

Y. Yan and E. Barnard. A comparison of neural net and linear classifier as the pattern recognizer in automatic language identification. In International Conference on Neural Networks and Signal Processing (ICNNSP95), page To Appear, Nanjing, P.R. China, December, 1995. http://citeseer.ist.psu.edu/yan95comparison.html

Y. Yan and E. Barnard, “An Approach to Automatic Language Identification Based on Language-dependent Phone Recognition,” In ICASSP '95, Detroit, Michigan, 1995.

Y. Yan and E. Barnard, “An Approach to Language Identification with Enhanced Language Model,” Eurospeech '95, Madrid, pp. 1351-1354.

Y. Yan and E. Barnard. Recent improvements to a phonotactic approach to language identification. In the Fifteenth Annual Speech Research Symposium XV, pages 212–219, Baltimore, Maryland, June, 1995.

Y. Yan and E. Barnard, “Experiments for an approach to language identification with conversational telephone speech,” in ICASSP '96 Proceedings, May 1996, vol. 2, pp. 789–792. http://citeseer.ist.psu.edu/55489.html

Chengyi Zheng and Yonghong Yan. Run time information fusion in speech recognition. Affiliations: Computer Science & Engineering Department, OGI School of Science & Engineering, Oregon Health & Science University, 20000 NW Walker Rd., Beaverton, OR 97006, USA (both authors); Institute of Acoustics, Chinese Academy of Science, Beijing 100080, P.R. China (Yan). chengyi@cse.ogi.edu, yan@cse.ogi.edu http://hccl.ioa.ac.cn/pdf/Run-Time-Fusion_ICSLP2002.pdf

The INESC/IST approach

(Uses the SPEECHDAT corpus; covers 6 European languages; 3-stage process very similar to ours: a language-dependent front-end phoneme recognizer (in their case Portuguese), stream-specific language modeling (phone bigrams), and a maxent classifier.)
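The last two stages described in the note above (phone bigrams feeding a maxent classifier) can be sketched as follows. This is an illustrative assumption, not the INESC/IST system: it uses a single phone stream, relative bigram frequencies as features, and plain stochastic gradient ascent with no regularizer. The function names and the toy phone strings are hypothetical.

```python
import math
from collections import Counter

def bigram_features(phones):
    """Relative frequencies of phone bigrams in one decoded phone string."""
    counts = Counter(zip(phones, phones[1:]))
    total = sum(counts.values()) or 1
    return {bg: c / total for bg, c in counts.items()}

def posteriors(weights, feats, labels):
    """Maximum-entropy (softmax) posterior P(label | feats) for every label."""
    raw = {y: sum(weights.get((y, f), 0.0) * v for f, v in feats.items()) for y in labels}
    top = max(raw.values())  # subtract the maximum for numerical stability
    exps = {y: math.exp(s - top) for y, s in raw.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train_maxent(data, labels, epochs=200, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for feats, gold in data:
            probs = posteriors(weights, feats, labels)
            for y in labels:
                err = (1.0 if y == gold else 0.0) - probs[y]
                for f, v in feats.items():
                    weights[(y, f)] = weights.get((y, f), 0.0) + lr * err * v
    return weights

# Hypothetical toy data: invented phone strings for two made-up "languages".
labels = ["lang_a", "lang_b"]
data = [(bigram_features(s.split()), lab) for s, lab in [
    ("k a t a", "lang_a"), ("t a k a", "lang_a"),
    ("s t r i", "lang_b"), ("s p r i", "lang_b"),
]]
weights = train_maxent(data, labels)

def classify(phones):
    """Pick the label with the highest maxent posterior."""
    probs = posteriors(weights, bigram_features(phones), labels)
    return max(probs, key=probs.get)
```

On this toy data `classify("k a k a".split())` returns `lang_a`. A multi-stream system would build one feature block per front-end recognizer and concatenate them before classification.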

C. M. Ribeiro and I. M. Trancoso, “Phonetic vocoding with speaker adaptation,” in Proc. EUROSPEECH-97, 1997, pp.1291–1294.

D. Caseiro and I. Trancoso. Spoken language identification using the SpeechDat corpus. Proc. ICSLP 98, pp. 3197–3200.

Spoken Language Identification Using The Speechdat Corpus Diamantino Caseiro, Isabel Trancoso Inesc/Ist Inesc, Rua Alves Redol, n9,…

Phonetic Vocoder Assessment Carlos M. Ribeiro, Isabel M. Trancoso, Diamantino A. Caseiro

Language Identification Using Minimum Linguistic Information Diamantino Caseiro, Isabel Trancoso

LREBIB.rtf doc. found on Eric's PC

General/overall

Bibliography at: http://speech.inesc.pt/~dcaseiro/html/bibliografia.html

 @misc{ muthusamyy-automatic,
 author = "Yeshwant Muthusamy",
 title = "Automatic Language Identification: A Review/Tutorial",
 url = "citeseer.ist.psu.edu/4779.html" }

(DWL: corrected and entered)

Prosody/suprasegmentals

 @misc{ cummins99language,
 author = "F. Cummins and F. Gers and J. Schmidhuber",
 title = "Language identification from prosody without explicit features",
 text = "Fred Cummins, Felix Gers, and Jurgen Schmidhuber. Language identification
   from prosody without explicit features. In Proceedings of EUROSPEECH99,
   1999. To appear.",
 year = "1999",
 url = "citeseer.ist.psu.edu/cummins99language.html" }

 @misc{ fu95survey, 
 author = "S. Fu and C. Lee and O. Clubb", 
 title = "A Survey on Chinese Speech Recognition", 
 text = "Stephen W. K. Fu, C. H. Lee, Orville L. Clubb, A Survey on Chinese Speech Recognition, 23 November, 1995.",
 year = "1995",
 url = "citeseer.ist.psu.edu/fu96survey.html" ,
 note = "prosody, suprasegmentals"
 }

 @misc{ wang-lexical, 
 author = "Chao Wang and Stephanie Seneff", 
 title = "Lexical Stress Modeling for Improved Speech Recognition of Spontaneous Telephone Speech in the JUPITER Domain", 
 url = "citeseer.ist.psu.edu/453327.html",
 note = "prosody, suprasegmentals"
 }

Imoto, Kazunori / Dantsuji, Masatake / Kawahara, Tatsuya (2000): “Modelling of the perception of English sentence stress for computer-assisted language learning”, In ICSLP-2000, vol.3, 175-178.

Jenkin, K. L. & Scordilis, M. S. (1996), 'Development and comparison of three syllable stress classifiers', in Proceedings of the International Conference on Spoken Language Processing, Philadelphia, USA, pp. 733–736.

Thubthong, N., Kijsirikul, B. and Pusittrakul, A. 2002. A method for isolated Thai tone recognition using combination of neural networks. Computational Intelligence. 18(3):313-335.

 How Much Prosody Can You Learn from Twenty Utterances? 
 Author:  Keller, Eric; Zellner Keller, Brigitte 
 Journal:  Linguistik online 
 Issn:  16153014 
 Year:  2003 
 Volume:  17 
 Issue:  5 
 Pages/rec. No.:  57-79 
 Key words:  learning; teaching; computational linguistics

 report_number = IDSIA - 07 - 99 
 date =1999, March, 03 
 author =  Cummins Fred, Gers Felix, Schmidhuber Juergen 
 title = Comparing Prosody Across Many Languages

Miranda, E. R. Automatic Sound Identification based on Prosodic Listening. Proceedings of 17th International Congress on Acoustics, September 2001. Universita di Roma La Sapienza

INESC/IST

D. Caseiro and I. Trancoso, “Identification of spoken European languages,” in Proc. of the IX European Signal Processing Conference (EUSIPCO-98), September 1998.

 @misc{ isabel-spoken,
 author = "Diamantino Caseiro and Isabel Trancoso",
 title = "Spoken Language Identification Using The Speechdat Corpus",
 url = "citeseer.ist.psu.edu/359897.html" }

 @misc{ ribeiro-phonetic,
 author = "Carlos M. Ribeiro and Isabel M. Trancoso and Diamantino A. Caseiro",
 title = "Phonetic Vocoder Assessment",
 url = "citeseer.ist.psu.edu/501710.html" }

 @misc{ caseiro-language,
 author = "Diamantino Caseiro and Isabel Trancoso",
 title = "Language Identification Using Minimum Linguistic Information",
 url = "citeseer.ist.psu.edu/137877.html" }

OGI

Zheng, Chengyi / Yan, Yonghong (2002): “Run time information fusion in speech recognition”, In ICSLP-2002, 1077-1080.

Kay M. Berkling and Etienne Barnard. Language identification of six languages based on a common set of broad phonemes. In ICSLP [ICS94], pages 1891–1894.

Y. Yan and E. Barnard, “Experiments for an approach to language identification with conversational telephone speech,” in ICASSP '96 Proceedings, May 1996, vol. 2, pp. 789–792. http://citeseer.ist.psu.edu/55489.html

MIT

Timothy J. Hazen and Victor W. Zue. Segment-based automatic language identification. Journal of the Acoustical Society of America, 101(4):2323–2331, 1997. http://citeseer.ist.psu.edu/article/hazen97segmentbased.html

Timothy J. Hazen and Victor W. Zue, “Recent improvements in an approach to segment-based automatic language identification,” In Proceedings of the International Conference on Spoken Language Processing, pp. 1883-1886, Yokohama, September, 1994.

Corpora

 @misc{ schultz02globalphone,
 author = "T. Schultz",
 title = "Globalphone: a Multilingual Speech and Text Database Developed at Karlsruhe
   University",
 text = "Tanja Schultz, Globalphone: a Multilingual Speech and Text Database Developed at Karlsruhe University, in Proceedings of the ICSLP, Denver, Colorado, USA, September 2002.",
 year = "2002",
 url = "citeseer.ist.psu.edu/schultz02globalphone.html" }

Singapore SDP

Rong Tong, Bin Ma, Donglai Zhu, Haizhou Li and Eng Siong Chng, “Integrating Acoustic, Prosodic and Phonotactic Features for Spoken Language Identification”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), May 14-19, 2006, Toulouse, France.

Santhosh C. Kumar, V.P. Mohandas and Haizhou Li, “Multilingual Speech Recognition: A Unified Approach”, InterSpeech 2005 - Eurospeech - 9th European Conference on Speech Communication and Technology, September 4-8, 2005, Lisboa, Portugal.

Bin Ma, Haizhou Li and Chin-Hui Lee, “An Acoustic Segment Modeling Approach to Automatic Language Identification”, InterSpeech 2005 - Eurospeech - 9th European Conference on Speech Communication and Technology, September 4-8, 2005, Lisboa, Portugal.

Boon Pang LIM, Haizhou Li, and Bin Ma, “Using Local and Global Phonotactical Features in Chinese Dialect Identification”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005), March 2005, Philadelphia, PA, USA.

Boon Pang Lim, Haizhou Li, and Yu Chen, “Language Identification through Large Vocabulary Continuous Speech Recognition” in proc. International Symposium on Chinese Spoken Language Processing (ISCSLP2004), Hong Kong, Dec 2004.

Bin Ma, Cuntai Guan, Haizhou Li and Chin-Hui Lee, “Multilingual Speech Recognition with Language Identification”, International Conference on Spoken Language Processing (ICSLP), Denver, Colorado, Sept. 16-20, 2002.

Ensemble Methods

D. Morrison, R. Wang, L. C. De Silva (2007), "Ensemble methods for spoken emotion recognition in call-centres", Speech Communication Vol. 49 (2007) pp. 98–112, Elsevier Science (Impact factor 1.178, SCI) http://dx.doi.org/10.1016/j.specom.2006.11.004

Spoken Language ID

nlp-private/scratch-bibliography.txt · Last modified: 2015/04/22 15:19 by ryancha
