Status of normalization in Language ID as of 17 July 2008:

As far as I know, normalization is still there and working. It's configurable through the rbldr_norm property in .lidconfig files (i.e. cmake knows about it), so it should be settable at the command line using


and then the number indicating the type of normalization you want (Introduction to Language ID covers this). I've never used it much, even though I know it improves performance. The regression testing infrastructure also supports tracking which type of normalization was used to generate a result. I also wrote a Java reimplementation of the normalization code in edu.byu.nlp.experimentation.results.AbstractIdentificationResult.normalize, but it's currently disabled because I wasn't sure it was doing the right thing. –Josh 18:28, 17 July 2008 (MDT)

Spoken Language ID

nlp-private/normalization.txt · Last modified: 2015/04/22 21:20
