nlp-private:kevin [CS Wiki]

Kevin Cook

email: kevincook@keypadbrowser.com

phone: 801-785-3235

My research interests focus on speech processing. I am currently participating in Spoken Language ID and in PSST.

Log

13 May 2008

A subsystem is different from a single-class classifier (SCC). An SCC classifies an instance as either belonging or not belonging to a target class, whereas a subsystem either classifies an instance as belonging to the target class or makes no classification decision at all. A one-versus-rest binary classifier classifies an instance as belonging either to a target class or to the non-target class represented by the negative training examples. The SCC, in contrast, does not use negative training examples and hence does not classify instances as belonging to a specific non-target class, but rather as simply not belonging to the target class.

The choice between deciding and not deciding is somewhat similar to three-valued logic, in which a value is true, false, or unknown.
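The analogy to three-valued logic can be made concrete with a small sketch. It assumes Kleene's strong three-valued OR as the combining rule and uses Python's None to stand for "unknown"; the function name kleene_or is hypothetical and does not come from the lab software. Because a subsystem never outputs False, combining subsystems this way can only yield True or unknown.

```python
from typing import Optional

def kleene_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Kleene strong OR, with None standing for 'unknown'."""
    if a is True or b is True:
        return True      # one definite True is enough
    if a is False and b is False:
        return False     # both operands must be definitely False
    return None          # otherwise the result stays unknown

# A subsystem only ever outputs True (certain) or None (no decision).
print(kleene_or(True, None))   # True
print(kleene_or(None, None))   # None
```

A binary classifier, by contrast, would also produce False, which is exactly the output a subsystem is not allowed to give.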

5 May 2008

A subsystem which returns a boolean output of certain or uncertain is the same thing as a subsystem which returns a boolean output indicating whether sufficient evidence exists to support classification or not.

Such a subsystem is not to be confused with a binary classifier trained to detect a single language, like German, using positive and negative examples as is done in our SLID lab. A binary classifier can classify an instance as negative, which the subsystem cannot do. The subsystem classifies an instance as positive or refuses to classify it.

Such subsystems can be easily combined if their false alarm (FA) rates are zero, or even if their FA rates are merely below some arbitrarily low threshold.

I have been searching the literature for references to such a subsystem. I found some literature on single-class classification (SCC), which may be very similar to what I am thinking about. One paper which may help me is "Optimal Single-Class Classification Strategies" by El-Yaniv and Nisenson, NIPS 2006.

I also looked at the feature weight files generated from various maximum entropy models in the SLID lab. I noticed evidence which could support the idea that ignoring negative evidence could improve performance. The justification for ignoring negative evidence is that the evidence is not truly negative. For example, English is a Germanic language and thus shares many features with German. It may not be entirely correct to identify a German instance as a negative example of English. It may be that there is no such thing as a negative example of English, or of any other language.

25 April 2008

Yes, I believe it is good to build a prototype, and I am planning to do so in the Spoken Language Identification domain. To that end I have been using the Feature Engineering Console and SLID software developed in the lab. These tools are somewhat broken at the present time and so efforts to build a prototype in this domain are slow. I am learning about extraction of cepstral features from voice and plan to use some version of those features since they appear to be most promising.

One reason for pursuing the domain of speech for this task is that speech has a much richer feature set than text. A richer feature set will do a better job of illustrating the benefits of the system.

However, I believe that building a prototype may not be the most important thing I could do right now. I believe I need to do a better job of defining Decision Set Union and why it is useful.

The significant point is not how subsystems are combined using Boolean operators but that subsystems return Boolean outputs indicating whether sufficient evidence exists to support classification or not.

3 April 2008

I decided not to pursue the idea of proposing a new evaluation metric at this time. I believe a better strategy is to build a prototype of the type of system proposed in my 24 May 2007 entry to demonstrate the utility of that idea. I put together a presentation on this idea and called it 'Decision Set Union'.

The main idea behind the metric is that using it can promote better cooperation between classification systems, using the Decision Set Union approach to system combination. I think that pitching this metric will be less meaningful without being able to refer to an example of such a system. I also believe that the system itself is more important than the metric.

6 June 2007

I would like help in preparing a paper for publication. I would like to present a new evaluation metric for language ID which could also be used for ASR and a variety of other recognition tasks.

The current metric used by NIST for the 2007 language ID competition does not differentiate between misses and false alarms (FA). The two types of errors are weighted equally. In the paper I would argue that such a metric does not reflect the significant difference between these two types of errors and therefore misses the opportunity to encourage the development of more effective recognition systems.

I would like to propose the following metric:

Coverage = 100% ∙ (Hits / (Hits + Misses)) ∙ (1 / Validity)

where Validity = 1 when FA = 0 and Validity = 0 when FA > 0.

Note that Coverage is undefined if FA > 0, because the factor 1 / Validity then divides by zero. In other words, for Coverage to be valid no false alarms are allowed. Invalid Coverage is not equivalent to zero Coverage. This is somewhat counter-intuitive but purposeful.
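As a minimal sketch of how the proposed metric could be computed, the function below returns Coverage as a percentage and returns None when Coverage is invalid. The function name and the use of None to signal "invalid" are illustrative assumptions, as is the choice to treat the case of no target instances (Hits + Misses = 0) as undefined.

```python
from typing import Optional

def coverage(hits: int, misses: int, false_alarms: int) -> Optional[float]:
    """Coverage in percent, or None when Coverage is invalid or undefined."""
    if false_alarms > 0:
        # Validity = 0, so 1 / Validity is undefined: invalid, not zero.
        return None
    total = hits + misses
    if total == 0:
        # Assumption: with no target instances there is nothing to cover.
        return None
    return 100.0 * hits / total

print(coverage(80, 20, 0))  # 80.0
print(coverage(80, 20, 1))  # None
```

Returning None rather than 0.0 keeps invalid Coverage distinct from zero Coverage, as the definition requires.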

I would then offer a type of theoretical proof and show how the metric would be useful in encouraging the development of more effective recognition systems.

I would then offer empirical evidence from the lab to demonstrate the theory.

24 May 2007

Getting familiar with language ID project. Would like to explore idea to improve performance by combining multiple recognition systems. Idea uses concept of certainty to combine systems efficiently.

Here is my thought:

Many recognition systems today seek to balance the need to reduce false alarms with the need to avoid misses by adjusting a decision threshold level. A system architecture is introduced which enables both these needs to be addressed separately so that reducing false alarms does not necessarily increase misses and so that reducing misses does not necessarily increase false alarms. The system architecture is based on certainty decisions.

The system architecture consists of a combination of subsystems. Each subsystem is an independent recognition process with its own recognition strategy and model data. Each subsystem may be considered a system with its own subsystems.

Subsystems are designed to return one of two values, certain or uncertain. An output of certain means that the subsystem has a high degree of certainty that a specific aspect of the input data has been recognized. An output of uncertain means that the subsystem is not certain.

Subsystems are designed to reduce false alarms, not to reduce misses. Misses are reduced on the system level by increasing the number and variety of subsystems. Misses are expected and easily handled. False alarms are not. False alarms are system noise, making recognition difficult. Misses are not noise and do not necessarily degrade overall system performance. Misses from one subsystem can be masked by detection from another subsystem. The number of subsystems can be multiplied because each subsystem is designed to not generate false alarms.
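The architecture described above can be sketched as follows. This is only a toy illustration under stated assumptions: each subsystem is modeled as a function that returns a label when certain and None when uncertain, and the keyword rules, the helper make_keyword_subsystem, and the function decision_set_union are all hypothetical. Real keyword rules like these would not actually achieve zero false alarms.

```python
from typing import Callable, List, Optional, Set

# Assumed interface: a label when certain, None when uncertain.
Subsystem = Callable[[str], Optional[str]]

def make_keyword_subsystem(label: str, keywords: Set[str]) -> Subsystem:
    """Toy subsystem: certain only if a keyword appears in the input."""
    def subsystem(instance: str) -> Optional[str]:
        tokens = set(instance.lower().split())
        return label if tokens & keywords else None
    return subsystem

def decision_set_union(subsystems: List[Subsystem], instance: str) -> Set[str]:
    """Union of the labels that any subsystem is certain about."""
    decisions = set()
    for subsystem in subsystems:
        label = subsystem(instance)
        if label is not None:   # uncertain subsystems contribute nothing
            decisions.add(label)
    return decisions

subsystems = [
    make_keyword_subsystem("German", {"und", "nicht"}),
    make_keyword_subsystem("German", {"straße", "über"}),
    make_keyword_subsystem("English", {"the", "and"}),
]

# The second German subsystem misses, but the first one masks the miss.
print(decision_set_union(subsystems, "das ist nicht gut"))  # {'German'}
# No subsystem is certain, so the system makes no decision.
print(decision_set_union(subsystems, "bonjour le monde"))   # set()
```

Adding another subsystem can only enlarge the set of inputs on which some subsystem is certain, which is how misses are reduced at the system level without adjusting any single subsystem's threshold.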

nlp-private/kevin.txt · Last modified: 2015/04/23 13:40 by ryancha
