= Scratch space for joint project by Ringger & Giraud-Carrier =

= Focus: MAHT =
[[MAHT]]

= Focus: TDT =

== Links to possibly useful places and papers ==
* [http://www.itl.nist.gov/iaui/894.01/tests/tdt/2004/ NIST] (replace year by 1998-2003 in the URL to see details of the other 6 workshops)
** short-cut link to summary report: http://www.itl.nist.gov/iaui/894.01/tests/tdt/2004/papers/NIST-TDT2004.ppt
* [http://projects.ldc.upenn.edu/TDT/ UPenn TDT Site]
* [http://books.google.com/books?id=ToEdfqDepuUC&dq=topic+detection+and+tracking&pg=PP1&ots=aABotCgQe4&sig=9YYXJRBRqpRnw8iIjD0yC0rDTVs&hl=en&prev=http://www.google.com/search?client=safari&rls=en-us&q=topic+detection+and+tracking&ie=UTF-8&oe=UTF-8&sa=X&oi=print&ct=title&cad=one-book-with-thumbnail James Allan's Book on TDT]
* [http://citeseer.ist.psu.edu/cache/papers/cs/22989/http:zSzzSzciir.cs.umass.eduzSzpubfileszSzir-137.pdf/allan98topic.pdf Topic detection and tracking pilot study: Final report]
* [http://maroo.cs.umass.edu/pub/web/getpdf.php?id=497 Allan, J., Harding, S., Fisher, D., Bolivar, A., Guzman-Lara, S. and Amstutz, P. (2005). "Taking Topic Detection From Evaluation to Practice," CD Proceedings of the Thirty-Eighth Annual Hawaii International Conference on System Sciences (HICSS).]
* [http://maroo.cs.umass.edu/pub/web/getpdf.php?id=537 Kumaran, G. and Allan, J., (2005). "Using Names and Topics for New Event Detection," Proceedings of Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing, pp. 121-128.]
* DARPA TIDES Program (could not find any links)

== Sub-problems ==

=== Areas of interest in 2004 ===
* Story Segmentation (* not evaluated in 2004)
* New Event Detection = First Story Detection
** System Goal: To detect the first story that discusses each topic
* Link Detection
** System Goal: To detect whether a pair of stories discuss the same topic.
* Topic Tracking
** System Goal: To detect stories that discuss the target topic, in multiple source streams
*** Supervised Training: Given Nt sample stories that discuss a given target topic
*** Testing: Find all subsequent stories that discuss the target topic
** Traditional (non-adaptive?)
** Supervised Adaptive Topic Tracking (* experimental task)
*** System Goal: To detect stories that discuss the target topic when a human provides feedback to the system (the system receives a human judgment, on-topic or off-topic, for every retrieved story)
* Topic Detection
** Traditional (flat?)
** Hierarchical Topic Detection (* experimental task)
*** System Goal: To detect topics in terms of the (clusters of) stories that discuss them

== Metrics ==
* New Event Detection and Link Detection:
** Detection Cost
** Detection error tradeoff (DET) Curves
** Notes:
*** Same as in spoken language identification! Identification is detection, as opposed to classification.
* Supervised Adaptive Tracking
** (Normalized) Detection Cost
** Linear Utility Measure (a la TREC 2002 Filtering Track, per Robertson & Soboroff)
* Hierarchical Topic Detection
** Weighted combination of Detection Cost and Travel Cost

== Terminology, Acronyms ==
* An event: a specific thing that happens at a specific time and place, along with all necessary preconditions and unavoidable consequences.
* A topic: an event or activity, along with all directly related events and activities.
* A broadcast news story: a section of transcribed text with substantive information content and a unified topical focus.

== Papers to acquire ==
* Title: Detection As Multi-Topic Tracking. (Author abstract).
** Author(s): James Allan.
** Source: Information Retrieval 5.2 (April 2002): p139.
** Document Type: Magazine/Journal
** DOI: http://dx.doi.org/10.1023/A:1015793827697
** Byline: James Allan (1)
** Keywords: topic detection and tracking (TDT); event-based information organization; information filtering; evaluation
** Abstract: The topic tracking task from TDT is a variant of information filtering tasks that focuses on event-based topics in streams of broadcast news. In this study, we compare tracking to another TDT task, detection, which has the goal of partitioning all arriving news into topics, regardless of whether the topics are of interest to anyone, and even when a new topic appears that had not been previously anticipated. There are clear relationships between the two tasks (under some assumptions, a "perfect" tracking system could "solve" the detection problem), but they are evaluated quite differently. We describe the two tasks and discuss their similarities. We show how viewing detection as a form of multi-topic parallel tracking can illuminate the performance tradeoffs of detection over tracking.
** Source Citation: Allan, James. "Detection As Multi-Topic Tracking." Information Retrieval 5.2 (April 2002): 139. Academic OneFile. Gale. Brigham Young University - Utah. 4 Apr. 2008
** Gale Document Number: A155176171
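
== Sketch: computing (Normalized) Detection Cost ==
A rough note to go with the Metrics section above: a minimal Python sketch of how a detection cost could be computed from system decisions. It assumes the usual form, a weighted sum of miss and false-alarm rates, normalized by the cost of the better trivial system (always "yes" or always "no"). The function names are made up for this page, and the default constants (C_miss = 1, C_fa = 0.1, P_target = 0.02) are assumptions that should be checked against the NIST evaluation plan before use.

```python
def miss_and_false_alarm_rates(decisions, labels):
    """Return (P_miss, P_fa) from parallel lists of booleans.

    decisions[i] is the system's YES/NO answer for trial i;
    labels[i] is True when trial i is a real target (on-topic).
    """
    targets = sum(1 for y in labels if y)
    nontargets = len(labels) - targets
    misses = sum(1 for d, y in zip(decisions, labels) if y and not d)
    false_alarms = sum(1 for d, y in zip(decisions, labels) if d and not y)
    p_miss = misses / targets if targets else 0.0
    p_fa = false_alarms / nontargets if nontargets else 0.0
    return p_miss, p_fa


def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Weighted sum of miss and false-alarm rates (constants are assumptions)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def normalized_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Detection cost divided by the cost of the cheaper trivial system."""
    trivial = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return detection_cost(p_miss, p_fa, c_miss, c_fa, p_target) / trivial


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    decisions = [True, False, True, False]
    labels = [True, True, False, False]
    p_miss, p_fa = miss_and_false_alarm_rates(decisions, labels)
    print(p_miss, p_fa, normalized_detection_cost(p_miss, p_fa))
```

Under these assumed constants, a system that always answers "no" gets a normalized cost of 1, so values below 1 mean the system beats the trivial baseline. Sweeping a decision threshold and plotting P_miss against P_fa at each setting would give the points of a DET curve.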