Learning with noisy supervision for Spoken Language Understanding



Bibliographic Details
Published in:	2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4989-4992
Main Authors:	Raymond, C., Riccardi, G.
Format:	Conference Paper
Language:	English
Published:	IEEE, 1 March 2008
Subjects:	Active Learning; Conditional Random Fields; Costs; Data mining; Humans; Labeling; Machine learning algorithms; Natural languages; Noise level; Noise robustness; Spoken Language Understanding; Uncertainty
ISBN:	9781424414833, 1424414830
ISSN:	1520-6149

Description
Summary:	Data-driven spoken language understanding (SLU) systems need semantically annotated data, which are expensive and time-consuming to produce and prone to human error. Active learning has been successfully applied to automatic speech recognition and utterance classification. In general, corpus annotation for SLU involves tasks such as sentence segmentation, chunking or frame labeling, and predicate-argument annotation. In such cases human annotations are subject to errors, which increase with annotation complexity. We investigate two alternative noise-robust active learning strategies that are either data-intensive or supervision-intensive. The strategies detect likely erroneous examples and significantly improve SLU performance for a given labeling cost. We apply uncertainty-based active learning with conditional random fields to the concept segmentation task for SLU. We perform annotation experiments on two databases, namely ATIS (English) and Media (French). We show that our noise-robust algorithm can improve accuracy by up to 6% (absolute), depending on the noise level and the labeling cost.
DOI:	10.1109/ICASSP.2008.4518778
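
Illustrative note: the summary mentions uncertainty-based active learning. The sketch below shows generic least-confidence selection only; it is not the authors' noise-robust algorithm and does not use a conditional random field. The function names, the toy pool, and the simplification of scoring tokens independently are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical types: each utterance is a list of per-token label posteriors.
Posterior = Dict[str, float]


def sequence_confidence(token_posteriors: Sequence[Posterior]) -> float:
    """Confidence of the best labeling as a product of per-token maxima.

    Simplifying assumption: tokens are scored independently. A real CRF
    would supply the probability of the whole best label sequence.
    """
    confidence = 1.0
    for posterior in token_posteriors:
        confidence *= max(posterior.values())
    return confidence


def select_for_annotation(pool: Dict[str, List[Posterior]], budget: int) -> List[str]:
    """Return the ids of the `budget` least-confident utterances."""
    ranked = sorted(pool, key=lambda uid: sequence_confidence(pool[uid]))
    return ranked[:budget]


# Toy unlabeled pool with made-up posteriors over two labels.
pool = {
    "utt1": [{"O": 0.9, "city": 0.1}, {"O": 0.2, "city": 0.8}],
    "utt2": [{"O": 0.5, "city": 0.5}, {"O": 0.6, "city": 0.4}],
    "utt3": [{"O": 0.99, "city": 0.01}],
}
print(select_for_annotation(pool, budget=2))  # prints ['utt2', 'utt1']
```

In an active learning loop, the selected utterances would be sent to a human annotator, added to the training set, and the model retrained before the next selection round.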