Proper Error Estimation and Calibration for Attention-Based Encoder-Decoder Models
Saved in:
| Published in: | IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 32, pp. 4919-4930 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 2024 |
| Subjects: | |
| ISSN: | 2329-9290, 2329-9304 |
| Online Access: | Get full text |
| Summary: | An attention-based automatic speech recognition (ASR) model generates a probability distribution over the token set at each time step. Recent studies have shown that calibration errors exist in the output probability distributions of attention-based ASR models trained to minimize the negative log likelihood. This study analyzes the causes of calibration errors in ASR model outputs and their impact on model performance. Based on the analysis, we argue that conventional methods for estimating calibration errors at the token level are unsuitable for ASR tasks. Accordingly, we propose a new calibration measure that estimates the calibration error at the sequence level. Moreover, we present a new post-hoc calibration function and training objective to mitigate the calibration error of the ASR model at the sequence level. Through experiments on an ASR benchmark, we show that the proposed methods effectively alleviate the calibration error of the ASR model and improve the generalization performance. |
|---|---|
| DOI: | 10.1109/TASLP.2024.3492799 |
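The summary contrasts token-level calibration-error estimation with the paper's proposed sequence-level measure. As a generic illustration only (this is the standard expected calibration error, ECE, that the paper argues is unsuitable at the token level for ASR, not the paper's proposed measure), token-level ECE over per-token confidences can be sketched as:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: weighted average gap between the mean
    confidence and the empirical accuracy within each confidence bin.

    confidences: predicted probability of the emitted token (floats in (0, 1])
    correct:     1 if the token was correct, else 0
    """
    pairs = list(zip(confidences, correct))
    n = len(pairs)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [(c, y) for c, y in pairs if lo < c <= hi]
        if not in_bin:
            continue
        avg_conf = sum(c for c, _ in in_bin) / len(in_bin)
        accuracy = sum(y for _, y in in_bin) / len(in_bin)
        ece += (len(in_bin) / n) * abs(avg_conf - accuracy)
    return ece

# Toy example: an overconfident model. Four tokens predicted at 0.95
# confidence with only 3/4 correct, two at 0.55 with 1/2 correct.
conf = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
hit = [1, 1, 1, 0, 1, 0]
ece = expected_calibration_error(conf, hit)  # (4/6)*0.20 + (2/6)*0.05 = 0.15
```

A zero ECE means confidence matches accuracy in every bin; the gap above reflects the overconfidence in the toy data.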