Automatic voice onset time detection for unvoiced stops (/ p/,/ t/,/ k/) with application to accent classification

Articulation characteristics of particular phonemes can provide cues to distinguish accents in spoken English. For example, as shown in Arslan and Hansen (1996, 1997), Voice Onset Time (VOT) can be used to classify mandarin, Turkish, German and American accented English. Our goal in this study is to...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Speech communication Ročník 52; číslo 10; s. 777 - 789
Hlavní autoři: Hansen, John H.L., Gray, Sharmistha S., Kim, Wooil
Médium: Journal Article
Jazyk:angličtina
Vydáno: Amsterdam Elsevier B.V 01.10.2010
Elsevier
Témata:
ISSN:0167-6393, 1872-7182
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Articulation characteristics of particular phonemes can provide cues to distinguish accents in spoken English. For example, as shown in Arslan and Hansen (1996, 1997), Voice Onset Time (VOT) can be used to classify mandarin, Turkish, German and American accented English. Our goal in this study is to develop an automatic system that classifies accents using VOT in unvoiced stops 1 A preliminary version of some of the work in this study was presented at the IEEE NORSIG-04 Symposium in Das (Gray) and Hansen (2004). 1 . VOT is an important temporal feature which is often overlooked in speech perception, speech recognition, as well as accent detection. Fixed length frame-based speech processing inherently ignores VOT. In this paper, a more effective VOT detection scheme using the non-linear energy tracking algorithm Teager Energy Operator (TEO), across a sub-frequency band partition for unvoiced stops (/ p/, / t/ and / k/), is introduced. The proposed VOT detection algorithm also incorporates spectral differences in the Voice Onset Region (VOR) and the succeeding vowel of a given stop-vowel sequence to classify speakers having accents due to different ethnic origin. The spectral cues are enhanced using one of the four types of feature parameter extractions – Discrete Mellin Transform (DMT), Discrete Mellin Fourier Transform (DMFT) and Discrete Wavelet Transform using the lowest and the highest frequency resolutions (DWTlfr and DWThfr). A Hidden Markov Model (HMM) classifier is employed with these extracted parameters and applied to the problem of accent classification. Three different language groups (American English, Chinese, and Indian) are used from the CU-Accent database. The VOT is detected with less than 10% error when compared to the manual detected VOT with a success rate of 79.90%, 87.32% and 47.73% for English, Chinese and Indian speakers (includes atypical cases for Indian case), respectively. It is noted that the DMT and DWTlfr features are good for parameterizing speech samples which exhibit substitution of succeeding vowel after the stop in accented speech. The successful accent classification rates of DMT and DWTlfr features are 66.13% and 71.67%, for / p/ and / t/ respectively, for pairwise accent detection. Alternatively, the DMFT feature works on all accent sensitive words considered, with a success rate of 70.63%. This study shows that effective VOT detection can be achieved using an integrated TEO processing with spectral difference analysis in the VOR that can be employed for accent classification.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ObjectType-Article-2
ObjectType-Feature-1
ISSN:0167-6393
1872-7182
DOI:10.1016/j.specom.2010.05.004