Feature learning of Japanese pitch accents and applications to Japanese speech education
| Published in: | 2023 14th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), pp. 188-193 |
|---|---|
| Main author: | |
| Format: | Conference paper |
| Language: | English, Japanese |
| Publication details: | IEEE, 08.07.2023 |
| Subject: | |
| Summary: | We modeled pitch frequency with a sequential variational autoencoder to obtain feature representations of the pitch accents of Japanese words, with applications to Japanese speech education for hearing-impaired people and learners of Japanese. The model has two types of latent variables: one represents time-invariant features of the pitch accent type, and the other represents time-variant features of voiced/unvoiced segments. We approximated the distribution of the time-invariant latent variables with a Gaussian mixture model and estimated the accent type of the test data to confirm that these variables represent the features of the accent types. Next, by varying only the values of the time-invariant latent variables, we resynthesized 49 different pitch patterns per word and generated speech in which the pitch frequency of the original speech was transformed into each of these patterns. Seven subjects rated the adequacy of the pitch patterns for each word. The distribution of the subjects' average ratings tended to extend to accent types other than the annotated ones, compared with the distribution of accent features represented in the time-invariant latent space. |
|---|---|
| DOI: | 10.1109/IIAI-AAI59060.2023.00047 |