Search Results - Electrical Engineering and Systems Science - Audio and Speech Processing

  2.

    PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model by Hono, Yukiya; Hashimoto, Kei; Nankaku, Yoshihiko; Tokuda, Keiichi

    ISSN: 2379-190X
    Published: IEEE 14.04.2024
    “… In practical applications, such as singing voice synthesis, there is a demand for neural vocoders to generate high-fidelity speech waveforms with flexible pitch control…”
    Conference Proceeding
  3.

    MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction by Zhou, Wangjin; Yang, Zhengdong; Chu, Chenhui; Li, Sheng; Dabre, Raj; Zhao, Yi; Kawahara, Tatsuya

    ISSN: 2379-190X
    Published: IEEE 14.04.2024
    “… This study extends the application of predicted MOS to the task of Fake Audio Detection (FAD) as we expect that MOS can be used to assess how close synthesized speech is to the natural human voice…”
    Conference Proceeding
  4.

    Bass Accompaniment Generation Via Latent Diffusion by Pasini, Marco; Grachten, Maarten; Lattner, Stefan

    ISSN: 2379-190X
    Published: IEEE 14.04.2024
    “… We present a novel controllable system for generating single stems to accompany musical mixes of arbitrary length…”
    Conference Proceeding
  5.

    Data Driven Grapheme-to-Phoneme Representations for a Lexicon-Free Text-to-Speech by Garg, Abhinav; Kim, Jiyeon; Khyalia, Sushil; Kim, Chanwoo; Gowda, Dhananjaya

    ISSN: 2379-190X
    Published: IEEE 14.04.2024
    “… Grapheme-to-Phoneme (G2P) is an essential first step in any modern, high-quality Text-to-Speech (TTS) system…”
    Conference Proceeding
  6.

    Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression by Tomita, Yoshihide; Koyama, Shoichi; Saruwatari, Hiroshi

    ISSN: 2379-190X
    Published: IEEE 14.04.2024
    “… The exterior radiation from the loudspeakers in sound field synthesis systems can be problematic in practical situations…”
    Conference Proceeding
  8.

    Versatile Time-Frequency Representations Realized by Convex Penalty on Magnitude Spectrogram by Keidai Arai, Koki Yamada, Kohei Yatabe

    ISSN: 1070-9908, 1558-2361
    Published: Institute of Electrical and Electronics Engineers (IEEE) 01.01.2023
    Published in IEEE Signal Processing Letters (01.01.2023)
    Journal Article
  10.

    Generalized Domain Adaptation Framework for Parametric Back-End in Speaker Recognition by Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

    ISSN: 1556-6013, 1556-6021
    Published: Institute of Electrical and Electronics Engineers (IEEE) 01.01.2023
    Journal Article
  11.

    Investigation of Japanese PnG BERT Language Model in Text-to-Speech Synthesis for Pitch Accent Language by Yusuke Yasuda, Tomoki Toda

    ISSN: 1932-4553, 1941-0484
    Published: Institute of Electrical and Electronics Engineers (IEEE) 01.10.2022
    Journal Article
  17.

    RISC: A Corpus for Shout Type Classification and Shout Intensity Prediction by Takahiro Fukumori, Taito Ishida, Yoichi Yamashita

    ISSN: 2329-9290, 2329-9304
    Published: Institute of Electrical and Electronics Engineers (IEEE) 01.01.2024
    Journal Article