PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model

Bibliographic details
Published in: Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 12782-12786
Main authors: Hono, Yukiya; Hashimoto, Kei; Nankaku, Yoshihiko; Tokuda, Keiichi
Format: Conference paper
Language: English, Japanese
Published: IEEE, 14 April 2024
ISSN: 2379-190X
Abstract This paper presents a neural vocoder based on a denoising diffusion probabilistic model (DDPM) incorporating explicit periodic signals as auxiliary conditioning signals. Recently, DDPM-based neural vocoders have gained prominence as non-autoregressive models that can generate high-quality waveforms. The neural vocoders based on DDPM have the advantage of training with a simple time-domain loss. In practical applications, such as singing voice synthesis, there is a demand for neural vocoders to generate high-fidelity speech waveforms with flexible pitch control. However, conventional DDPM-based neural vocoders struggle to generate speech waveforms under such conditions. Our proposed model aims to accurately capture the periodic structure of speech waveforms by incorporating explicit periodic signals. Experimental results show that our model improves sound quality and provides better pitch control than conventional DDPM-based neural vocoders.
Author Hono, Yukiya (Nagoya Institute of Technology, Nagoya, Japan)
Hashimoto, Kei (Nagoya Institute of Technology, Nagoya, Japan)
Nankaku, Yoshihiko (Nagoya Institute of Technology, Nagoya, Japan)
Tokuda, Keiichi (Nagoya Institute of Technology, Nagoya, Japan)
ContentType Conference Proceeding
DOI 10.1109/ICASSP48485.2024.10448502
Discipline Engineering
EISBN 9798350344851
EISSN 2379-190X
EndPage 12786
ExternalDocumentID 10448502
Genre orig-research
ISICitedReferencesCount 2
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
OpenAccessLink https://cir.nii.ac.jp/crid/1872836541413022336
PageCount 5
PublicationDate 2024-04-14
PublicationTitleAbbrev ICASSP
StartPage 12782
SubjectTerms Controllability
diffusion probabilistic model
neural vocoder
pitch controllability
Probabilistic logic
Robustness
Signal processing
singing voice synthesis
Speech processing
Speech synthesis
Training
Vocoders
URI https://ieeexplore.ieee.org/document/10448502