Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses

Detailed bibliography
Published in: Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 1-5
Main authors: Ai, Yang; Ling, Zhen-Hua
Medium: Conference paper
Language: English
Publication details: IEEE, 4 June 2023
ISSN:2379-190X
Abstract This paper presents a novel speech phase prediction model which predicts wrapped phase spectra directly from amplitude spectra by neural networks. The proposed model is a cascade of a residual convolutional network and a parallel estimation architecture. The parallel estimation architecture is composed of two parallel linear convolutional layers and a phase calculation formula, imitating the process of calculating the phase spectra from the real and imaginary parts of complex spectra and strictly restricting the predicted phase values to the principal value interval. To avoid the error expansion issue caused by phase wrapping, we design anti-wrapping training losses defined between the predicted wrapped phase spectra and natural ones by activating the instantaneous phase error, group delay error and instantaneous angular frequency error using an anti-wrapping function. Experimental results show that our proposed neural speech phase prediction model outperforms the iterative Griffin-Lim algorithm and other neural network-based method, in terms of both reconstructed speech quality and generation speed.
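A minimal NumPy sketch of the two mechanisms described in the abstract, under stated assumptions. The "phase calculation formula" is assumed here to behave like the two-argument arctangent (`np.arctan2`), which maps a pseudo real part and a pseudo imaginary part to a phase in [-π, π]. The anti-wrapping function is assumed to be f(x) = |x - 2π·round(x / 2π)|. Group delay and instantaneous angular frequency are approximated as first differences of the phase along the frequency and time axes. The function names, the (frames, frequency bins) array layout and the plain averaging are illustrative choices, not the authors' code.

```python
import numpy as np


def phase_from_parallel_outputs(real_part: np.ndarray, imag_part: np.ndarray) -> np.ndarray:
    """Map two parallel estimates (pseudo real and imaginary parts) to a wrapped phase.

    np.arctan2 returns values in [-pi, pi], so every output already lies in the
    principal value interval without any clipping.
    """
    return np.arctan2(imag_part, real_part)


def anti_wrap(x: np.ndarray) -> np.ndarray:
    """Assumed anti-wrapping function: distance from x to the nearest multiple of 2*pi."""
    return np.abs(x - 2 * np.pi * np.round(x / (2 * np.pi)))


def anti_wrapping_losses(pred_phase: np.ndarray, true_phase: np.ndarray):
    """Return (IP, GD, IAF) losses for phase arrays shaped (frames, freq_bins)."""
    # Instantaneous phase error, activated by the anti-wrapping function.
    ip_loss = np.mean(anti_wrap(pred_phase - true_phase))
    # Group delay approximated by the first difference along frequency (axis 1);
    # its sign convention does not matter because anti_wrap is symmetric.
    gd_loss = np.mean(anti_wrap(np.diff(pred_phase, axis=1) - np.diff(true_phase, axis=1)))
    # Instantaneous angular frequency approximated by the first difference along time (axis 0).
    iaf_loss = np.mean(anti_wrap(np.diff(pred_phase, axis=0) - np.diff(true_phase, axis=0)))
    return ip_loss, gd_loss, iaf_loss


if __name__ == "__main__":
    # Phases recovered from a complex spectrum land back in the principal interval.
    theta = np.array([0.5, -2.0, 3.0])
    z = np.exp(1j * theta)
    print(phase_from_parallel_outputs(z.real, z.imag))

    # A 2*pi offset is not penalized, unlike a plain absolute error.
    rng = np.random.default_rng(0)
    target = rng.uniform(-np.pi, np.pi, size=(4, 5))
    print(anti_wrapping_losses(target + 2 * np.pi, target))
```

With this definition, a prediction that differs from the target by exactly 2π yields (near-)zero loss, whereas a plain absolute error would report 2π; that inflated error is the kind of error expansion the anti-wrapping losses are meant to avoid.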
Author_xml – sequence: 1
  givenname: Yang
  surname: Ai
  fullname: Ai, Yang
  email: yangai@ustc.edu.cn
  organization: University of Science and Technology of China, National Engineering Research Center of Speech and Language Information Processing, Hefei, P.R. China
– sequence: 2
  givenname: Zhen-Hua
  surname: Ling
  fullname: Ling, Zhen-Hua
  email: zhling@ustc.edu.cn
  organization: University of Science and Technology of China, National Engineering Research Center of Speech and Language Information Processing, Hefei, P.R. China
ContentType Conference Proceeding
DOI 10.1109/ICASSP49357.2023.10096553
Discipline Engineering
EISBN 1728163277
9781728163277
EndPage 5
ExternalDocumentID 10096553
Genre orig-research
GrantInformation_xml – fundername: Fundamental Research Funds for the Central Universities
  funderid: 10.13039/501100012226
– fundername: Nature
  funderid: 10.13039/501100020487
IsPeerReviewed false
IsScholarly true
PageCount 5
PublicationDate 2023-June-4
PublicationTitleAbbrev ICASSP
Publisher IEEE
StartPage 1
SubjectTerms anti-wrapping loss
Delays
Estimation
Iterative algorithms
neural network
parallel estimation architecture
phase wrapping
Prediction algorithms
Predictive models
Signal processing algorithms
speech phase prediction
Training
URI https://ieeexplore.ieee.org/document/10096553