Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses

Detailed bibliography
Published in:	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 1-5
Main authors:	Ai, Yang; Ling, Zhen-Hua
Format:	Conference paper
Language:	English
Publication details:	IEEE, 04.06.2023
Subjects:	anti-wrapping loss; Delays; Estimation; Iterative algorithms; neural network; parallel estimation architecture; phase wrapping; Prediction algorithms; Predictive models; Signal processing algorithms; speech phase prediction; Training
ISSN:	2379-190X

Abstract	This paper presents a novel speech phase prediction model which predicts wrapped phase spectra directly from amplitude spectra by neural networks. The proposed model is a cascade of a residual convolutional network and a parallel estimation architecture. The parallel estimation architecture is composed of two parallel linear convolutional layers and a phase calculation formula, imitating the process of calculating the phase spectra from the real and imaginary parts of complex spectra and strictly restricting the predicted phase values to the principal value interval. To avoid the error expansion issue caused by phase wrapping, we design anti-wrapping training losses defined between the predicted wrapped phase spectra and natural ones by activating the instantaneous phase error, group delay error and instantaneous angular frequency error using an anti-wrapping function. Experimental results show that our proposed neural speech phase prediction model outperforms the iterative Griffin-Lim algorithm and other neural network-based methods in terms of both reconstructed speech quality and generation speed.
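The two ideas in the abstract, a phase calculation that stays inside the principal value interval and losses that are insensitive to 2π jumps, can be illustrated with a minimal pure-Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: `math.atan2` stands in for the phase calculation formula, `anti_wrap` is one common choice of anti-wrapping function (distance to the nearest multiple of 2π), and the function names, the `[frequency][time]` layout, and the use of first differences for group delay and instantaneous angular frequency are all hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def phase_from_parts(real, imag):
    """Map a pseudo real/imaginary pair to a phase in [-pi, pi].

    Assumption: atan2 stands in for the paper's phase calculation formula.
    """
    return math.atan2(imag, real)


def anti_wrap(x):
    """Distance from x to the nearest multiple of 2*pi (always in [0, pi]).

    Assumption: one plausible anti-wrapping function, not taken from the record.
    """
    return abs(x - TWO_PI * round(x / TWO_PI))


def diff_freq(phase):
    """First difference along the frequency axis (between consecutive rows)."""
    return [[b - a for a, b in zip(lo, hi)] for lo, hi in zip(phase, phase[1:])]


def diff_time(phase):
    """First difference along the time axis (within each row)."""
    return [[b - a for a, b in zip(row, row[1:])] for row in phase]


def mean_anti_wrapped_error(pred, ref):
    """Mean anti-wrapped difference between two equally shaped 2-D lists."""
    values = [anti_wrap(p - r)
              for pred_row, ref_row in zip(pred, ref)
              for p, r in zip(pred_row, ref_row)]
    return sum(values) / len(values)


def anti_wrapping_losses(pred, ref):
    """Three loss terms named in the abstract; inputs need >= 2 rows and columns."""
    return {
        "instantaneous_phase": mean_anti_wrapped_error(pred, ref),
        "group_delay": mean_anti_wrapped_error(diff_freq(pred), diff_freq(ref)),
        "instantaneous_angular_frequency": mean_anti_wrapped_error(
            diff_time(pred), diff_time(ref)),
    }
```

As an example of why the anti-wrapping step matters under these assumptions: phases of 3.0 and -3.0 rad differ by 6.0 numerically, yet they are only about 0.283 rad apart on the circle, and `anti_wrap(6.0)` returns that smaller value (`TWO_PI - 6.0`).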
Author	Ai, Yang; Ling, Zhen-Hua
Author_xml	– sequence: 1 givenname: Yang surname: Ai fullname: Ai, Yang email: yangai@ustc.edu.cn organization: University of Science and Technology of China, National Engineering Research Center of Speech and Language Information Processing, Hefei, P.R. China – sequence: 2 givenname: Zhen-Hua surname: Ling fullname: Ling, Zhen-Hua email: zhling@ustc.edu.cn organization: University of Science and Technology of China, National Engineering Research Center of Speech and Language Information Processing, Hefei, P.R. China
ContentType	Conference Proceeding
DOI	10.1109/ICASSP49357.2023.10096553
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
Discipline	Engineering
EISBN	1728163277 9781728163277
EISSN	2379-190X
EndPage	5
ExternalDocumentID	10096553
Genre	orig-research
GrantInformation_xml	– fundername: Fundamental Research Funds for the Central Universities funderid: 10.13039/501100012226 – fundername: Nature funderid: 10.13039/501100020487
IsPeerReviewed	false
IsScholarly	true
Language	English
PageCount	5
PublicationCentury	2000
PublicationDate	2023-June-4
PublicationDateYYYYMMDD	2023-06-04
PublicationDate_xml	– month: 06 year: 2023 text: 2023-June-4 day: 04
PublicationDecade	2020
PublicationTitle	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev	ICASSP
PublicationYear	2023
Publisher	IEEE
Publisher_xml	– name: IEEE
StartPage	1
Title	Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses
URI	https://ieeexplore.ieee.org/document/10096553