Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses
| Published in: | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 1-5 |
|---|---|
| Main authors: | Ai, Yang; Ling, Zhen-Hua |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 04.06.2023 |
| ISSN: | 2379-190X |
| Abstract | This paper presents a novel speech phase prediction model which predicts wrapped phase spectra directly from amplitude spectra using neural networks. The proposed model is a cascade of a residual convolutional network and a parallel estimation architecture. The parallel estimation architecture is composed of two parallel linear convolutional layers and a phase calculation formula, imitating the process of calculating the phase spectra from the real and imaginary parts of complex spectra and strictly restricting the predicted phase values to the principal value interval. To avoid the error expansion issue caused by phase wrapping, we design anti-wrapping training losses defined between the predicted wrapped phase spectra and natural ones by activating the instantaneous phase error, group delay error and instantaneous angular frequency error using an anti-wrapping function. Experimental results show that our proposed neural speech phase prediction model outperforms the iterative Griffin-Lim algorithm and another neural network-based method in terms of both reconstructed speech quality and generation speed. |
|---|---|
| Author | Ai, Yang; Ling, Zhen-Hua |
| Author_xml | 1. Ai, Yang (yangai@ustc.edu.cn), University of Science and Technology of China, National Engineering Research Center of Speech and Language Information Processing, Hefei, P.R. China; 2. Ling, Zhen-Hua (zhling@ustc.edu.cn), University of Science and Technology of China, National Engineering Research Center of Speech and Language Information Processing, Hefei, P.R. China |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ICASSP49357.2023.10096553 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| Database_xml | RIE: IEEE Electronic Library (IEL), https://ieeexplore.ieee.org/ (source type: Publisher) |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISBN | 1728163277 9781728163277 |
| EISSN | 2379-190X |
| EndPage | 5 |
| ExternalDocumentID | 10096553 |
| Genre | orig-research |
| GrantInformation_xml | Fundamental Research Funds for the Central Universities (funder ID 10.13039/501100012226); Nature (funder ID 10.13039/501100020487) |
| IngestDate | Wed Aug 27 02:35:11 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 5 |
| ParticipantIDs | ieee_primary_10096553 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-June-4 |
| PublicationDateYYYYMMDD | 2023-06-04 |
| PublicationDate_xml | – month: 06 year: 2023 text: 2023-June-4 day: 04 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) |
| PublicationTitleAbbrev | ICASSP |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0008748 |
| Score | 2.4468277 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | anti-wrapping loss; Delays; Estimation; Iterative algorithms; neural network; parallel estimation architecture; phase wrapping; Prediction algorithms; Predictive models; Signal processing algorithms; speech phase prediction; Training |
| Title | Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses |
| URI | https://ieeexplore.ieee.org/document/10096553 |
| hasFullText | 1 |
| inHoldings | 1 |
| linkProvider | IEEE |
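The abstract describes two mechanisms: a parallel estimation head that turns two linear outputs into a phase confined to the principal value interval, and losses that compare phases through an anti-wrapping function. The NumPy sketch below illustrates both under stated assumptions and is not the authors' implementation. The two "linear convolutional layers" are stood in for by per-frame matrix products (kernel size 1) with hypothetical random weights; `np.arctan2` stands in for the paper's phase calculation formula; the anti-wrapping function is assumed to be f_AW(x) = |x - 2π·round(x/2π)|, which this record does not state; and group delay and instantaneous angular frequency are approximated as first differences along the frequency and time axes. The names `parallel_phase_estimate`, `anti_wrap` and `anti_wrapping_losses` are hypothetical.

```python
import numpy as np


def parallel_phase_estimate(features, w_real, w_imag):
    """Parallel estimation head (sketch; weights and shapes are hypothetical).

    features: (frames, channels) hidden features, e.g. the output of a
        residual convolutional network.
    w_real, w_imag: (channels, bins) weights of two parallel linear layers.
        A per-frame matrix product is a linear convolution with kernel size 1.
    Returns wrapped phase of shape (frames, bins).
    """
    pseudo_real = features @ w_real  # plays the role of the real part
    pseudo_imag = features @ w_imag  # plays the role of the imaginary part
    # The two-argument arctangent always returns values in [-pi, pi], so the
    # prediction can never leave the principal value interval.
    return np.arctan2(pseudo_imag, pseudo_real)


def anti_wrap(x):
    """Assumed anti-wrapping function: |x - 2*pi*round(x / (2*pi))|.

    It measures the distance from x to the nearest multiple of 2*pi, so a
    phase error of almost 2*pi counts as almost zero.
    """
    return np.abs(x - 2 * np.pi * np.round(x / (2 * np.pi)))


def anti_wrapping_losses(pred, true):
    """IP, GD and IAF losses for (frames, bins) phase spectra (sketch).

    Group delay and instantaneous angular frequency are approximated here
    as first differences along the frequency and time axes; anti_wrap is
    symmetric, so the sign convention of the differences does not matter.
    """
    ip = np.mean(anti_wrap(pred - true))
    gd = np.mean(anti_wrap(np.diff(pred, axis=1) - np.diff(true, axis=1)))
    iaf = np.mean(anti_wrap(np.diff(pred, axis=0) - np.diff(true, axis=0)))
    return ip, gd, iaf


if __name__ == "__main__":
    # A natural phase just below +pi and a prediction just above -pi are
    # 0.02 rad apart on the circle, although their raw difference is ~2*pi.
    true = np.full((3, 4), np.pi - 0.01)
    pred = np.full((3, 4), -np.pi + 0.01)
    print("raw L1 error:", np.mean(np.abs(pred - true)))
    print("IP, GD, IAF:", anti_wrapping_losses(pred, true))

    rng = np.random.default_rng(0)
    phase = parallel_phase_estimate(
        rng.normal(size=(10, 8)), rng.normal(size=(8, 6)), rng.normal(size=(8, 6))
    )
    print("phase range:", phase.min(), phase.max())
```

In the demo, the mean raw absolute error of about 6.26 rad collapses to an instantaneous phase loss of about 0.02 rad, which is the "error expansion" the anti-wrapping losses are meant to avoid; the GD and IAF losses are zero because both toy spectra are constant. A trainable version would compute the same quantities in an autodiff framework on STFT phase spectra; NumPy is used here only to keep the sketch short.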