Time-Domain Speech Enhancement Assisted by Multi-Resolution Frequency Encoder and Decoder
Time-domain speech enhancement (SE) has recently been intensively investigated. Among recent works, DEMUCS [1] introduces multi-resolution STFT loss to enhance performance. However, some resolutions used for STFT contain non-stationary signals, and it is challenging to learn multi-resolution frequen...
| Published in: | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 1 - 5 |
|---|---|
| Main Authors: | Shi, Hao; Mimura, Masato; Wang, Longbiao; Dang, Jianwu; Kawahara, Tatsuya |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 04.06.2023 |
| ISSN: | 2379-190X |
| Abstract | Time-domain speech enhancement (SE) has recently been intensively investigated. Among recent works, DEMUCS [1] introduces multi-resolution STFT loss to enhance performance. However, some resolutions used for STFT contain non-stationary signals, and it is challenging to learn multi-resolution frequency losses simultaneously with only one output. For better use of multi-resolution frequency information, we supplement multiple spectrograms in different frame lengths into the time-domain encoders. They extract stationary frequency information in both narrowband and wideband. We also adopt multiple decoder outputs, each of which computes its corresponding resolution frequency loss. Experimental results show that (1) it is more effective to fuse stationary frequency features than non-stationary features in the encoder, and (2) the multiple outputs consistent with the frequency loss improve performance. Experiments on the Voice-Bank dataset show that the proposed method obtained a 0.14 PESQ improvement. |
|---|---|
| Author | Dang, Jianwu; Shi, Hao; Kawahara, Tatsuya; Wang, Longbiao; Mimura, Masato |
| Author_xml | 1. Shi, Hao (Kyoto University, Graduate School of Informatics, Kyoto, Japan); 2. Mimura, Masato (Kyoto University, Graduate School of Informatics, Kyoto, Japan); 3. Wang, Longbiao (Tianjin University, College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China); 4. Dang, Jianwu (Tianjin University, College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China); 5. Kawahara, Tatsuya (Kyoto University, Graduate School of Informatics, Kyoto, Japan) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/ICASSP49357.2023.10094718 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISBN | 1728163277 9781728163277 |
| EISSN | 2379-190X |
| EndPage | 5 |
| ExternalDocumentID | 10094718 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 27 02:35:11 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 5 |
| ParticipantIDs | ieee_primary_10094718 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-June-4 |
| PublicationDateYYYYMMDD | 2023-06-04 |
| PublicationDate_xml | – month: 06 year: 2023 text: 2023-June-4 day: 04 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) |
| PublicationTitleAbbrev | ICASSP |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0008748 |
| Score | 2.377918 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Decoding; Fuses; multiresolution spectrograms; Neural networks; Spectrogram; Speech enhancement; time domain; Time-domain analysis; Time-frequency analysis |
| Title | Time-Domain Speech Enhancement Assisted by Multi-Resolution Frequency Encoder and Decoder |
| URI | https://ieeexplore.ieee.org/document/10094718 |
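
The abstract describes pairing each decoder output with a frequency loss at its own STFT resolution. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the commonly used spectral-convergence plus log-magnitude form of the STFT loss, which the abstract does not specify, and the frame lengths, hop sizes, and function names (`stft_mag`, `resolution_loss`, `multi_output_loss`) are hypothetical.

```python
import numpy as np

def stft_mag(x, frame_len, hop):
    """Hann-windowed magnitude spectrogram, shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=-1))

def resolution_loss(est, ref, frame_len, hop, eps=1e-8):
    """Spectral-convergence plus log-magnitude L1 loss at one resolution (assumed form)."""
    est_mag = stft_mag(est, frame_len, hop)
    ref_mag = stft_mag(ref, frame_len, hop)
    spectral_convergence = np.linalg.norm(ref_mag - est_mag) / (np.linalg.norm(ref_mag) + eps)
    log_magnitude = np.mean(np.abs(np.log(ref_mag + eps) - np.log(est_mag + eps)))
    return spectral_convergence + log_magnitude

def multi_output_loss(outputs, ref, resolutions):
    """Pair each decoder output with its own resolution and sum the per-resolution losses."""
    assert len(outputs) == len(resolutions)
    return sum(resolution_loss(out, ref, n, h) for out, (n, h) in zip(outputs, resolutions))

# Hypothetical (frame length, hop) pairs in samples; not taken from the paper.
RESOLUTIONS = [(512, 128), (1024, 256), (2048, 512)]

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)  # stand-in for one second of clean speech at 16 kHz
# Stand-ins for the decoder outputs: the clean signal plus a little noise.
outputs = [clean + 0.1 * rng.standard_normal(16000) for _ in RESOLUTIONS]
loss = multi_output_loss(outputs, clean, RESOLUTIONS)
```

Shorter frames give finer time resolution and coarser frequency resolution, and longer frames give the reverse, which is why the loss is evaluated at several frame lengths rather than one.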