Time-Domain Speech Enhancement Assisted by Multi-Resolution Frequency Encoder and Decoder

Bibliographic Details
Published in: Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 1 - 5
Main Authors: Shi, Hao; Mimura, Masato; Wang, Longbiao; Dang, Jianwu; Kawahara, Tatsuya
Format: Conference Proceeding
Language: English
Published: IEEE 04.06.2023
ISSN:2379-190X
Abstract Time-domain speech enhancement (SE) has recently been intensively investigated. Among recent works, DEMUCS [1] introduces a multi-resolution STFT loss to enhance performance. However, some of the STFT resolutions capture non-stationary signals, and it is challenging to learn multi-resolution frequency losses simultaneously with only one output. To make better use of multi-resolution frequency information, we feed multiple spectrograms computed with different frame lengths into the time-domain encoders. These spectrograms provide stationary frequency information in both narrowband and wideband. We also adopt multiple decoder outputs, each of which computes the frequency loss at its corresponding resolution. Experimental results show that (1) fusing stationary frequency features in the encoder is more effective than fusing non-stationary features, and (2) multiple outputs consistent with the frequency loss improve performance. Experiments on the Voice-Bank dataset show that the proposed method obtains a 0.14 PESQ improvement.
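The idea of pairing each decoder output with its own STFT resolution can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the three (frame length, hop) pairs, and the plain L1 magnitude distance are all assumptions chosen for illustration, and the abstract does not state the actual frame lengths or the exact loss terms.

```python
import numpy as np

# Hypothetical (frame_len, hop) pairs; the record does not give the real values.
RESOLUTIONS = [(512, 128), (1024, 256), (2048, 512)]


def stft_magnitude(signal, frame_len, hop):
    """Magnitude spectrogram from a Hann-windowed, framed real FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Shape: (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))


def per_output_stft_loss(outputs, clean, resolutions=RESOLUTIONS):
    """Average L1 magnitude distance, one resolution per output.

    outputs[k] is compared with the clean signal only at resolutions[k],
    instead of forcing a single output to satisfy every resolution.
    The L1 distance is a simple stand-in for the frequency loss.
    """
    losses = []
    for estimate, (frame_len, hop) in zip(outputs, resolutions):
        est_mag = stft_magnitude(estimate, frame_len, hop)
        ref_mag = stft_magnitude(clean, frame_len, hop)
        losses.append(np.mean(np.abs(est_mag - ref_mag)))
    return float(np.mean(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)  # one second at an assumed 16 kHz
    noisy_outputs = [clean + 0.1 * rng.standard_normal(16000) for _ in RESOLUTIONS]
    print(per_output_stft_loss(noisy_outputs, clean))
```

Short frames give fine time resolution and coarse frequency resolution, and long frames give the reverse, which is why the sketch keeps the resolutions as an explicit list rather than a single setting.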
Author Dang, Jianwu
Shi, Hao
Kawahara, Tatsuya
Wang, Longbiao
Mimura, Masato
Author_xml – sequence: 1
  givenname: Hao
  surname: Shi
  fullname: Shi, Hao
  organization: Kyoto University,Graduate School of Informatics,Kyoto,Japan
– sequence: 2
  givenname: Masato
  surname: Mimura
  fullname: Mimura, Masato
  organization: Kyoto University,Graduate School of Informatics,Kyoto,Japan
– sequence: 3
  givenname: Longbiao
  surname: Wang
  fullname: Wang, Longbiao
  organization: Tianjin University,College of Intelligence and Computing,Tianjin Key Laboratory of Cognitive Computing and Application,Tianjin,China
– sequence: 4
  givenname: Jianwu
  surname: Dang
  fullname: Dang, Jianwu
  organization: Tianjin University,College of Intelligence and Computing,Tianjin Key Laboratory of Cognitive Computing and Application,Tianjin,China
– sequence: 5
  givenname: Tatsuya
  surname: Kawahara
  fullname: Kawahara, Tatsuya
  organization: Kyoto University,Graduate School of Informatics,Kyoto,Japan
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ICASSP49357.2023.10094718
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISBN 1728163277
9781728163277
EISSN 2379-190X
EndPage 5
ExternalDocumentID 10094718
Genre orig-research
IEDL.DBID RIE
IngestDate Wed Aug 27 02:35:11 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 5
ParticipantIDs ieee_primary_10094718
PublicationCentury 2000
PublicationDate 2023-June-4
PublicationDateYYYYMMDD 2023-06-04
PublicationDate_xml – month: 06
  year: 2023
  text: 2023-June-4
  day: 04
PublicationDecade 2020
PublicationTitle Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev ICASSP
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Decoding
Fuses
multiresolution spectrograms
Neural networks
Spectrogram
Speech enhancement
time domain
Time-domain analysis
Time-frequency analysis
Title Time-Domain Speech Enhancement Assisted by Multi-Resolution Frequency Encoder and Decoder
URI https://ieeexplore.ieee.org/document/10094718