A comprehensive study on supervised single-channel noisy speech separation with multi-task learning

Bibliographic details
Published in: Speech communication, Volume 167, p. 103162
Main authors: Dang, Shaoxiang; Matsumoto, Tetsuya; Takeuchi, Yoshinori; Kudo, Hiroaki
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 February 2025
ISSN: 0167-6393
Abstract This research presents a comprehensive investigation and comparison of noisy speech separation methods using multi-task learning. First, we categorize all methods into two pipelines, the enhancement priority pipeline (EPP) and the separation priority pipeline (SPP), according to whether enhancement or separation is prioritized. Next, we classify each pipeline into a shared encoder–decoder scheme (SEDS) and an independent encoder–decoder scheme (IEDS), depending on whether the two modules share the same encoder and decoder. Additionally, we introduce two types of intermediate structures between the two modules. One structure uses time–frequency (T–F) representations, while the other uses T–F masks. This article elaborates on the strengths and weaknesses of each approach, particularly in mitigating over-suppression and improving computational efficiency. Our experiments show substantial improvements in SPP with IEDS across multiple metrics on the LibriXmix dataset. In addition, by replacing the synthesis-based trick in the enhancement module, the model achieves superior generalization on the LibriCSS dataset.
• We extend the SEDS structure for speech enhancement (SE) and speech separation (SS) by transitioning features to masks.
• We propose negative gradient modulation as a simpler alternative to projection methods.
• We mitigate over-suppression with a pipeline ensuring uncompromised input for separation.
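The two pipeline orderings named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that "priority" refers to which module runs first, and the names `run_epp`, `run_spp`, `apply_mask`, `toy_enhance`, and `toy_separate` are hypothetical stand-ins invented for illustration. The toy modules operate on a single four-bin magnitude frame rather than on real audio.

```python
from typing import Callable, List

Frame = List[float]  # one toy T-F frame: a magnitude per frequency bin
Enhancer = Callable[[Frame], Frame]
Separator = Callable[[Frame], List[Frame]]


def apply_mask(features: Frame, mask: Frame) -> Frame:
    """Element-wise T-F masking: scale each bin by its mask value."""
    return [f * m for f, m in zip(features, mask)]


def run_epp(mixture: Frame, enhance: Enhancer, separate: Separator) -> List[Frame]:
    """Enhancement priority (assumed order): denoise the mixture, then separate it."""
    return separate(enhance(mixture))


def run_spp(mixture: Frame, enhance: Enhancer, separate: Separator) -> List[Frame]:
    """Separation priority (assumed order): separate the noisy mixture, then denoise each stream."""
    return [enhance(stream) for stream in separate(mixture)]


def toy_enhance(frame: Frame) -> Frame:
    # Hypothetical enhancer: a hard gate that zeroes bins below 0.5.
    return [v if v >= 0.5 else 0.0 for v in frame]


def toy_separate(frame: Frame) -> List[Frame]:
    # Hypothetical separator: fixed binary masks assign bins 0-1 to speaker A, bins 2-3 to speaker B.
    mask_a = [1.0, 1.0, 0.0, 0.0]
    mask_b = [0.0, 0.0, 1.0, 1.0]
    return [apply_mask(frame, mask_a), apply_mask(frame, mask_b)]


mixture = [0.9, 0.2, 0.7, 0.1]
epp_out = run_epp(mixture, toy_enhance, toy_separate)
spp_out = run_spp(mixture, toy_enhance, toy_separate)
```

With these element-wise toy modules both orderings return the same two streams. The sketch only shows where each module sits in the chain: in `run_spp` the separator receives the unmodified mixture, whereas in `run_epp` it only ever sees the enhancer's output.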
ArticleNumber 103162
Author Kudo, Hiroaki
Takeuchi, Yoshinori
Dang, Shaoxiang
Matsumoto, Tetsuya
Author_xml – sequence: 1
  givenname: Shaoxiang
  surname: Dang
  fullname: Dang, Shaoxiang
  email: dang.shaoxiang.s0@s.mail.nagoya-u.ac.jp
  organization: Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 4648601, Aichi, Japan
– sequence: 2
  givenname: Tetsuya
  surname: Matsumoto
  fullname: Matsumoto, Tetsuya
  organization: Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 4648601, Aichi, Japan
– sequence: 3
  givenname: Yoshinori
  surname: Takeuchi
  fullname: Takeuchi, Yoshinori
  organization: School of Informatics, Daido University, 10-3 Takiharu-cho, Minami-ku, Nagoya, 4570819, Aichi, Japan
– sequence: 4
  givenname: Hiroaki
  surname: Kudo
  fullname: Kudo, Hiroaki
  organization: Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 4648601, Aichi, Japan
ContentType Journal Article
Copyright 2024 The Authors
Copyright_xml – notice: 2024 The Authors
DOI 10.1016/j.specom.2024.103162
DatabaseName ScienceDirect Open Access Titles
Elsevier:ScienceDirect:Open Access
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Languages & Literatures
Social Welfare & Social Work
Psychology
ExternalDocumentID 10_1016_j_specom_2024_103162
S016763932400133X
ISICitedReferencesCount 1
ISSN 0167-6393
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Speech enhancement
Speech separation
Separation priority pipeline
Supervised learning
Multi-task learning
Language English
License This is an open access article under the CC BY license.
LinkModel OpenURL
OpenAccessLink https://dx.doi.org/10.1016/j.specom.2024.103162
PublicationCentury 2000
PublicationDate February 2025
2025-02-00
PublicationDateYYYYMMDD 2025-02-01
PublicationDate_xml – month: 02
  year: 2025
  text: February 2025
PublicationDecade 2020
PublicationTitle Speech communication
PublicationYear 2025
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
References Chen, Luo, Mesgarani (b4) 2017
Das, Chakraborty, Chaki, Padhy, Dey (b13) 2021; 24
Subakan, Ravanelli, Cornell, Bronzi, Zhong (b48) 2021
Dang, Matsumoto, Takeuchi, Kudo (b11) 2023
Hui, Cai, Guo, He, Zhang, Liu (b23) 2015
Maciejewski, Wichern, McQuinn, Le Roux (b35) 2020
Liu, Sun, Chen, Wang, Zhao, Lu, Wang (b29) 2023
Wisdom, Tzinis, Erdogan, Weiss, Wilson, Hershey (b56) 2020; 33
Pariente, Cornell, Cosentino, Sivasankaran, Tzinis, Heitkaemper, Olvera, Stöter, Hu, Martín-Doñas, Ditter, Frank, Deleforge, Vincent (b41) 2020
Kolbæk, Yu, Tan, Jensen (b25) 2017; 25
Li, Luo, Han, Li, Yoshioka, Zhou, Delcroix, Kinoshita, Boeddeker, Qian (b27) 2021
Yu, Kolbæk, Tan, Jensen (b57) 2017
Ma, Hou, Xu, Chng (b34) 2021
Dang, Matsumoto, Takeuchi, Kudo (b10) 2023
Mu, Yang, Yang, Zhu (b36) 2023
Chen, Yoshioka, Lu, Zhou, Meng, Luo, Wu, Xiao, Li (b7) 2020
Chen, Mao, Liu (b6) 2020
Défossez, Synnaeve, Adi (b14) 2020
Fan, Liu, Tao, Yi, Wen (b16) 2019
Le Roux, Wisdom, Erdogan, Hershey (b26) 2019
Panayotov, Chen, Povey, Khudanpur (b39) 2015
Zhang, Chen, Chen, Liu, Hu, Chng (b58) 2024
Taal, Hendriks, Heusdens, Jensen (b49) 2010
Luo, Chen, Yoshioka (b31) 2020
Shi, Li, Toda (b44) 2024
Chen, Hou, Hu, Shirol, Chng (b3) 2022
Lu, Li, Song, Wang, Dang, Wang, Zhang (b30) 2023
Huang, Kim, Hasegawa-Johnson, Smaragdis (b22) 2015; 23
Luo, Mesgarani (b33) 2019; 27
Shi, Mimura, Kawahara (b45) 2024; 32
Hu, Hou, Chen, Chng (b20) 2022
Liu, Nie, Liang, Liu, Yu, Chen, Peng, Li (b28) 2019
Shi, Wang, Ge, Li, Dang (b47) 2020
Chen, Mao, Liu (b5) 2020
Cooke, Hershey, Rennie (b8) 2010; 24
Shi, Shimada, Hirano, Shibuya, Koyama, Zhong, Takahashi, Kawahara, Mitsufuji (b46) 2024
Wang, Chen (b53) 2018; 26
Nachmani, Adi, Wolf (b37) 2020
Wichern, Antognini, Flynn, Zhu, McQuinn, Crow, Manilow, Roux (b54) 2019
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin (b51) 2017; 30
Williamson, Wang, Wang (b55) 2015; 24
Pandey, Liu, Wang, Saraf (b40) 2021
Narayanan, Wang (b38) 2014; 22
Vincent, Gribonval, Févotte (b52) 2006; 14
Zhu, Zhang, Zhang, Dai (b59) 2023; 31
Carrasco, R.C., 2014. An open-source OCR evaluation tool. In: Proceedings of the First International Conference on Digital Access To Textual Cultural Heritage. pp. 179–184.
Park, Kang, Shin, Kim, Han (b42) 2022
Cosentino, Pariente, Cornell, Deleforge, Vincent (b9) 2020
Hershey, Chen, Le Roux, Watanabe (b18) 2016
Baevski, Zhou, Mohamed, Auli (b1) 2020; 33
Hu, Chen, Zou, Zhong, Chng (b19) 2023
Series (b43) 2011
Erdogan, Hershey, Watanabe, Le Roux (b15) 2015
Luo, Mesgarani (b32) 2018
Tzinis, Wang, Smaragdis (b50) 2020
Dang, Matsumoto, Takeuchi, Kudo (b12) 2024
Huang, Kim, Hasegawa-Johnson, Smaragdis (b21) 2014
Fu, Liao, Tsao, Lin (b17) 2019
Zhu, Zhang, Zhang, Wu, Fang, Dai (b60) 2022
Kingma, D.P., Ba, J., 2015. Adam: A Method for Stochastic Optimization. In: 3rd International Conference on Learning Representations ICLR.
Défossez (10.1016/j.specom.2024.103162_b14) 2020
Pariente (10.1016/j.specom.2024.103162_b41) 2020
Le Roux (10.1016/j.specom.2024.103162_b26) 2019
Wichern (10.1016/j.specom.2024.103162_b54) 2019
Vincent (10.1016/j.specom.2024.103162_b52) 2006; 14
Williamson (10.1016/j.specom.2024.103162_b55) 2015; 24
Chen (10.1016/j.specom.2024.103162_b3) 2022
Shi (10.1016/j.specom.2024.103162_b44) 2024
Shi (10.1016/j.specom.2024.103162_b47) 2020
Hui (10.1016/j.specom.2024.103162_b23) 2015
Hu (10.1016/j.specom.2024.103162_b20) 2022
Ma (10.1016/j.specom.2024.103162_b34) 2021
Huang (10.1016/j.specom.2024.103162_b22) 2015; 23
Chen (10.1016/j.specom.2024.103162_b6) 2020
Park (10.1016/j.specom.2024.103162_b42) 2022
Pandey (10.1016/j.specom.2024.103162_b40) 2021
Nachmani (10.1016/j.specom.2024.103162_b37) 2020
Zhu (10.1016/j.specom.2024.103162_b60) 2022
Fu (10.1016/j.specom.2024.103162_b17) 2019
Liu (10.1016/j.specom.2024.103162_b28) 2019
Liu (10.1016/j.specom.2024.103162_b29) 2023
10.1016/j.specom.2024.103162_b24
Dang (10.1016/j.specom.2024.103162_b11) 2023
Kolbæk (10.1016/j.specom.2024.103162_b25) 2017; 25
Luo (10.1016/j.specom.2024.103162_b31) 2020
Li (10.1016/j.specom.2024.103162_b27) 2021
Luo (10.1016/j.specom.2024.103162_b32) 2018
Cosentino (10.1016/j.specom.2024.103162_b9) 2020
Chen (10.1016/j.specom.2024.103162_b7) 2020
Panayotov (10.1016/j.specom.2024.103162_b39) 2015
Tzinis (10.1016/j.specom.2024.103162_b50) 2020
Das (10.1016/j.specom.2024.103162_b13) 2021; 24
Shi (10.1016/j.specom.2024.103162_b45) 2024; 32
Dang (10.1016/j.specom.2024.103162_b10) 2023
Luo (10.1016/j.specom.2024.103162_b33) 2019; 27
Chen (10.1016/j.specom.2024.103162_b4) 2017
Hu (10.1016/j.specom.2024.103162_b19) 2023
Mu (10.1016/j.specom.2024.103162_b36) 2023
Vaswani (10.1016/j.specom.2024.103162_b51) 2017; 30
Yu (10.1016/j.specom.2024.103162_b57) 2017
Zhu (10.1016/j.specom.2024.103162_b59) 2023; 31
Narayanan (10.1016/j.specom.2024.103162_b38) 2014; 22
Shi (10.1016/j.specom.2024.103162_b46) 2024
Baevski (10.1016/j.specom.2024.103162_b1) 2020; 33
Taal (10.1016/j.specom.2024.103162_b49) 2010
Wisdom (10.1016/j.specom.2024.103162_b56) 2020; 33
Zhang (10.1016/j.specom.2024.103162_b58) 2024
Huang (10.1016/j.specom.2024.103162_b21) 2014
Subakan (10.1016/j.specom.2024.103162_b48) 2021
10.1016/j.specom.2024.103162_b2
Fan (10.1016/j.specom.2024.103162_b16) 2019
Cooke (10.1016/j.specom.2024.103162_b8) 2010; 24
Lu (10.1016/j.specom.2024.103162_b30) 2023
Wang (10.1016/j.specom.2024.103162_b53) 2018; 26
Chen (10.1016/j.specom.2024.103162_b5) 2020
Maciejewski (10.1016/j.specom.2024.103162_b35) 2020
Dang (10.1016/j.specom.2024.103162_b12) 2024
Erdogan (10.1016/j.specom.2024.103162_b15) 2015
Hershey (10.1016/j.specom.2024.103162_b18) 2016
Series (10.1016/j.specom.2024.103162_b43) 2011
References_xml – start-page: 2637
  year: 2020
  end-page: 2641
  ident: b41
  article-title: Asteroid: The PyTorch-based audio source separation toolkit for researchers
  publication-title: Annual Conference of the International Speech Communication Association
– volume: 30
  start-page: 5998
  year: 2017
  end-page: 6008
  ident: b51
  article-title: Attention is all you need
  publication-title: Adv. Neural Inf. Process. Syst.
– start-page: 4599
  year: 2019
  end-page: 4603
  ident: b16
  article-title: Discriminative learning for monaural speech separation using deep embedding features
  publication-title: Annual Conference of the International Speech Communication Association
– start-page: 2627
  year: 2020
  end-page: 2631
  ident: b6
  article-title: On synthesis for supervised monaural speech separation in time domain
  publication-title: Annual Conference of the International Speech Communication Association
– reference: Kingma, D.P., Ba, J., 2015. Adam: A Method for Stochastic Optimization. In: 3rd International Conference on Learning Representations ICLR.
– volume: 24
  start-page: 1
  year: 2010
  end-page: 15
  ident: b8
  article-title: Monaural speech separation and recognition challenge
  publication-title: Comput. Speech Lang.
– start-page: 2642
  year: 2020
  end-page: 2646
  ident: b5
  article-title: Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation
  publication-title: Annual Conference of the International Speech Communication Association
– volume: 33
  start-page: 3846
  year: 2020
  end-page: 3857
  ident: b56
  article-title: Unsupervised sound separation using mixture invariant training
  publication-title: Adv. Neural Inf. Process. Syst.
– start-page: 3759
  year: 2023
  end-page: 3763
  ident: b11
  article-title: Using semi-supervised learning for monaural time-domain speech separation with a self-supervised learning-based SI-SNR estimator
  publication-title: Annual Conference of the International Speech Communication Association
– volume: 24
  start-page: 883
  year: 2021
  end-page: 901
  ident: b13
  article-title: Fundamentals, present and future perspectives of speech enhancement
  publication-title: Int. J. Speech Technol.
– start-page: 497
  year: 2021
  end-page: 502
  ident: b34
  article-title: Multitask-based joint learning approach to robust ASR for radio communication speech
  publication-title: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
– start-page: 46
  year: 2020
  end-page: 50
  ident: b31
  article-title: Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 223
  year: 2021
  end-page: 228
  ident: b40
  article-title: Dual application of speech enhancement for automatic speech recognition
  publication-title: IEEE Spoken Language Technology Workshop SLT
– start-page: 626
  year: 2019
  end-page: 630
  ident: b26
  article-title: SDR - Half-baked or well done?
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 1893
  year: 2023
  end-page: 1897
  ident: b29
  article-title: Multi-level knowledge distillation for speech emotion recognition in noisy conditions
  publication-title: Annual Conference of the International Speech Communication Association
– start-page: 4214
  year: 2010
  end-page: 4217
  ident: b49
  article-title: A short-time objective intelligibility measure for time-frequency weighted noisy speech
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 1
  year: 2020
  end-page: 6
  ident: b50
  article-title: Sudo RM -RF: Efficient networks for universal audio source separation
  publication-title: 30th International Workshop on Machine Learning for Signal Processing
– start-page: 375
  year: 2023
  end-page: 380
  ident: b10
  article-title: Time-Domain Monaural Speech Separation of Introducing Discriminative Loss Between Speakers
– start-page: 241
  year: 2017
  end-page: 245
  ident: b57
  article-title: Permutation invariant training of deep models for speaker-independent multi-talker speech separation
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– volume: 23
  start-page: 2136
  year: 2015
  end-page: 2147
  ident: b22
  article-title: Joint optimization of masks and deep recurrent neural networks for monaural source separation
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– start-page: 1
  year: 2023
  end-page: 5
  ident: b30
  article-title: Speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 3174
  year: 2022
  end-page: 3178
  ident: b60
  article-title: A noise-robust self-supervised pre-training model based speech representation learning for automatic speech recognition
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 1368
  year: 2019
  end-page: 1372
  ident: b54
  article-title: WHAM!: Extending speech separation to noisy environments
  publication-title: Annual Conference of the International Speech Communication Association
– start-page: 12511
  year: 2024
  end-page: 12515
  ident: b12
  article-title: A separation priority pipeline for single-channel speech separation in noisy environments
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 2024
  year: 2024
  end-page: 2350
  ident: b44
  article-title: Multimodal fusion of music theory-inspired and self-supervised representations for improved emotion recognition
  publication-title: Annual Conference of the International Speech Communication Association
– start-page: 708
  year: 2015
  end-page: 712
  ident: b15
  article-title: Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 21
  year: 2021
  end-page: 25
  ident: b48
  article-title: Attention is all you need in speech separation
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– volume: 22
  start-page: 826
  year: 2014
  end-page: 835
  ident: b38
  article-title: Investigation of speech separation as a front-end for noise robust speech recognition
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– start-page: 31
  year: 2016
  end-page: 35
  ident: b18
  article-title: Deep clustering: Discriminative embeddings for segmentation and separation
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 7544
  year: 2020
  end-page: 7548
  ident: b47
  article-title: Spectrograms fusion with minimum difference masks estimation for monaural speech dereverberation
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 12951
  year: 2024
  end-page: 12955
  ident: b46
  article-title: Diffusion-based speech enhancement with joint generative and predictive decoders
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 1
  year: 2023
  end-page: 5
  ident: b36
  article-title: A multi-stage triple-path method for speech separation in noisy and reverberant environments
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– year: 2020
  ident: b9
  article-title: LibriMix: An open-source dataset for generalizable speech separation
– start-page: 7284
  year: 2020
  end-page: 7288
  ident: b7
  article-title: Continuous speech separation: Dataset and analysis
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– volume: 26
  start-page: 1702
  year: 2018
  end-page: 1726
  ident: b53
  article-title: Supervised speech separation based on deep learning: An overview
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– volume: 24
  start-page: 483
  year: 2015
  end-page: 492
  ident: b55
  article-title: Complex ratio masking for monaural speech separation
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– start-page: 865
  year: 2021
  end-page: 872
  ident: b27
  article-title: Dual-path RNN for long recording speech separation
  publication-title: IEEE Spoken Language Technology Workshop
– start-page: 4298
  year: 2022
  end-page: 4302
  ident: b3
  article-title: Noise-robust speech recognition with 10 minutes unparalleled in-domain data
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 7842
  year: 2022
  end-page: 7846
  ident: b42
  article-title: Manner: Multi-view attention network for noise erasure
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– volume: 25
  start-page: 1901
  year: 2017
  end-page: 1913
  ident: b25
  article-title: Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– start-page: 1381
  year: 2024
  end-page: 1385
  ident: b58
  article-title: Noise-aware speech separation with contrastive learning
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– volume: 27
  start-page: 1256
  year: 2019
  end-page: 1266
  ident: b33
  article-title: Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– start-page: 1
  year: 2023
  end-page: 5
  ident: b19
  article-title: Unifying speech enhancement and separation with gradient modulation for end-to-end noise-robust speech separation
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– reference: Carrasco, R.C., 2014. An open-source OCR evaluation tool. In: Proceedings of the First International Conference on Digital Access To Textual Cultural Heritage. pp. 179–184.
– start-page: 6292
  year: 2022
  end-page: 6296
  ident: b20
  article-title: Interactive feature fusion for end-to-end noise-robust speech recognition
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 3291
  year: 2020
  end-page: 3295
  ident: b14
  article-title: Real time speech enhancement in the waveform domain
  publication-title: Annual Conference of the International Speech Communication Association
– start-page: 491
  year: 2019
  end-page: 495
  ident: b28
  article-title: Jointly adversarial enhancement training for robust end-to-end speech recognition
  publication-title: Annual Conference of the International Speech Communication Association
– volume: 14
  start-page: 1462
  year: 2006
  end-page: 1469
  ident: b52
  article-title: Performance measurement in blind audio source separation
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– volume: 33
  start-page: 12449
  year: 2020
  end-page: 12460
  ident: b1
  article-title: Wav2vec 2.0: A framework for self-supervised learning of speech representations
  publication-title: Adv. Neural Inf. Process. Syst.
– start-page: 246
  year: 2017
  end-page: 250
  ident: b4
  article-title: Deep attractor network for single-microphone speaker separation
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 1562
  year: 2014
  end-page: 1566
  ident: b21
  article-title: Deep learning for monaural speech separation
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– volume: 31
  start-page: 1927
  year: 2023
  end-page: 1939
  ident: b59
  article-title: A joint speech enhancement and self-supervised representation learning framework for noise-robust speech recognition
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– start-page: 696
  year: 2020
  end-page: 700
  ident: b35
  article-title: WHAMR!: Noisy and reverberant single-channel speech separation
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– start-page: 2031
  year: 2019
  end-page: 2041
  ident: b17
  article-title: MetricGAN: Generative adversarial networks based black-box metric scores optimization for speech enhancement
  publication-title: International Conference on Machine Learning
– start-page: 7164
  year: 2020
  end-page: 7175
  ident: b37
  article-title: Voice separation with an unknown number of multiple speakers
  publication-title: International Conference on Machine Learning
– start-page: 24
  year: 2015
  end-page: 27
  ident: b23
  article-title: Convolutional maxout neural networks for speech separation
  publication-title: IEEE International Symposium on Signal Processing and Information Technology
– start-page: 5206
  year: 2015
  end-page: 5210
  ident: b39
  article-title: LibriSpeech: an ASR corpus based on public domain audio books
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– year: 2011
  ident: b43
  article-title: Algorithms to measure audio programme loudness and true-peak audio level
– start-page: 696
  year: 2018
  end-page: 700
  ident: b32
  article-title: TasNet: time-domain audio separation network for real-time, single-channel speech separation
  publication-title: International Conference on Acoustics, Speech and Signal Processing
– volume: 32
  start-page: 3049
  year: 2024
  end-page: 3060
  ident: b45
  article-title: Waveform-domain speech enhancement using spectrogram encoding for robust speech recognition
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– start-page: 7164
  year: 2020
  ident: 10.1016/j.specom.2024.103162_b37
  article-title: Voice separation with an unknown number of multiple speakers
– volume: 14
  start-page: 1462
  issue: 4
  year: 2006
  ident: 10.1016/j.specom.2024.103162_b52
  article-title: Performance measurement in blind audio source separation
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
  doi: 10.1109/TSA.2005.858005
– volume: 30
  start-page: 5998
  year: 2017
  ident: 10.1016/j.specom.2024.103162_b51
  article-title: Attention is all you need
  publication-title: Adv. Neural Inf. Process. Syst.
– start-page: 46
  year: 2020
  ident: 10.1016/j.specom.2024.103162_b31
  article-title: Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation
– start-page: 5206
  year: 2015
  ident: 10.1016/j.specom.2024.103162_b39
  article-title: LibriSpeech: an ASR corpus based on public domain audio books
– start-page: 1
  year: 2023
  ident: 10.1016/j.specom.2024.103162_b30
  article-title: Speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition
– start-page: 241
  year: 2017
  ident: 10.1016/j.specom.2024.103162_b57
  article-title: Permutation invariant training of deep models for speaker-independent multi-talker speech separation
– start-page: 1368
  year: 2019
  ident: 10.1016/j.specom.2024.103162_b54
  article-title: WHAM!: Extending speech separation to noisy environments
– start-page: 6292
  year: 2022
  ident: 10.1016/j.specom.2024.103162_b20
  article-title: Interactive feature fusion for end-to-end noise-robust speech recognition
– start-page: 375
  year: 2023
  ident: 10.1016/j.specom.2024.103162_b10
– start-page: 223
  year: 2021
  ident: 10.1016/j.specom.2024.103162_b40
  article-title: Dual application of speech enhancement for automatic speech recognition
– volume: 24
  start-page: 1
  issue: 1
  year: 2010
  ident: 10.1016/j.specom.2024.103162_b8
  article-title: Monaural speech separation and recognition challenge
  publication-title: Comput. Speech Lang.
  doi: 10.1016/j.csl.2009.02.006
– start-page: 1893
  year: 2023
  ident: 10.1016/j.specom.2024.103162_b29
  article-title: Multi-level knowledge distillation for speech emotion recognition in noisy conditions
– volume: 24
  start-page: 883
  year: 2021
  ident: 10.1016/j.specom.2024.103162_b13
  article-title: Fundamentals, present and future perspectives of speech enhancement
  publication-title: Int. J. Speech Technol.
  doi: 10.1007/s10772-020-09674-2
StartPage 103162
SubjectTerms Multi-task learning
Separation priority pipeline
Speech enhancement
Speech separation
Supervised learning
Title A comprehensive study on supervised single-channel noisy speech separation with multi-task learning
URI https://dx.doi.org/10.1016/j.specom.2024.103162
Volume 167