A comprehensive study on supervised single-channel noisy speech separation with multi-task learning
| Published in: | Speech communication, Volume 167; p. 103162 |
|---|---|
| Main authors: | Dang, Shaoxiang; Matsumoto, Tetsuya; Takeuchi, Yoshinori; Kudo, Hiroaki |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 1 February 2025 |
| Subjects: | |
| ISSN: | 0167-6393 |
| Abstract | This research presents a comprehensive investigation and comparison of noisy speech separation methods using multi-task learning. First, we categorize all methods into two pipelines, the enhancement priority pipeline (EPP) and the separation priority pipeline (SPP), according to whether enhancement or separation is prioritized. Next, we classify each pipeline into a shared encoder–decoder scheme (SEDS) and an independent encoder–decoder scheme (IEDS), depending on whether the two modules share the same encoder and decoder. Additionally, we introduce two types of intermediate structures between the two modules: one uses time–frequency (T–F) representations, while the other uses T–F masks. This article elaborates on the strengths and weaknesses of each approach, particularly in mitigating over-suppression and improving computational efficiency. Our experiments show substantial improvements in SPP with IEDS across multiple metrics on the LibriXmix dataset. In addition, by replacing the synthesis-based trick in the enhancement module, the model achieves superior generalization on the LibriCSS dataset. • We extend the SEDS structure for speech enhancement (SE) and speech separation (SS) by transitioning features to masks. • We propose negative gradient modulation as a simpler alternative to projection methods. • We mitigated over-suppression with a pipeline ensuring uncompromised input for separation. |
|---|---|
| ArticleNumber | 103162 |
| Author | Kudo, Hiroaki; Takeuchi, Yoshinori; Dang, Shaoxiang; Matsumoto, Tetsuya |
| Author_xml | 1. Dang, Shaoxiang (dang.shaoxiang.s0@s.mail.nagoya-u.ac.jp), Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 4648601, Aichi, Japan; 2. Matsumoto, Tetsuya, Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 4648601, Aichi, Japan; 3. Takeuchi, Yoshinori, School of Informatics, Daido University, 10-3 Takiharu-cho, Minami-ku, Nagoya, 4570819, Aichi, Japan; 4. Kudo, Hiroaki, Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 4648601, Aichi, Japan |
| Cites_doi | 10.1109/TSA.2005.858005 10.1016/j.csl.2009.02.006 10.1007/s10772-020-09674-2 10.1109/TASLP.2015.2512042 10.1109/TASLP.2023.3275033 10.1109/TASLP.2019.2915167 10.1109/TASLP.2015.2468583 10.1109/TASLP.2024.3407511 10.1109/TASLP.2018.2842159 10.1145/2595188.2595221 10.1109/TASLP.2017.2726762 10.1109/TASLP.2014.2305833 |
| ContentType | Journal Article |
| Copyright | 2024 The Authors |
| Copyright_xml | – notice: 2024 The Authors |
| DBID | 6I. AAFTH AAYXX CITATION |
| DOI | 10.1016/j.specom.2024.103162 |
| DatabaseName | ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Languages & Literatures; Social Welfare & Social Work; Psychology |
| ExternalDocumentID | 10_1016_j_specom_2024_103162 S016763932400133X |
| ID | FETCH-LOGICAL-c301t-8953d9c6a60ee950df7cc2279eef84d377b8b827f17d1992f453657dcc23531f3 |
| ISICitedReferencesCount | 1 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001393996400001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0167-6393 |
| IngestDate | Sat Nov 29 06:17:52 EST 2025 Sat Jan 18 16:09:56 EST 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Speech enhancement; Speech separation; Separation priority pipeline; Supervised learning; Multi-task learning |
| Language | English |
| License | This is an open access article under the CC BY license. |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c301t-8953d9c6a60ee950df7cc2279eef84d377b8b827f17d1992f453657dcc23531f3 |
| OpenAccessLink | https://dx.doi.org/10.1016/j.specom.2024.103162 |
| ParticipantIDs | crossref_primary_10_1016_j_specom_2024_103162 elsevier_sciencedirect_doi_10_1016_j_specom_2024_103162 |
| PublicationCentury | 2000 |
| PublicationDate | February 2025 2025-02-00 |
| PublicationDateYYYYMMDD | 2025-02-01 |
| PublicationDate_xml | – month: 02 year: 2025 text: February 2025 |
| PublicationDecade | 2020 |
| PublicationTitle | Speech communication |
| PublicationYear | 2025 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
doi: 10.1109/TASLP.2017.2726762 – start-page: 7284 year: 2020 ident: 10.1016/j.specom.2024.103162_b7 article-title: Continuous speech separation: Dataset and analysis – volume: 22 start-page: 826 issue: 4 year: 2014 ident: 10.1016/j.specom.2024.103162_b38 article-title: Investigation of speech separation as a front-end for noise robust speech recognition publication-title: IEEE/ACM Trans. Audio Speech Lang. Process. doi: 10.1109/TASLP.2014.2305833 – start-page: 4214 year: 2010 ident: 10.1016/j.specom.2024.103162_b49 article-title: A short-time objective intelligibility measure for time-frequency weighted noisy speech – start-page: 4599 year: 2019 ident: 10.1016/j.specom.2024.103162_b16 article-title: Discriminative learning for monaural speech separation using deep embedding features – start-page: 865 year: 2021 ident: 10.1016/j.specom.2024.103162_b27 article-title: Dual-path RNN for long recording speech separation – start-page: 3291 year: 2020 ident: 10.1016/j.specom.2024.103162_b14 article-title: Real time speech enhancement in the waveform domain – start-page: 2031 year: 2019 ident: 10.1016/j.specom.2024.103162_b17 article-title: MetricGAN: Generative adversarial networks based black-box metric scores optimization for speech enhancement – start-page: 31 year: 2016 ident: 10.1016/j.specom.2024.103162_b18 article-title: Deep clustering: Discriminative embeddings for segmentation and separation – volume: 33 start-page: 12449 year: 2020 ident: 10.1016/j.specom.2024.103162_b1 article-title: Wav2vec 2.0: A framework for self-supervised learning of speech representations publication-title: Adv. Neural Inf. Process. Syst. |
| StartPage | 103162 |
| SubjectTerms | Multi-task learning; Separation priority pipeline; Speech enhancement; Speech separation; Supervised learning |
| Title | A comprehensive study on supervised single-channel noisy speech separation with multi-task learning |
| URI | https://dx.doi.org/10.1016/j.specom.2024.103162 |
| Volume | 167 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+comprehensive+study+on+supervised+single-channel+noisy+speech+separation+with+multi-task+learning&rft.jtitle=Speech+communication&rft.au=Dang%2C+Shaoxiang&rft.au=Matsumoto%2C+Tetsuya&rft.au=Takeuchi%2C+Yoshinori&rft.au=Kudo%2C+Hiroaki&rft.date=2025-02-01&rft.pub=Elsevier+B.V&rft.issn=0167-6393&rft.volume=167&rft_id=info:doi/10.1016%2Fj.specom.2024.103162&rft.externalDocID=S016763932400133X |