Time-domain Separation Priority Pipeline-based Cascaded Multi-task Learning for Monaural Noisy and Reverberant Speech Separation
Monaural speech separation is a crucial task in speech processing, focused on isolating single-channel audio with multiple speakers into individual streams. This problem is particularly challenging in noisy and reverberant environments where the target information becomes obscured. Cascaded multi-ta...
| Published in: | APSIPA transactions on signal and information processing Volume 14; Issue 1 |
|---|---|
| Main authors: | Dang, Shaoxiang; Matsumoto, Tetsuya; Takeuchi, Yoshinori; Kudo, Hiroaki |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Hanover; Now Publishers Inc; 01.01.2025; Now Publishers |
| Subject: | Encoders-Decoders; Learning; Separation; Speech processing; Task complexity |
| ISSN: | 2048-7703 |
| Online access: | Get full text |
| Abstract | Monaural speech separation is a crucial task in speech processing, focused on isolating single-channel audio with multiple speakers into individual streams. This problem is particularly challenging in noisy and reverberant environments where the target information becomes obscured. Cascaded multi-task learning breaks down complex tasks into simpler sub-tasks and leverages additional information for step-by-step learning, serving as an effective approach for integrating multiple objectives. However, its sequential nature often leads to over-suppression, degrading the performance of downstream modules. This article presents three main contributions. First, we propose a separation-priority pipeline to ensure that the critical separation sub-task is preserved against over-suppression. Second, to extract deeper multi-scale features, we design a consistent-stride deep encoder-decoder structure combined with depth-wise multi-receptive field fusion. Third, we advocate a training strategy that pre-trains each sub-task and applies time-varying and time-invariant weighted fine-tuning to further mitigate over-suppression. Our methods are evaluated on the open-source Libri2Mix and real-world LibriCSS datasets. Experimental results across diverse metrics demonstrate that all proposed innovations improve overall model performance. |
|---|---|
| Author | Dang, Shaoxiang; Matsumoto, Tetsuya; Takeuchi, Yoshinori; Kudo, Hiroaki |
| ContentType | Journal Article |
| Copyright | Copyright Now Publishers Inc 2025 |
| DOI | 10.1561/116.20250022 |
| DatabaseName | CrossRef; ProQuest Computer Science Collection; DOAJ Directory of Open Access Journals |
| Discipline | Engineering |
| EISSN | 2048-7703 |
| ISICitedReferencesCount | 0 |
| ISSN | 2048-7703 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| OpenAccessLink | https://doaj.org/article/8c498b6f8182497a936b9db22d9a62d8 |
| PublicationDate | 2025-01-01 |
| PublicationPlace | Hanover |
| PublicationTitle | APSIPA transactions on signal and information processing |
| PublicationYear | 2025 |
| Publisher | Now Publishers Inc; Now Publishers |
| SubjectTerms | Encoders-Decoders; Learning; Separation; Speech processing; Task complexity |
| Title | Time-domain Separation Priority Pipeline-based Cascaded Multi-task Learning for Monaural Noisy and Reverberant Speech Separation |
| URI | https://www.proquest.com/docview/3244519520 https://doaj.org/article/8c498b6f8182497a936b9db22d9a62d8 |
| Volume | 14 |
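
The abstract mentions fine-tuning with time-varying and time-invariant loss weights but this record does not give the schedules or the loss terms. As a minimal sketch only, assuming a two-term objective in which the separation loss keeps a fixed (time-invariant) weight and an auxiliary sub-task loss receives a linearly decaying (time-varying) weight, the idea might look like the following. The function names, the linear schedule, and the default weights are all hypothetical and are not taken from the article.

```python
def time_varying_weight(step, total_steps, start=1.0, end=0.1):
    """Linearly interpolate a loss weight from `start` to `end`.

    Hypothetical schedule: the article's actual schedule is not stated
    in this record.
    """
    if total_steps <= 1:
        return end
    # Clamp progress to [0, 1] so out-of-range steps stay bounded.
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + frac * (end - start)


def combined_loss(separation_loss, auxiliary_loss, step, total_steps,
                  separation_weight=1.0):
    """Weighted sum of two sub-task losses.

    Assumption: the separation term has a constant weight, while the
    auxiliary term's weight changes with the training step.
    """
    aux_weight = time_varying_weight(step, total_steps)
    return separation_weight * separation_loss + aux_weight * auxiliary_loss
```

With these assumed defaults, the auxiliary term contributes fully at the first step and only a tenth as much at the last, so the separation term increasingly dominates the objective as fine-tuning proceeds.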