Time-domain Separation Priority Pipeline-based Cascaded Multi-task Learning for Monaural Noisy and Reverberant Speech Separation

Detailed bibliography
Published in: APSIPA transactions on signal and information processing, Volume 14, Issue 1
Main authors: Dang, Shaoxiang; Matsumoto, Tetsuya; Takeuchi, Yoshinori; Kudo, Hiroaki
Medium: Journal Article
Language: English
Published: Hanover: Now Publishers Inc, 01.01.2025
ISSN: 2048-7703
Abstract: Monaural speech separation is a crucial task in speech processing, focused on isolating single-channel audio with multiple speakers into individual streams. This problem is particularly challenging in noisy and reverberant environments where the target information becomes obscured. Cascaded multi-task learning breaks down complex tasks into simpler sub-tasks and leverages additional information for step-by-step learning, serving as an effective approach for integrating multiple objectives. However, its sequential nature often leads to over-suppression, degrading the performance of downstream modules. This article presents three main contributions. First, we propose a separation-priority pipeline to ensure that the critical separation sub-task is preserved against over-suppression. Second, to extract deeper multi-scale features, we design a consistent-stride deep encoder-decoder structure combined with depth-wise multi-receptive field fusion. Third, we advocate a training strategy that pre-trains each sub-task and applies time-varying and time-invariant weighted fine-tuning to further mitigate over-suppression. Our methods are evaluated on the open-source Libri2Mix and real-world LibriCSS datasets. Experimental results across diverse metrics demonstrate that all proposed innovations improve overall model performance.
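The abstract names time-varying and time-invariant weighted fine-tuning but does not state the weights or the schedule. The Python sketch below is a hypothetical illustration of the general idea only: per-sub-task losses from a cascaded pipeline are combined into one objective, either with fixed weights or with weights that shift toward the separation sub-task as fine-tuning proceeds. The sub-task names `denoising` and `dereverberation`, the linear schedule, and the start/end weights are all assumptions, not the authors' settings.

```python
from typing import Dict, List

# Hypothetical cascade of sub-tasks (assumption): the abstract only names
# the separation sub-task explicitly.
SUBTASKS: List[str] = ["denoising", "dereverberation", "separation"]


def time_invariant_weights(sep_weight: float = 0.6) -> Dict[str, float]:
    """Fixed weights: separation gets sep_weight, the other sub-tasks
    share the remainder equally, so all weights sum to 1."""
    others = [t for t in SUBTASKS if t != "separation"]
    weights = {t: (1.0 - sep_weight) / len(others) for t in others}
    weights["separation"] = sep_weight
    return weights


def time_varying_weights(epoch: int, total_epochs: int,
                         start: float = 1.0 / 3, end: float = 0.8) -> Dict[str, float]:
    """Assumed linear schedule: the separation weight moves from `start`
    at the first epoch to `end` at the last epoch."""
    progress = epoch / max(total_epochs - 1, 1)
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return time_invariant_weights(start + progress * (end - start))


def combined_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the per-sub-task losses."""
    return sum(weights[t] * losses[t] for t in SUBTASKS)


if __name__ == "__main__":
    per_task = {"denoising": 1.0, "dereverberation": 1.0, "separation": 2.0}
    for epoch in (0, 5, 9):
        w = time_varying_weights(epoch, total_epochs=10)
        print(epoch, round(w["separation"], 3), round(combined_loss(per_task, w), 3))
```

With these assumed values, the separation weight rises from 1/3 at the first epoch to 0.8 at the last, so the separation loss dominates the combined objective late in fine-tuning; a time-invariant variant simply keeps one weight vector for every epoch.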
Copyright: Now Publishers Inc 2025
DOI: 10.1561/116.20250022
Open access: yes
Peer reviewed: yes
Open access link: https://doaj.org/article/8c498b6f8182497a936b9db22d9a62d8
Subjects: Encoders-Decoders; Learning; Separation; Speech processing; Task complexity
URI: https://www.proquest.com/docview/3244519520, https://doaj.org/article/8c498b6f8182497a936b9db22d9a62d8