Hybrid Transformers for Music Source Separation

Published in: Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 1-5
Main authors: Rouard, Simon; Massa, Francisco; Defossez, Alexandre
Format: Conference paper
Language: English
Published: IEEE, 2023-06-04
ISSN: 2379-190X
Online access: https://ieeexplore.ieee.org/document/10096956
Abstract A natural question arising in Music Source Separation (MSS) is whether long-range contextual information is useful, or whether local acoustic features are sufficient. In other fields, attention-based Transformers [1] have shown their ability to integrate information over long sequences. In this work, we introduce Hybrid Transformer Demucs (HT Demucs), a hybrid temporal/spectral bi-U-Net based on Hybrid Demucs [2], where the innermost layers are replaced by a cross-domain Transformer encoder that uses self-attention within each domain and cross-attention across domains. While it performs poorly when trained only on MUSDB [3], we show that it outperforms Hybrid Demucs (trained on the same data) by 0.45 dB of SDR when using 800 extra training songs. Using sparse attention kernels to extend its receptive field, and per-source fine-tuning, we achieve state-of-the-art results on MUSDB with extra training data, with 9.20 dB of SDR.
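To make the cross-domain Transformer encoder described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a single layer that applies self-attention within the temporal and spectral token sequences and cross-attention across them. The class name, dimensions, token counts, and normalization placement are illustrative assumptions; the actual model also includes feed-forward blocks, positional embeddings, and the sparse attention kernels mentioned above.

import torch
import torch.nn as nn

class CrossDomainLayer(nn.Module):
    """Hypothetical sketch: self-attention per domain, then cross-attention across domains."""
    def __init__(self, dim=384, heads=8):
        super().__init__()
        self.self_t = nn.MultiheadAttention(dim, heads, batch_first=True)   # within the time domain
        self.self_z = nn.MultiheadAttention(dim, heads, batch_first=True)   # within the spectral domain
        self.cross_t = nn.MultiheadAttention(dim, heads, batch_first=True)  # time queries, spectral keys/values
        self.cross_z = nn.MultiheadAttention(dim, heads, batch_first=True)  # spectral queries, time keys/values
        self.norm_t = nn.LayerNorm(dim)
        self.norm_z = nn.LayerNorm(dim)

    def forward(self, x_t, x_z):
        # x_t: temporal-branch tokens (batch, T, dim); x_z: spectral-branch tokens (batch, F, dim)
        x_t = x_t + self.self_t(x_t, x_t, x_t)[0]    # self-attention within the time domain
        x_z = x_z + self.self_z(x_z, x_z, x_z)[0]    # self-attention within the spectral domain
        x_t = x_t + self.cross_t(x_t, x_z, x_z)[0]   # time branch attends to spectral tokens
        x_z = x_z + self.cross_z(x_z, x_t, x_t)[0]   # spectral branch attends to the updated time tokens
        return self.norm_t(x_t), self.norm_z(x_z)

# Toy usage with made-up token counts standing in for the innermost U-Net activations.
t = torch.randn(2, 100, 384)   # temporal tokens
z = torch.randn(2, 64, 384)    # spectral tokens
t, z = CrossDomainLayer()(t, z)

In the paper, a stack of such cross-domain layers replaces the innermost layers of the Hybrid Demucs bi-U-Net, at the point where the time-domain and spectrogram branches meet.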
Author Rouard, Simon
Massa, Francisco
Defossez, Alexandre
Author_xml – sequence: 1
  givenname: Simon
  surname: Rouard
  fullname: Rouard, Simon
  organization: Meta AI
– sequence: 2
  givenname: Francisco
  surname: Massa
  fullname: Massa, Francisco
  organization: Meta AI
– sequence: 3
  givenname: Alexandre
  surname: Defossez
  fullname: Defossez, Alexandre
  organization: Meta AI
ContentType Conference Proceeding
DOI 10.1109/ICASSP49357.2023.10096956
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISBN 1728163277
9781728163277
EISSN 2379-190X
EndPage 5
ExternalDocumentID 10096956
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 5
PublicationCentury 2000
PublicationDate 2023-June-4
PublicationDateYYYYMMDD 2023-06-04
PublicationDate_xml – month: 06
  year: 2023
  text: 2023-June-4
  day: 04
PublicationDecade 2020
PublicationTitle Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev ICASSP
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Acoustics
Multiple signal classification
Music Source Separation
Source separation
Speech processing
Training
Training data
Transformers
Title Hybrid Transformers for Music Source Separation
URI https://ieeexplore.ieee.org/document/10096956
hasFullText 1
inHoldings 1
linkProvider IEEE