Multi-task Piano Transcription with Local Relative Time Attention

Bibliographic details
Published in: Proceedings ... Asia-Pacific Signal and Information Processing Association Annual Summit and Conference APSIPA ASC ... (Online), pp. 966-971
Main authors: Wang, Qi; Liu, Mingkuan; Chen, Xianhong; Xiong, Mengwen
Format: Conference paper
Language: English
Published: IEEE, 31 October 2023
ISSN: 2640-0103
Abstract Automatic music transcription (AMT) converts music audio into symbolic musical representations. Recently, Transformer-based transcription systems have shown superior performance in modeling note-wise sequences. For frame-wise transcription targets in AMT, however, attention needs to focus more on neighboring frames than on notes in context. In this work, we propose a multi-task transcription system with a self-attention mechanism. The designed relative positional self-attention aims to model frame-wise short-term dependencies in audio and to transcribe music of variable length. By adding a learnable attention mask to multiple attention heads, the network can obtain different multi-scale attention distances for each subtask. Experiments on the MAESTRO dataset show that the proposed system with the local relative time attention mechanism achieves state-of-the-art transcription performance on both frame and note metrics (frame F1 93.40%, note-with-offset F1 88.50%).
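The abstract does not give the attention equations, so the following is only a rough sketch of the general idea (attention limited to neighboring frames, with a bias that depends on relative rather than absolute time), not the authors' implementation. The function name, the `rel_bias` table, the hard limit `max_dist`, and the per-head soft mask scale `width` are all assumptions made for illustration; in a real system `rel_bias` and `width` would be learned parameters.

```python
import math

def local_relative_attention(q, k, v, rel_bias, max_dist, width):
    """Single-head self-attention restricted to neighboring frames.

    q, k, v  : lists of T vectors (lists of floats); q and k share a dimension
    rel_bias : 2*max_dist+1 floats indexed by (j - i + max_dist); hypothetical
               stand-in for a learned relative positional term
    max_dist : hard limit on the frame distance |i - j|
    width    : hypothetical per-head scale of a soft distance mask; a small
               width makes the head more local, a large width less so
    """
    T, d = len(q), len(q[0])
    out, weights = [], []
    for i in range(T):
        lo, hi = max(0, i - max_dist), min(T, i + max_dist + 1)
        logits = []
        for j in range(lo, hi):
            dot = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            soft_mask = -((j - i) / width) ** 2  # penalty grows with distance
            logits.append(dot + rel_bias[j - i + max_dist] + soft_mask)
        # Numerically stable softmax over the local window only.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        row = [0.0] * T
        for j, e in zip(range(lo, hi), exps):
            row[j] = e / z
        weights.append(row)
        out.append([sum(row[j] * v[j][c] for j in range(lo, hi))
                    for c in range(len(v[0]))])
    return out, weights

# Toy input: 6 frames with 2 features each (made-up values).
frames = [[float(t), 1.0] for t in range(6)]
out, w = local_relative_attention(frames, frames, frames,
                                  rel_bias=[0.0] * 5, max_dist=2, width=1.5)
```

Because every term depends only on the offset `j - i`, the same parameters apply to a sequence of any length, which is one way to read the abstract's remark about transcribing music of variable length. Giving each head its own `width` is one simple way to obtain different attention distances per head.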
Author Wang, Qi
Liu, Mingkuan
Chen, Xianhong
Xiong, Mengwen
Author_xml – sequence: 1
  givenname: Qi
  surname: Wang
  fullname: Wang, Qi
  email: wangqi91@bjut.edu.cn
  organization: Beijing University of Technology, China
– sequence: 2
  givenname: Mingkuan
  surname: Liu
  fullname: Liu, Mingkuan
  email: chenxianhong@bjut.edu.cn
  organization: Beijing University of Technology, China
– sequence: 3
  givenname: Xianhong
  surname: Chen
  fullname: Chen, Xianhong
  email: wenmeng.xiong@bjut.edu.cn
  organization: Beijing University of Technology, China
– sequence: 4
  givenname: Mengwen
  surname: Xiong
  fullname: Xiong, Mengwen
  email: liumkuan@emails.bjut.edu.cn
  organization: Beijing University of Technology, China
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/APSIPAASC58517.2023.10317104
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Music
EISBN 9798350300673
EISSN 2640-0103
EndPage 971
ExternalDocumentID 10317104
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  funderid: 10.13039/501100001809
ID FETCH-LOGICAL-i204t-3d50c355b6001346f6ae72cc57a279663ec7e1e10cea7be175a9adc347e8ece93
IEDL.DBID RIE
ISICitedReferencesCount 1
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001108741800150&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:23:29 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i204t-3d50c355b6001346f6ae72cc57a279663ec7e1e10cea7be175a9adc347e8ece93
PageCount 6
ParticipantIDs ieee_primary_10317104
PublicationCentury 2000
PublicationDate 2023-Oct.-31
PublicationDateYYYYMMDD 2023-10-31
PublicationDate_xml – month: 10
  year: 2023
  text: 2023-Oct.-31
  day: 31
PublicationDecade 2020
PublicationTitle Proceedings ... Asia-Pacific Signal and Information Processing Association Annual Summit and Conference APSIPA ASC ... (Online)
PublicationTitleAbbrev APSIPA ASC
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0003203490
Score 1.8566489
SourceID ieee
SourceType Publisher
StartPage 966
SubjectTerms Aggregates
Asia
Estimation
Information processing
Measurement
Music
Symbols
Title Multi-task Piano Transcription with Local Relative Time Attention
URI https://ieeexplore.ieee.org/document/10317104
WOSCitedRecordID wos001108741800150&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1NSwMxEB1sFdGLWit-k0Ov22aTdNMcl2JRkLJQhd5KOpmFIrTSbvv73WTbqgcP3kIOIUzCvMlk3jyAFkrX1cJSRLGn5BguI6MxiVApZ3rOIHcuiE3o4bA3HptsS1YPXBgiCsVn1PbD8JfvFrj2qbKOlyQoEVHVoKZ1UpG19gkVKXyrFX4MrW0fzU6ajV6yNB31_deXbnuh8PZuiV9iKgFLBmf_3MU5NL9ZeSzb480FHNC8Aac_Ggo24DDINl9CGni1UWFXHywrL8CCBUzaeQjms6_s1cMYq6rhNsQ8GYSlRVHVPzbhffD01n-OtmIJ0UxwVUSlzTmWwcPURzBSJXliSQvErrZCl28aSagpppgjWT2lMmqwxjqUSlOPkIy8gvp8MadrYMinjqNxxuVWJYk2Inca45xMFzF34gaa3iqTz6ofxmRnkNs_5u_gxNu-8vj3UC-Wa3qAI9wUs9XyMZziFwbtnW4
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1NTwIxEJ0oGD8uKmL8tgeuC2W3u6XHDZFARLIJmHAjZTqbEBMwsPD73XYB9eDBW9ND00ybedPpvHkANQxMKH1NHjUtJUfxwFMSIw-FMKplFHJjnNiEHAxa47FKtmR1x4UhIld8RnU7dH_5ZoFrmyprWEmCHBHFIZRDIXxe0LX2KZXAt81W-DHUtp00G3Ey7CVxPGzbzy9Zt1Lh9d0iv-RUHJp0zv-5jwuofvPyWLJHnEs4oHkFzn60FKxA2Qk3X0HsmLVeplcfLMmvwII5VNr5CGbzr6xvgYwV9XAbYpYOwuIsKyogq_DeeRm1u95WLsGb-VxkXm51jnn4MLUxTCCiNNIkfcRQal_mr5qAUFKTmhxJyynlcYNW2mAgJLUISQXXUJov5nQDDPnUcFRGmVSLKJLKT43EZkoqREyNfwtVa5XJZ9ERY7IzyN0f889w0h299Sf93uD1Hk7tORT-_wFK2XJNj3CEm2y2Wj65E_0CaCOgtQ
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+...+Asia-Pacific+Signal+and+Information+Processing+Association+Annual+Summit+and+Conference+APSIPA+ASC+...+%28Online%29&rft.atitle=Multi-task+Piano+Transcription+with+Local+Relative+Time+Attention&rft.au=Wang%2C+Qi&rft.au=Liu%2C+Mingkuan&rft.au=Chen%2C+Xianhong&rft.au=Xiong%2C+Mengwen&rft.date=2023-10-31&rft.pub=IEEE&rft.eissn=2640-0103&rft.spage=966&rft.epage=971&rft_id=info:doi/10.1109%2FAPSIPAASC58517.2023.10317104&rft.externalDocID=10317104