Multi-task Piano Transcription with Local Relative Time Attention

Bibliographic details
Published in:	Proceedings ... Asia-Pacific Signal and Information Processing Association Annual Summit and Conference APSIPA ASC ... (Online), pp. 966-971
Main authors:	Wang, Qi; Liu, Mingkuan; Chen, Xianhong; Xiong, Mengwen
Format:	Conference paper
Language:	English
Published:	IEEE, 31 October 2023
Subjects:	Aggregates; Asia; Estimation; Information processing; Measurement; Music; Symbols
ISSN:	2640-0103

Abstract	Automatic music transcription (AMT) converts music audio into symbolic musical representations. Recently, Transformer-based transcription systems have shown superiority in modeling note-wise sequences. For the frame-wise transcription targets in AMT, however, attention needs to focus more on neighboring frames than on notes in context. In this work, we propose a multi-task transcription system with a self-attention mechanism. The designed relative positional self-attention aims to model frame-wise short-term dependencies in audio and to transcribe music of variable length. By adding a learnable attention mask to multiple attention heads, the network can obtain different multi-scale attention distances for each subtask. Experiments on the MAESTRO dataset show that the proposed system with the local relative time attention mechanism achieves state-of-the-art transcription performance on both frame and note metrics (frame F1 93.40%, note-with-offset F1 88.50%).
Author_xml	1. Wang, Qi (wangqi91@bjut.edu.cn), Beijing University of Technology, China; 2. Liu, Mingkuan (chenxianhong@bjut.edu.cn), Beijing University of Technology, China; 3. Chen, Xianhong (wenmeng.xiong@bjut.edu.cn), Beijing University of Technology, China; 4. Xiong, Mengwen (liumkuan@emails.bjut.edu.cn), Beijing University of Technology, China
DOI	10.1109/APSIPAASC58517.2023.10317104
Discipline	Engineering Music
EISBN	9798350300673
EISSN	2640-0103
EndPage	971
ExternalDocumentID	10317104
Genre	orig-research
GrantInformation_xml	Funder: National Natural Science Foundation of China (funder ID: 10.13039/501100001809)
ISICitedReferencesCount	1
IsPeerReviewed	false
IsScholarly	false
Language	English
PageCount	6
PublicationDate	2023-10-31
PublicationTitle	Proceedings ... Asia-Pacific Signal and Information Processing Association Annual Summit and Conference APSIPA ASC ... (Online)
PublicationTitleAbbrev	APSIPA ASC
PublicationYear	2023
Publisher	IEEE
StartPage	966
URI	https://ieeexplore.ieee.org/document/10317104