Multi-task Piano Transcription with Local Relative Time Attention
Automatic music transcription (AMT) transcribes music audio into symbolic musical representations. Recently, Transformer-based transcription systems have shown superior performance in modeling note-wise sequences. For the frame-wise transcription targets in AMT, the attention needs to focus more...
| Published in: | Proceedings ... Asia-Pacific Signal and Information Processing Association Annual Summit and Conference APSIPA ASC ... (Online), pp. 966-971 |
|---|---|
| Main authors: | Wang, Qi; Liu, Mingkuan; Chen, Xianhong; Xiong, Mengwen |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 31 October 2023 |
| Subjects: | |
| ISSN: | 2640-0103 |
| Online access: | Full text |
| Abstract | Automatic music transcription (AMT) transcribes music audio into symbolic musical representations. Recently, Transformer-based transcription systems have shown superior performance in modeling note-wise sequences. For the frame-wise transcription targets in AMT, however, the attention needs to focus more on neighboring frames than on notes in context. In this work, we propose a multi-task transcription system with a self-attention mechanism. The designed relative positional self-attention aims to model frame-wise short-term dependencies in audio and to transcribe music of variable length. By adding a learnable attention mask to multiple attention heads, the network can obtain different multi-scale attention distances for each subtask. Experiments on the MAESTRO dataset show that the proposed system with the local relative time attention mechanism achieves state-of-the-art transcription performance on both frame and note metrics (frame F1 93.40%, note-with-offset F1 88.50%). |
|---|---|
| Author | Wang, Qi; Liu, Mingkuan; Chen, Xianhong; Xiong, Mengwen |
| Author_xml | 1. Wang, Qi (wangqi91@bjut.edu.cn), Beijing University of Technology, China; 2. Liu, Mingkuan (chenxianhong@bjut.edu.cn), Beijing University of Technology, China; 3. Chen, Xianhong (wenmeng.xiong@bjut.edu.cn), Beijing University of Technology, China; 4. Xiong, Mengwen (liumkuan@emails.bjut.edu.cn), Beijing University of Technology, China |
| ContentType | Conference Proceeding |
| DOI | 10.1109/APSIPAASC58517.2023.10317104 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEL IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | – sequence: 1 dbid: RIE name: IEL url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering; Music |
| EISBN | 9798350300673 |
| EISSN | 2640-0103 |
| EndPage | 971 |
| ExternalDocumentID | 10317104 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China funderid: 10.13039/501100001809 |
| ISICitedReferencesCount | 1 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 6 |
| ParticipantIDs | ieee_primary_10317104 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-Oct.-31 |
| PublicationDateYYYYMMDD | 2023-10-31 |
| PublicationDate_xml | – month: 10 year: 2023 text: 2023-Oct.-31 day: 31 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings ... Asia-Pacific Signal and Information Processing Association Annual Summit and Conference APSIPA ASC ... (Online) |
| PublicationTitleAbbrev | APSIPA ASC |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 966 |
| SubjectTerms | Aggregates; Asia; Estimation; Information processing; Measurement; Music; Symbols |
| Title | Multi-task Piano Transcription with Local Relative Time Attention |
| URI | https://ieeexplore.ieee.org/document/10317104 |
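
The abstract describes a relative positional self-attention that favors neighboring frames, with a learnable mask giving each attention head its own attention distance. The record does not give the actual formulation, so the following is only a minimal sketch under stated assumptions: a single head, a clipped relative-offset bias table (`rel_bias`, hypothetical), and a Gaussian-shaped soft window whose width `sigma` stands in for the learnable mask (also hypothetical). It is meant to illustrate the general idea, not to reproduce the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_relative_attention(q, k, v, rel_bias, sigma):
    """Single-head attention over frames with a local, offset-based bias.

    q, k, v  : lists of T vectors (lists of floats); q and k share dimension d.
    rel_bias : 2*R+1 floats, one bias per clipped offset -R..R (assumed parameter).
    sigma    : positive window width in frames (assumed stand-in for a
               learnable per-head mask; smaller means more local).
    Returns (outputs, attention_weights).
    """
    T = len(q)
    d = len(q[0])
    R = (len(rel_bias) - 1) // 2
    outputs, weights = [], []
    for i in range(T):
        scores = []
        for j in range(T):
            offset = j - i
            # Content term: scaled dot product between frame i and frame j.
            content = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            # Relative term: depends only on the clipped offset, not on i or j.
            bias = rel_bias[max(-R, min(R, offset)) + R]
            # Soft mask: penalizes distant frames; width controlled by sigma.
            penalty = -(offset ** 2) / (2.0 * sigma ** 2)
            scores.append(content + bias + penalty)
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(T)) for c in range(len(v[0]))]
        )
    return outputs, weights


# Toy input: 8 frames of 2-dimensional features.
T = 8
frames = [[math.sin(0.3 * t), math.cos(0.3 * t)] for t in range(T)]
rel_bias = [0.0] * 5  # offsets -2..2, all zero here for simplicity

# Two "heads" that differ only in window width.
out_narrow, narrow = local_relative_attention(frames, frames, frames, rel_bias, sigma=1.0)
out_wide, wide = local_relative_attention(frames, frames, frames, rel_bias, sigma=4.0)

# The same parameters apply unchanged to a longer sequence.
longer = [[math.sin(0.3 * t), math.cos(0.3 * t)] for t in range(20)]
out_long, _ = local_relative_attention(longer, longer, longer, rel_bias, sigma=1.0)
```

Because both the bias and the window depend only on the offset between frames, the same parameters can be reused for inputs of any length, which is one plausible reading of how such a design could handle music of variable length. Giving each head or subtask its own `sigma` would yield the different attention distances the abstract mentions.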