Multi-task Piano Transcription with Local Relative Time Attention

Bibliographic details
Published in:	Proceedings ... Asia-Pacific Signal and Information Processing Association Annual Summit and Conference APSIPA ASC ... (Online), pp. 966-971
Main authors:	Wang, Qi; Liu, Mingkuan; Chen, Xianhong; Xiong, Mengwen
Format:	Conference paper
Language:	English
Published:	IEEE, 31 October 2023
Subjects:	Aggregates; Asia; Estimation; Information processing; Measurement; Music; Symbols
ISSN:	2640-0103

Description
Summary:	Automatic music transcription (AMT) converts music audio into symbolic musical representations. Recently, Transformer-based transcription systems have proven superior at modeling note-wise sequences. For the frame-wise transcription targets in AMT, however, attention needs to focus more on neighboring frames than on notes in context. In this work, we propose a multi-task transcription system with a self-attention mechanism. The designed relative positional self-attention aims to model frame-wise short-term dependencies in audio and to transcribe music of variable length. By adding a learnable attention mask to multiple attention heads, the network can obtain different multi-scale attention distances for each subtask. Experiments on the MAESTRO dataset show that the proposed system with the local relative time attention mechanism achieves state-of-the-art transcription performance on both frame and note metrics (frame F1 93.40%, note-with-offset F1 88.50%).
DOI:	10.1109/APSIPAASC58517.2023.10317104
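
Illustrative sketch: the summary describes self-attention that is restricted to neighboring frames, uses relative positions, and gives each head its own attention distance. The NumPy example below is a minimal sketch of that general idea and is not the authors' implementation. It rests on several assumptions that do not come from the record: the function name `local_relative_attention` is hypothetical, the learnable mask is replaced by a fixed window width per head, the relative positional term is a simple bias indexed by frame offset, and all weights are random.

```python
import numpy as np

def local_relative_attention(x, w_q, w_k, w_v, rel_bias, window):
    """Single-head self-attention restricted to frames with |i - j| <= window.

    x: (T, d) frame features; w_q, w_k, w_v: (d, d_h) projections;
    rel_bias: (2 * window + 1,) bias indexed by relative offset (assumed form).
    """
    T = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(q.shape[-1])                # (T, T)
    offsets = np.arange(T)[None, :] - np.arange(T)[:, None]  # j - i
    inside = np.abs(offsets) <= window
    # The bias depends only on the relative offset, never on absolute time.
    bias = rel_bias[np.clip(offsets, -window, window) + window]
    # Frames outside the window get -inf, so their softmax weight is zero.
    scores = np.where(inside, scores + bias, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d, d_h = 12, 8, 4
x = rng.standard_normal((T, d))

heads = []
for window in (1, 3):  # two heads with different attention distances
    w_q, w_k, w_v = (rng.standard_normal((d, d_h)) for _ in range(3))
    rel_bias = rng.standard_normal(2 * window + 1)
    out, weights = local_relative_attention(x, w_q, w_k, w_v, rel_bias, window)
    heads.append(out)

y = np.concatenate(heads, axis=-1)  # (T, 2 * d_h)
```

Because every parameter in this sketch depends only on relative offsets, the same weights apply to an input of any length T, which is one plausible reading of how such a design handles music of variable length.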