Timbre Style Transfer for Musical Instruments Acoustic Guitar and Piano using the Generator-Discriminator Model

Bibliographic details
Published in: Knowledge Engineering and Data Science (Online), Volume 7, Issue 1, p. 101
Main authors: Nagari, Widean; Santoso, Joan; Setiawan, Esther Irawati
Format: Journal Article
Language: English
Published: 05.09.2024
ISSN:2597-4602, 2597-4637
Description
Summary: Music style transfer is a technique for creating new music by combining the input song's content with the target song's style to produce a sound that humans can enjoy. This research concerns timbre style transfer, a branch of music style transfer that uses the generator-discriminator model. This method has been used in various studies in the music style transfer domain to train a machine learning model to replace the sound of the instruments in one song with the sound of instruments from another. This work focuses on finding the best layer configuration in the generator-discriminator model for the timbre style transfer task. The dataset used in this research is the MAESTRO dataset. The metrics used in the testing phase are Contrastive Loss, Mean Squared Error, and Perceptual Evaluation of Speech Quality (PESQ). Based on the trial results, the best model in this research was the one trained on column vectors of the mel-spectrogram. Hyperparameters suitable for the training process are a learning rate of 0.0005, a batch size greater than or equal to 64, and a dropout rate of 0.1. The results of the ablation study show that the best layer configuration consists of 2 Bi-LSTM layers, 1 Attention layer, and 2 Dense layers.
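As a minimal illustration of two ideas in the abstract (not the authors' code), the sketch below slices a mel-spectrogram into the column vectors the best model was trained on, one vector per time frame, and computes the Mean Squared Error between a generated and a target spectrogram, one of the paper's evaluation metrics. All shapes, array names, and the use of random data are assumptions for the sketch.

```python
import numpy as np

# Assumed shapes: a mel-spectrogram with 128 mel bins and 100 time frames.
n_mels, n_frames = 128, 100
rng = np.random.default_rng(0)
target_spec = rng.random((n_mels, n_frames))     # stand-in for the target timbre
generated_spec = rng.random((n_mels, n_frames))  # stand-in for the generator output

# Column vectors of the mel-spectrogram: each time frame becomes one
# input vector of length n_mels (the sequence a Bi-LSTM would consume).
column_vectors = target_spec.T  # shape: (n_frames, n_mels)

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean Squared Error between two spectrograms (one of the paper's metrics)."""
    return float(np.mean((a - b) ** 2))

score = mse(generated_spec, target_spec)
```

In this framing, the sequence of column vectors is what a recurrent generator processes frame by frame, and MSE gives a simple element-wise distance between the generated and target spectrograms.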
DOI:10.17977/um018v7i12024p101-116