A DenseU-Net framework for Music Source Separation using Spectrogram Domain Approach

Saved in:
Detailed bibliography
Title: A DenseU-Net framework for Music Source Separation using Spectrogram Domain Approach
Authors: Vinitha George E
Source: International Journal of Intelligent Systems and Applications in Engineering; Vol. 12 No. 4 (2024); 77-85
Publisher information: International Journal of Intelligent Systems and Applications in Engineering, 2024.
Year of publication: 2024
Subjects: Autoencoder, Convolutional Neural Network, Deep learning, DenseNet, Music source separation, ResNet, U-Net architecture
Description: Audio source separation has been intensively explored by the research community. Deep learning algorithms aid in creating neural network models that isolate the different sources present in a music mixture. In this paper, we propose an algorithm to separate the constituent sources of a music signal mixture using a DenseU-Net framework. Converting an audio signal into a spectrogram, akin to an image, accentuates valuable attributes concealed in the time-domain signal. Hence, a spectrogram-based model is chosen for extraction of the target signal. The model incorporates a dense block into the layers of the U-Net structure. The proposed system is trained to extract individual source spectrograms from the mixture spectrogram. An ablation study was performed by replacing the dense block with convolution filters to assess the dense block's effectiveness. The proposed method proves more efficient than other state-of-the-art methods: experiments separating vocals, bass, drums, and others show an average SDR of 6.59 dB on the MUSDB database.
Document type: Article
File description: application/pdf
Language: English
ISSN: 2147-6799
Access URL: https://www.ijisae.org/index.php/IJISAE/article/view/6175
Rights: CC BY-SA
Accession number: edsair.issn21476799..2777c9b7a9e731cc4820acbb1d0afcaa
Database: OpenAIRE
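The abstract's central point is that the model operates on a spectrogram rather than the raw waveform: the mixture is converted to a time-frequency image before separation. The sketch below, which is illustrative only and not the paper's code, shows that conversion via a short-time Fourier transform (STFT) computed with the standard library; the window length and hop size are assumed values, not parameters from the paper.

```python
# Illustrative sketch, not the authors' implementation: converting a
# waveform into a magnitude spectrogram (the input representation a
# spectrogram-domain model such as DenseU-Net operates on).
import math

def hann(n):
    """Hann window of length n (reduces spectral leakage between bins)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def stft_magnitude(signal, win=8, hop=4):
    """Return a list of frames; each frame holds the DFT magnitudes of one
    windowed segment, keeping only the non-negative frequency bins."""
    w = hann(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        seg = [signal[start + i] * w[i] for i in range(win)]
        mags = []
        for k in range(win // 2 + 1):  # bins 0 .. win/2
            re = sum(seg[n] * math.cos(-2 * math.pi * k * n / win)
                     for n in range(win))
            im = sum(seg[n] * math.sin(-2 * math.pi * k * n / win)
                     for n in range(win))
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames

# A pure tone at exactly bin 2 of the 8-sample window: its energy should
# concentrate in that bin of every frame.
tone = [math.sin(2 * math.pi * 2 * t / 8) for t in range(32)]
spec = stft_magnitude(tone)
```

In practice the separation network would be trained to map such a mixture spectrogram to per-source spectrograms, which are then inverted back to audio; real systems use much larger windows (e.g. 1024-4096 samples) and an FFT rather than this direct DFT loop.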