Multiple Predominant Instruments Recognition in Polyphonic Music Using Spectro/Modgd-gram Fusion.

Detailed Bibliography
Title: Multiple Predominant Instruments Recognition in Polyphonic Music Using Spectro/Modgd-gram Fusion.
Authors: Lekshmi, C. R.; Rajan, Rajeev
Source: Circuits, Systems & Signal Processing, Jun 2023, Vol. 42, Issue 6, pp. 3464-3484 (21 pp.)
Subjects: Convolutional neural networks; Data augmentation
Abstract: Identification of multiple predominant instruments in polyphonic music is addressed using convolutional neural networks (CNN) through the Mel-spectrogram, the modgd-gram, and their fusion. The modgd-gram, a visual representation, is obtained by stacking the modified group delay functions of consecutive frames. The CNN learns distinctive local characteristics from the visual representation and classifies each instrument into the group to which it belongs. The proposed system is systematically evaluated on the IRMAS dataset. We trained our networks on fixed-length audio excerpts to recognize multiple predominant instruments in variable-length testing files. A wave-generative adversarial network (WaveGAN) architecture is also employed to generate audio files for data augmentation. We experimented with different fusion techniques: early fusion, mid-level fusion, and late (score-level) fusion. The late-fusion experiment reports micro and macro F1 scores of 0.69 and 0.62, respectively, which are 7.81% and 12.73% higher than those of Han's state-of-the-art model. The architectural choice of a CNN with score-level fusion of the Mel-spectrogram and modgd-gram has merit in recognizing the predominant instruments in polyphonic music. [ABSTRACT FROM AUTHOR]
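As a rough illustration of the abstract's two key ingredients, the sketch below builds a modgd-gram by stacking per-frame modified group delay functions and then combines two classifiers' per-instrument scores in the late (score-level) style mentioned above. This is a minimal sketch, not the authors' implementation: the parameter values (alpha, gamma, cepstral lifter length, frame and hop sizes) and the equal fusion weight are assumptions for illustration and are not specified in the abstract.

```python
import numpy as np

def modified_group_delay_frame(frame, n_fft=1024, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay function for one windowed frame.

    alpha, gamma, and the cepstral lifter length are illustrative values,
    not the paper's settings.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # spectrum of x(n)
    Y = np.fft.rfft(n * frame, n_fft)    # spectrum of n*x(n)

    # Cepstrally smoothed magnitude spectrum S(w) (simple low-quefrency liftering)
    mag = np.abs(X) + 1e-10
    cep = np.fft.irfft(np.log(mag), n_fft)
    cep[lifter:-lifter] = 0.0
    S = np.exp(np.fft.rfft(cep, n_fft).real) + 1e-10

    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma))
    return np.sign(tau) * (np.abs(tau) ** alpha)

def modgd_gram(audio, frame_len=1024, hop=256, **kwargs):
    """Stack per-frame modified group delay functions into a 2-D 'gram'."""
    window = np.hanning(frame_len)
    frames = [
        modified_group_delay_frame(window * audio[i:i + frame_len], **kwargs)
        for i in range(0, len(audio) - frame_len, hop)
    ]
    return np.stack(frames, axis=1)      # shape: (freq_bins, time_frames)

def late_fuse(scores_spec, scores_modgd, w=0.5):
    """Score-level (late) fusion: weighted average of the two models'
    per-instrument scores; the equal weight w=0.5 is an assumption."""
    return w * scores_spec + (1 - w) * scores_modgd

# Hypothetical usage: a 3-second excerpt at 22.05 kHz (white noise stand-in)
# audio = np.random.randn(3 * 22050)
# gram = modgd_gram(audio)   # fed to a CNN alongside the Mel-spectrogram
```

The fused scores would then be thresholded per instrument to obtain the multi-label predictions evaluated with micro and macro F1 in the abstract.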
Database: Complementary Index
ISSN: 0278-081X
DOI: 10.1007/s00034-022-02278-y