Audio Classification for Feature-Based Majority Voting Optimization and Hyperparametric Tuning
| Published in: | 2025 33rd European Signal Processing Conference (EUSIPCO), pp. 1412-1416 |
|---|---|
| Main authors: | , |
| Format: | Conference paper |
| Language: | English |
| Published: | European Association for Signal Processing - EURASIP, 8 September 2025 |
| Subjects: | |
| Summary: | This paper presents an optimized audio recognition system that integrates feature-based and deep learning approaches, fine-tuned for high-accuracy classification. The study builds on previous research in which all possible feature-classifier combinations were analyzed to determine the most effective configurations. Based on those findings, we focus on MFCC-34 and MFCC-38 for SVM and kNN, and on spectrograms and Mel-spectrograms for CNN, which gives six model combinations. A majority voting mechanism is implemented to make classification more robust. Grid search was previously applied to SVM and kNN; this work further refines the system by tuning the CNN hyperparameters: Conv2D filters, layer units, dense layer size, learning rate, dropout, and optimizer type. In addition, the number of epochs is tested systematically from 10 to 30 in steps of 5 to determine the optimal training duration. The final implementation follows a structured pipeline of data preparation, feature extraction, model training, evaluation, and deployment preparation. The system is validated using multiple performance metrics, visualizations of the tuning process, and 5-fold cross-validation repeated 20 times. The results demonstrate the effectiveness of the approach, with 97.65% accuracy for CNN, 97.42% for SVM, and 95.53% for kNN, confirming the reliability of the proposed majority voting-based classification system. |
|---|---|
| DOI: | 10.23919/EUSIPCO63237.2025.11226494 |
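
The majority voting step described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `majority_vote`, the class labels, and the tie-breaking rule (the first model in list order wins a tie) are assumptions made for illustration, since the record does not specify them. Only the six feature/classifier pairs and the epoch range come from the summary.

```python
from collections import Counter

# The six feature/classifier pairs named in the summary.
MODELS = [
    ("SVM", "MFCC-34"), ("SVM", "MFCC-38"),
    ("kNN", "MFCC-34"), ("kNN", "MFCC-38"),
    ("CNN", "spectrogram"), ("CNN", "Mel-spectrogram"),
]


def majority_vote(predictions):
    """Return the label predicted by the most models.

    Assumption: a tie goes to the tied label that appears first in
    `predictions`; the record does not describe a tie-breaking rule.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in model order so that ties resolve deterministically.
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical per-model outputs for one audio clip, in MODELS order.
clip_predictions = ["dog", "dog", "siren", "dog", "siren", "dog"]
winner = majority_vote(clip_predictions)

# Epoch counts tested according to the summary: 10 to 30 in steps of 5.
EPOCH_GRID = list(range(10, 31, 5))
```

With an even number of voters a tie is possible, so any real system needs some explicit rule for it; the one used here is only a placeholder.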