Audio Classification for Feature-Based Majority Voting Optimization and Hyperparametric Tuning

Bibliographic details
Published in:	2025 33rd European Signal Processing Conference (EUSIPCO), pp. 1412-1416
Main authors:	Telembici, Lorena; Rusu, Corneliu
Format:	Conference paper
Language:	English
Published:	European Association for Signal Processing - EURASIP, 8 September 2025
Subjects:	Accuracy; Audio recognition; audio-based assistive systems; Box and whisker plot; Classification Performance; CNN; Hyperparameter tuning; kNN; Majority voting; Melspectrograms; MFCC; Performance metrics; Pipelines; Robustness; Spectrogram; spectrograms; Speech recognition; Stability analysis; SVM; Thermal stability; Training; Tuning

Description
Summary:	This paper presents an optimized audio recognition system that integrates feature-based and deep learning approaches, fine-tuned for high-accuracy classification. The study builds upon previous research, where all possible feature-classifier combinations were analyzed to determine the most effective configurations. Based on these findings, we focus on MFCC-34 and MFCC-38 for SVM and kNN, as well as spectrograms and Mel-spectrograms for CNN, forming six possible model combinations. A majority voting mechanism is implemented to enhance classification robustness. While grid search was previously applied to SVM and kNN, this work further refines the system by performing hyperparameter tuning for CNN, optimizing Conv2D filters, layer units, dense layer size, learning rate, dropout, and optimizer type. Additionally, the number of epochs is systematically tested from 10 to 30 in steps of 5 to determine the optimal training duration. The final implementation follows a structured pipeline, including data preparation, feature extraction, model training, evaluation, and deployment preparation. The system is validated using multiple performance metrics, tuning process visualizations, and 5-fold cross-validation repeated 20 times. Results demonstrate the effectiveness of our approach, achieving 97.65 % accuracy for CNN, 97.42 % for SVM, and 95.53 % for kNN, confirming the reliability of the proposed majority voting-based classification system.
DOI:	10.23919/EUSIPCO63237.2025.11226494
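
Illustrative note: the summary describes a majority vote over six feature-classifier combinations (SVM and kNN on MFCC-34 and MFCC-38, CNN on spectrograms and Mel-spectrograms). The sketch below shows one way such a vote could be computed. It is not taken from the paper: the function names, the class labels, and the tie-breaking rule (the earliest-listed model wins) are assumptions made for illustration only.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label chosen by the most models for one sample.

    Assumption: ties go to the label that appears first in `predictions`;
    the record above does not state how the paper resolves ties.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk in input order so the earliest-listed model wins a tie.
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def vote_batch(per_model: Sequence[Sequence[str]]) -> list[str]:
    """Combine predictions where per_model[m][i] is model m's label for sample i."""
    return [majority_vote(votes) for votes in zip(*per_model)]


# Hypothetical predictions from the six combinations for three samples.
per_model = [
    ["dog", "siren", "door"],  # SVM + MFCC-34
    ["dog", "siren", "bell"],  # SVM + MFCC-38
    ["cat", "siren", "door"],  # kNN + MFCC-34
    ["dog", "alarm", "bell"],  # kNN + MFCC-38
    ["dog", "siren", "door"],  # CNN + spectrogram
    ["dog", "siren", "bell"],  # CNN + Mel-spectrogram
]

final_labels = vote_batch(per_model)
print(final_labels)
```

With an even number of voters a 3-3 split is possible (the third sample above), so any real system needs an explicit tie rule; weighting each vote by its model's validation accuracy would be one alternative to the input-order rule assumed here.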