Emotion recognition for human–computer interaction using high-level descriptors

Bibliographic Details
Published in:	Scientific Reports Vol. 14; No. 1; p. 12122 - 12
Main Authors:	Singla, Chaitanya, Singh, Sukhdev, Sharma, Preeti, Mittal, Nitin, Gared, Fikreselam
Format:	Journal Article
Language:	English
Published:	London Nature Publishing Group UK 27.05.2024 Nature Publishing Group Nature Portfolio
Subjects:	639/166; 692/700; Algorithms; Bayes Theorem; Computer engineering; Datasets; Deep learning; Emotion recognition; Emotions; Emotions - physiology; Females; Happiness; High-level features; Humanities and Social Sciences; Humans; Language; multidisciplinary; Neural networks; Neural Networks, Computer; Punjabi database; Punjabi speech emotion recognition; Science; Science (multidisciplinary); Social Media; Social networks; Speech; Speech emotion recognition (SER)
ISSN:	2045-2322
Online Access:	Full text

Description
Abstract:	Recent research has focused extensively on employing Deep Learning (DL) techniques, particularly Convolutional Neural Networks (CNN), for Speech Emotion Recognition (SER). This study addresses the burgeoning interest in leveraging DL for SER, specifically focusing on Punjabi language speakers. The paper presents a novel approach to constructing and preprocessing a labeled speech corpus using diverse social media sources. By utilizing spectrograms as the primary feature representation, the proposed algorithm effectively learns discriminative patterns for emotion recognition. The method is evaluated on a custom dataset derived from various Punjabi media sources, including films and web series. Results demonstrate that the proposed approach achieves an accuracy of 69%, surpassing traditional methods like decision trees, Naïve Bayes, and random forests, which achieved accuracies of 49%, 52%, and 61% respectively. Thus, the proposed method improves accuracy in recognizing emotions from Punjabi speech signals.
DOI:	10.1038/s41598-024-59294-y