Enhanced Speech Emotion Recognition Using Conditional-DCGAN-Based Data Augmentation.

Saved in:
Detailed bibliography
Title: Enhanced Speech Emotion Recognition Using Conditional-DCGAN-Based Data Augmentation.
Authors: Roh, Kyung-Min; Lee, Seok-Pil
Source: Applied Sciences (2076-3417); Nov2024, Vol. 14 Issue 21, p9890, 14p
Subjects: ARTIFICIAL intelligence, EMOTION recognition, DATA augmentation, DEEP learning, SELF-expression
Abstract: With the advancement of Artificial Intelligence (AI) and the Internet of Things (IoT), research in the field of emotion detection and recognition has been actively conducted worldwide in modern society. Among this research, speech emotion recognition has gained increasing importance in various areas of application such as personalized services, enhanced security, and the medical field. However, subjective emotional expressions in voice data can be perceived differently by individuals, and issues such as data imbalance and limited datasets fail to provide the diverse situations necessary for model training, thus limiting performance. To overcome these challenges, this paper proposes a novel data augmentation technique using Conditional-DCGAN, which combines CGAN and DCGAN. This study analyzes the temporal signal changes using Mel-spectrograms extracted from the Emo-DB dataset and applies a loss function calculation method borrowed from reinforcement learning to generate data that accurately reflects emotional characteristics. To validate the proposed method, experiments were conducted using a model combining CNN and Bi-LSTM. The results, including augmented data, achieved significant performance improvements, reaching WA 91.46% and UAR 91.61%, compared to using only the original data (WA 79.31%, UAR 78.16%). These results outperform similar previous studies, such as those reporting WA 84.49% and UAR 83.33%, demonstrating the positive effects of the proposed data augmentation technique. This study presents a new data augmentation method that enables effective learning even in situations with limited data, offering a progressive direction for research in speech emotion recognition. [ABSTRACT FROM AUTHOR]
Copyright of Applied Sciences (2076-3417) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
FullText Text:
  Availability: 0
CustomLinks:
  – Url: https://resolver.ebscohost.com/openurl?sid=EBSCO:edb&genre=article&issn=20763417&ISBN=&volume=14&issue=21&date=20241101&spage=9890&pages=9890-9903&title=Applied Sciences (2076-3417)&atitle=Enhanced%20Speech%20Emotion%20Recognition%20Using%20Conditional-DCGAN-Based%20Data%20Augmentation.&aulast=Roh%2C%20Kyung-Min&id=DOI:10.3390/app14219890
    Name: Full Text Finder
    Category: fullText
    Text: Full Text Finder
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Roh%20K
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
Header DbId: edb
DbLabel: Complementary Index
An: 180782903
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=180782903
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.3390/app14219890
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 14
        StartPage: 9890
    Subjects:
      – SubjectFull: ARTIFICIAL intelligence
        Type: general
      – SubjectFull: EMOTION recognition
        Type: general
      – SubjectFull: DATA augmentation
        Type: general
      – SubjectFull: DEEP learning
        Type: general
      – SubjectFull: SELF-expression
        Type: general
    Titles:
      – TitleFull: Enhanced Speech Emotion Recognition Using Conditional-DCGAN-Based Data Augmentation.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Roh, Kyung-Min
      – PersonEntity:
          Name:
            NameFull: Lee, Seok-Pil
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 11
              Text: Nov2024
              Type: published
              Y: 2024
          Identifiers:
            – Type: issn-print
              Value: 20763417
          Numbering:
            – Type: volume
              Value: 14
            – Type: issue
              Value: 21
          Titles:
            – TitleFull: Applied Sciences (2076-3417)
              Type: main