Optimizing Computational Complexity: Real-Time Speech Enhancement Using an Efficient Convolutional Recurrent Dense Neural Network



Bibliographic details
Published in: 2023 International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 1-6
Main authors: Rajabi, Amir; Krini, Mohammed
Format: Conference paper
Language: English
Published: IEEE, 18 September 2023
ISSN: 1849-2266
Online access: Full text
Abstract Real-time communication through cell phones and telephones often takes place in challenging acoustic environments where the original speech signal is contaminated by environmental noise, a situation known as the cocktail party problem. Audio source separation can be an effective solution for isolating the voice in a noisy environment: by suppressing undesired noise without distorting speech components, it can improve speech quality and intelligibility. Deep neural network (DNN) models, despite their excellent performance in speech enhancement, require substantial computational effort during inference, which makes them less than ideal for this specific problem. The high computational complexity of deep models can also increase regression latency, which is critical for real-time applications that require minimal complexity. With these considerations in mind, this paper presents a novel neural network for speech enhancement that incorporates phase information into the loss function. The proposed method uses a Convolutional Recurrent Dense (CRD) network, which not only achieves notable computational efficiency but also outperforms other existing networks. Experimental results highlight the advantages and distinctions of the CRD network compared with alternative state-of-the-art approaches.
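The abstract states that phase information is incorporated into the loss function but does not say how. As one illustration of what a phase-aware training loss can look like, the following NumPy sketch implements the phase-sensitive spectrum approximation (PSA) loss, a common choice in mask-based speech enhancement. This is an assumption for illustration only, not the loss used by the CRD authors; the function name `psa_loss` and the (frames, bins) array layout are hypothetical.

```python
import numpy as np


def psa_loss(mask: np.ndarray, noisy_stft: np.ndarray, clean_stft: np.ndarray) -> float:
    """Phase-sensitive spectrum approximation (PSA) loss.

    Hypothetical illustration; not the loss used in the CRD paper.
    mask:        real-valued mask predicted by a network, shape (frames, bins)
    noisy_stft:  complex STFT of the noisy input, same shape
    clean_stft:  complex STFT of the clean reference, same shape
    """
    noisy_mag = np.abs(noisy_stft)
    clean_mag = np.abs(clean_stft)
    # Per time-frequency bin phase difference between clean and noisy spectra
    phase_diff = np.angle(clean_stft) - np.angle(noisy_stft)
    # Phase-sensitive target: clean magnitude projected onto the noisy phase
    target = clean_mag * np.cos(phase_diff)
    # The masked noisy magnitude is the network's estimate
    estimate = mask * noisy_mag
    # Mean squared error between estimate and phase-sensitive target
    return float(np.mean((estimate - target) ** 2))
```

Unlike a plain magnitude loss, this penalizes a mask that reproduces the clean magnitude when the clean and noisy phases disagree: if the clean spectrum has the same magnitude as the noisy one but opposite phase, a mask of ones yields a nonzero loss. The unclipped PSA mask |S| cos(θ_S − θ_Y) / |Y| gives zero loss; in practice such masks are often truncated to a bounded range such as [0, 1], a detail this sketch omits.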
Author Rajabi, Amir
Krini, Mohammed
Author_xml – sequence: 1
  givenname: Amir
  surname: Rajabi
  fullname: Rajabi, Amir
  email: amir.rajabi@th-ab.de
  organization: Aschaffenburg University of Applied Sciences, Signal Processing Laboratory, Aschaffenburg, Germany
– sequence: 2
  givenname: Mohammed
  surname: Krini
  fullname: Krini, Mohammed
  email: mohammed.krini@th-ab.de
  organization: Aschaffenburg University of Applied Sciences, Signal Processing Laboratory, Aschaffenburg, Germany
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ISPA58351.2023.10278916
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library (IEL) (UW System Shared)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350315363
EISSN 1849-2266
EndPage 6
ExternalDocumentID 10278916
Genre orig-research
GroupedDBID 6IE
6IL
ABLEC
ALMA_UNASSIGNED_HOLDINGS
CBEJK
IEGSK
RIE
RIL
ID FETCH-LOGICAL-i119t-961eee94e146527e65d20c15054afdaa3a755c6e6ddb10c4660b9f8b9eadc4433
IEDL.DBID RIE
IngestDate Wed Jun 26 19:24:08 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i119t-961eee94e146527e65d20c15054afdaa3a755c6e6ddb10c4660b9f8b9eadc4433
PageCount 6
ParticipantIDs ieee_primary_10278916
PublicationCentury 2000
PublicationDate 2023-Sept.-18
PublicationDateYYYYMMDD 2023-09-18
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-Sept.-18
  day: 18
PublicationDecade 2020
PublicationTitle 2023 International Symposium on Image and Signal Processing and Analysis (ISPA)
PublicationTitleAbbrev ISPA
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib048504798
ssib042470063
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Artificial neural networks
Audio source separation
Computational cost
Computational modeling
Convolution
CRD model
Phase-aware speech enhancement
Source separation
Speech enhancement
Speech quality and intelligibility
Telephone sets
Working environment noise
URI https://ieeexplore.ieee.org/document/10278916
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE