Optimizing Computational Complexity: Real-Time Speech Enhancement Using an Efficient Convolutional Recurrent Dense Neural Network

Bibliographic Details
Published in: 2023 International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 1-6
Main Authors: Rajabi, Amir; Krini, Mohammed
Format: Conference Proceeding
Language: English
Published: IEEE 18.09.2023
ISSN: 1849-2266
Abstract Real-time communication through cell phones and telephones often involves challenging acoustic environments where the original speech signal is contaminated by environmental noise, a situation known as the cocktail party problem. Audio source separation can be an effective solution for isolating the voice in a noisy environment by suppressing undesired noise without distorting speech components, which can improve speech quality and intelligibility. Deep Neural Network (DNN) models, despite their excellent performance in speech enhancement, require substantial computational effort during inference, which makes them less than ideal for this problem. The high computational complexity of deep models can also increase processing latency, which is critical for real-time applications. With these considerations in mind, this paper presents a novel neural network for speech enhancement that incorporates phase information into the loss function. The proposed method uses a Convolutional Recurrent Dense (CRD) network, which not only achieves notable computational efficiency but also demonstrates superior performance compared to other existing networks. Experimental results are provided to highlight the advantages and distinctions of the CRD network compared with alternative state-of-the-art approaches.
Author Rajabi, Amir
Krini, Mohammed
Author_xml – sequence: 1
  givenname: Amir
  surname: Rajabi
  fullname: Rajabi, Amir
  email: amir.rajabi@th-ab.de
  organization: Aschaffenburg University of Applied Sciences,Signal Processing Laboratory,Aschaffenburg,Germany
– sequence: 2
  givenname: Mohammed
  surname: Krini
  fullname: Krini, Mohammed
  email: mohammed.krini@th-ab.de
  organization: Aschaffenburg University of Applied Sciences,Signal Processing Laboratory,Aschaffenburg,Germany
ContentType Conference Proceeding
DOI 10.1109/ISPA58351.2023.10278916
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN 9798350315363
EISSN 1849-2266
EndPage 6
ExternalDocumentID 10278916
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 6
ParticipantIDs ieee_primary_10278916
PublicationCentury 2000
PublicationDate 2023-Sept.-18
PublicationDateYYYYMMDD 2023-09-18
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-Sept.-18
  day: 18
PublicationDecade 2020
PublicationTitle 2023 International Symposium on Image and Signal Processing and Analysis (ISPA)
PublicationTitleAbbrev ISPA
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Artificial neural networks
Audio source separation
Computational cost
Computational modeling
Convolution
CRD model
Phase-aware speech enhancement
Source separation
Speech enhancement
Speech quality and intelligibility
Telephone sets
Working environment noise
Title Optimizing Computational Complexity: Real-Time Speech Enhancement Using an Efficient Convolutional Recurrent Dense Neural Network
URI https://ieeexplore.ieee.org/document/10278916