Optimizing Computational Complexity: Real-Time Speech Enhancement Using an Efficient Convolutional Recurrent Dense Neural Network


Bibliographic details
Published in: 2023 International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 1-6
Main authors: Rajabi, Amir; Krini, Mohammed
Format: Conference paper
Language: English
Published: IEEE, 18 September 2023
ISSN: 1849-2266
Abstract Real-time communication through cell phones and telephones often involves challenging acoustic environments where the original speech signal is contaminated by environmental noise, a situation known as the cocktail party problem. Audio source separation can be an effective solution for isolating the voice in a noisy environment: it suppresses undesired noise without distorting speech components, which can improve speech quality and intelligibility. Deep Neural Network (DNN) models, despite their excellent performance in speech enhancement, require substantial computational effort during inference, which makes them less than ideal for this problem. The high computational complexity of deep models can further degrade regression latency, which is crucial for real-time applications that require minimized complexity. With these constraints in mind, this paper presents a novel neural network for speech enhancement that incorporates phase information into the loss function. The proposed method uses a Convolutional Recurrent Dense (CRD) network, which not only achieves notable computational efficiency but also demonstrates superior performance compared to other existing networks. Experimental results are provided to highlight the advantages and distinctions of the CRD network compared with alternative state-of-the-art approaches.
Author Rajabi, Amir
Krini, Mohammed
Author_xml – sequence: 1
  givenname: Amir
  surname: Rajabi
  fullname: Rajabi, Amir
  email: amir.rajabi@th-ab.de
  organization: Aschaffenburg University of Applied Sciences, Signal Processing Laboratory, Aschaffenburg, Germany
– sequence: 2
  givenname: Mohammed
  surname: Krini
  fullname: Krini, Mohammed
  email: mohammed.krini@th-ab.de
  organization: Aschaffenburg University of Applied Sciences, Signal Processing Laboratory, Aschaffenburg, Germany
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ISPA58351.2023.10278916
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350315363
EISSN 1849-2266
EndPage 6
ExternalDocumentID 10278916
Genre orig-research
GroupedDBID 6IE
6IL
ABLEC
ALMA_UNASSIGNED_HOLDINGS
CBEJK
IEGSK
RIE
RIL
ID FETCH-LOGICAL-i119t-961eee94e146527e65d20c15054afdaa3a755c6e6ddb10c4660b9f8b9eadc4433
IEDL.DBID RIE
IngestDate Wed Jun 26 19:24:08 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i119t-961eee94e146527e65d20c15054afdaa3a755c6e6ddb10c4660b9f8b9eadc4433
PageCount 6
ParticipantIDs ieee_primary_10278916
PublicationCentury 2000
PublicationDate 2023-Sept.-18
PublicationDateYYYYMMDD 2023-09-18
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-Sept.-18
  day: 18
PublicationDecade 2020
PublicationTitle 2023 International Symposium on Image and Signal Processing and Analysis (ISPA)
PublicationTitleAbbrev ISPA
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib048504798
ssib042470063
Score 1.8523313
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Artificial neural networks
Audio source separation
Computational cost
Computational modeling
Convolution
CRD model
Phase-aware speech enhancement
Source separation
Speech enhancement
Speech quality and intelligibility
Telephone sets
Working environment noise
Title Optimizing Computational Complexity: Real-Time Speech Enhancement Using an Efficient Convolutional Recurrent Dense Neural Network
URI https://ieeexplore.ieee.org/document/10278916
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2023+International+Symposium+on+Image+and+Signal+Processing+and+Analysis+%28ISPA%29&rft.atitle=Optimizing+Computational+Complexity%3A+Real-Time+Speech+Enhancement+Using+an+Efficient+Convolutional+Recurrent+Dense+Neural+Network&rft.au=Rajabi%2C+Amir&rft.au=Krini%2C+Mohammed&rft.date=2023-09-18&rft.pub=IEEE&rft.eissn=1849-2266&rft.spage=1&rft.epage=6&rft_id=info:doi/10.1109%2FISPA58351.2023.10278916&rft.externalDocID=10278916