Optimizing Computational Complexity: Real-Time Speech Enhancement Using an Efficient Convolutional Recurrent Dense Neural Network

Detailed bibliography
Published in:	2023 International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 1-6
Main authors:	Rajabi, Amir; Krini, Mohammed
Format:	Conference paper
Language:	English
Published:	IEEE, 18 September 2023
Subjects:	Artificial neural networks; Audio source separation; Computational cost; Computational modeling; Convolution; CRD model; Phase-aware speech enhancement; Source separation; Speech enhancement; Speech quality and intelligibility; Telephone sets; Working environment noise
ISSN:	1849-2266

Abstract	Real-time communication through cell phones and telephones often involves challenging acoustic environments where the original speech signal is contaminated by environmental noise, a situation known as the cocktail party problem. Audio source separation can be an effective solution for isolating the voice in a noisy environment by suppressing undesired noise without distorting speech components, which can improve speech quality and intelligibility. Deep Neural Network (DNN) models, despite their excellent performance in speech enhancement, require substantial computational effort during inference, which makes them less than ideal for this specific problem. The high computational complexity of deep models can further impede regression latency, which is crucial for real-time applications that require minimized complexity. With these considerations in mind, this paper presents a novel neural network for speech enhancement that incorporates phase information into the loss function. The proposed method uses a Convolutional Recurrent Dense (CRD) network, which not only achieves notable computational efficiency but also demonstrates superior performance compared to other existing networks. Experimental results are provided to highlight the advantages and distinctions of the CRD network when compared with alternative state-of-the-art approaches.
Author	Rajabi, Amir; Krini, Mohammed
Author_xml	sequence: 1; givenname: Amir; surname: Rajabi; fullname: Rajabi, Amir; email: amir.rajabi@th-ab.de; organization: Aschaffenburg University of Applied Sciences, Signal Processing Laboratory, Aschaffenburg, Germany
	sequence: 2; givenname: Mohammed; surname: Krini; fullname: Krini, Mohammed; email: mohammed.krini@th-ab.de; organization: Aschaffenburg University of Applied Sciences, Signal Processing Laboratory, Aschaffenburg, Germany
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/ISPA58351.2023.10278916
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	9798350315363
EISSN	1849-2266
EndPage	6
ExternalDocumentID	10278916
Genre	orig-research
GroupedDBID	6IE 6IL ABLEC ALMA_UNASSIGNED_HOLDINGS CBEJK IEGSK RIE RIL
ID	FETCH-LOGICAL-i119t-961eee94e146527e65d20c15054afdaa3a755c6e6ddb10c4660b9f8b9eadc4433
IEDL.DBID	RIE
IngestDate	Wed Jun 26 19:24:08 EDT 2024
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i119t-961eee94e146527e65d20c15054afdaa3a755c6e6ddb10c4660b9f8b9eadc4433
PageCount	6
ParticipantIDs	ieee_primary_10278916
PublicationCentury	2000
PublicationDate	2023-Sept.-18
PublicationDateYYYYMMDD	2023-09-18
PublicationDate_xml	– month: 09 year: 2023 text: 2023-Sept.-18 day: 18
PublicationDecade	2020
PublicationTitle	2023 International Symposium on Image and Signal Processing and Analysis (ISPA)
PublicationTitleAbbrev	ISPA
PublicationYear	2023
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssib048504798 ssib042470063
Score	1.8523313
SourceID	ieee
SourceType	Publisher
StartPage	1
Title	Optimizing Computational Complexity: Real-Time Speech Enhancement Using an Efficient Convolutional Recurrent Dense Neural Network
URI	https://ieeexplore.ieee.org/document/10278916
hasFullText	1
inHoldings	1
isFullTextHit
isPrint