Optimizing Computational Complexity: Real-Time Speech Enhancement Using an Efficient Convolutional Recurrent Dense Neural Network
Real-time communication through cell phones and telephones often involves challenging acoustic environments where the original speech signal is contaminated by environmental noise, known as the cocktail party problem. Audio source separation can be an effective solution for isolating the voice in a...
| Published in: | 2023 International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 1-6 |
|---|---|
| Main authors: | Rajabi, Amir; Krini, Mohammed |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 18 September 2023 |
| Subjects: | |
| ISSN: | 1849-2266 |
| Online access: | Get full text |
| Abstract | Real-time communication through cell phones and telephones often involves challenging acoustic environments where the original speech signal is contaminated by environmental noise, a situation known as the cocktail party problem. Audio source separation can be an effective solution for isolating the voice in a noisy environment by suppressing undesired noise without distorting speech components, which can improve speech quality and intelligibility. Deep Neural Network (DNN) models, despite their excellent performance in speech enhancement, require substantial computational effort during inference, which makes them less than ideal for this problem. The high computational complexity of deep models can further worsen regression latency, which is crucial for real-time applications that require minimized complexity. Considering these constraints, this paper presents a novel neural network for speech enhancement that incorporates phase information into the loss function. The proposed method uses a Convolutional Recurrent Dense (CRD) network, which not only achieves notable computational efficiency but also demonstrates superior performance compared to other existing networks. Experimental results are provided to highlight the advantages and distinctions of the CRD network compared with alternative state-of-the-art approaches. |
|---|---|
| Author | Rajabi, Amir; Krini, Mohammed |
| Author_xml | 1. Rajabi, Amir (amir.rajabi@th-ab.de), Aschaffenburg University of Applied Sciences, Signal Processing Laboratory, Aschaffenburg, Germany; 2. Krini, Mohammed (mohammed.krini@th-ab.de), Aschaffenburg University of Applied Sciences, Signal Processing Laboratory, Aschaffenburg, Germany |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ISPA58351.2023.10278916 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798350315363 |
| EISSN | 1849-2266 |
| EndPage | 6 |
| ExternalDocumentID | 10278916 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL ABLEC ALMA_UNASSIGNED_HOLDINGS CBEJK IEGSK RIE RIL |
| ID | FETCH-LOGICAL-i119t-961eee94e146527e65d20c15054afdaa3a755c6e6ddb10c4660b9f8b9eadc4433 |
| IEDL.DBID | RIE |
| IngestDate | Wed Jun 26 19:24:08 EDT 2024 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i119t-961eee94e146527e65d20c15054afdaa3a755c6e6ddb10c4660b9f8b9eadc4433 |
| PageCount | 6 |
| ParticipantIDs | ieee_primary_10278916 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-Sept.-18 |
| PublicationDateYYYYMMDD | 2023-09-18 |
| PublicationDate_xml | – month: 09 year: 2023 text: 2023-Sept.-18 day: 18 |
| PublicationDecade | 2020 |
| PublicationTitle | 2023 International Symposium on Image and Signal Processing and Analysis (ISPA) |
| PublicationTitleAbbrev | ISPA |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib048504798 ssib042470063 |
| Score | 1.8523313 |
| Snippet | Real-time communication through cell phones and telephones often involves challenging acoustic environments where the original speech signal is contaminated by... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Artificial neural networks; Audio source separation; Computational cost; Computational modeling; Convolution; CRD model; Phase-aware speech enhancement; Source separation; Speech enhancement; Speech quality and intelligibility; Telephone sets; Working environment noise |
| Title | Optimizing Computational Complexity: Real-Time Speech Enhancement Using an Efficient Convolutional Recurrent Dense Neural Network |
| URI | https://ieeexplore.ieee.org/document/10278916 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
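
The abstract in this record states that the proposed network incorporates phase information into the loss function, but the record does not give the formula. The following is only a minimal sketch of one common way to make a spectral loss phase-aware, not the authors' method: it adds a complex-spectrum error term to a magnitude-only term. The function names `stft` and `phase_aware_loss`, the weight `alpha`, and the frame settings `frame_len` and `hop` are all assumptions chosen for illustration.

```python
import numpy as np


def stft(signal, frame_len=256, hop=128):
    """Hann-windowed short-time Fourier transform (illustrative settings).

    Returns a complex array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def phase_aware_loss(estimate, target, alpha=0.5):
    """Hypothetical loss: weighted sum of magnitude and complex-spectrum errors.

    The magnitude term ignores phase entirely; the complex term compares
    real and imaginary parts, so it also penalizes phase mismatch.
    `alpha` (an assumed parameter) balances the two terms.
    """
    est_spec = stft(estimate)
    tgt_spec = stft(target)
    mag_term = np.mean((np.abs(est_spec) - np.abs(tgt_spec)) ** 2)
    complex_term = np.mean(np.abs(est_spec - tgt_spec) ** 2)
    return alpha * mag_term + (1.0 - alpha) * complex_term
```

A polarity-inverted copy of a signal has the same magnitude spectrum as the original but a phase shifted by 180 degrees. With `alpha=1.0` (magnitude only) this sketch therefore assigns it zero loss, while any `alpha < 1.0` yields a positive loss, which is the behavior a phase-aware objective is meant to provide.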