Speech Enhancement Algorithm Based on Microphone Array and Multi-Channel Parallel GRU-CNN Network
This paper presents an improved speech enhancement algorithm based on microphone arrays to improve speech enhancement performance in complex settings. The algorithm’s model consists of two key components: the feature extraction module and the speech enhancement module. The feature extraction module...
| Published in: | Electronics (Basel) Vol. 14; No. 4; p. 681 |
|---|---|
| Main Authors: | Xi, Ji; Xu, Zhe; Zhang, Weiqi; Xie, Yue; Zhao, Li |
| Format: | Journal Article |
| Language: | English |
| Published: | Basel: MDPI AG, 1 February 2025 |
| Subjects: | |
| ISSN: | 2079-9292 |
| Online Access: | Full text |
| Abstract | This paper presents an improved microphone-array speech enhancement algorithm aimed at better performance in complex settings. The model consists of two key components: a feature extraction module and a speech enhancement module. The feature extraction module processes speech amplitude spectral features derived from the short-time Fourier transform (STFT). It employs parallel GRU-CNN (gated recurrent unit and convolutional neural network) structures to capture the information unique to each channel, and uses skip connections to speed up model convergence. The speech enhancement module focuses on obtaining cross-channel spatial information. It introduces an attention mechanism with a global hybrid pooling strategy to reduce feature loss; this strategy dynamically assigns weights to each channel, emphasizing the features most beneficial for speech signal restoration. Experimental results on the CHiME-3 dataset show that the proposed model effectively suppresses diverse types of noise and outperforms other algorithms in improving speech quality and comprehension. |
|---|---|
| Author | Xu, Zhe Xi, Ji Xie, Yue Zhang, Weiqi Zhao, Li |
| Author_xml | 1. Xi, Ji; 2. Xu, Zhe; 3. Zhang, Weiqi; 4. Xie, Yue; 5. Zhao, Li |
| ContentType | Journal Article |
| Copyright | 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.3390/electronics14040681 |
| DatabaseName | CrossRef Electronics & Communications Abstracts Technology Research Database ProQuest SciTech Collection ProQuest Technology Collection ProQuest Central (Alumni) ProQuest Central UK/Ireland Advanced Technologies & Computer Science Collection ProQuest Central Essentials ProQuest Central Technology Collection ProQuest One ProQuest Central SciTech Premium Collection Advanced Technologies Database with Aerospace Advanced Technologies & Aerospace Database ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Premium ProQuest One Academic (New) Publicly Available Content Database ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition (DO NOT USE) One Applied & Life Sciences ProQuest One Academic (retired) ProQuest One Academic UKI Edition ProQuest Central China |
| DatabaseTitle | CrossRef Publicly Available Content Database Advanced Technologies & Aerospace Collection Technology Collection Technology Research Database ProQuest One Academic Middle East (New) ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Essentials ProQuest One Academic Eastern Edition Electronics & Communications Abstracts ProQuest Central (Alumni Edition) SciTech Premium Collection ProQuest One Community College ProQuest Technology Collection ProQuest SciTech Collection ProQuest Central China ProQuest Central Advanced Technologies & Aerospace Database ProQuest One Applied & Life Sciences ProQuest One Academic UKI Edition ProQuest Central Korea ProQuest Central (New) ProQuest One Academic Advanced Technologies Database with Aerospace ProQuest One Academic (New) |
| DatabaseTitleList | Publicly Available Content Database CrossRef |
| Database_xml | – sequence: 1 dbid: PIMPY name: Publicly Available Content Database url: http://search.proquest.com/publiccontent sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 2079-9292 |
| ExternalDocumentID | 10_3390_electronics14040681 |
| ISICitedReferencesCount | 0 |
| ISSN | 2079-9292 |
| IngestDate | Fri Jul 25 21:38:15 EDT 2025 Sat Nov 29 07:17:46 EST 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Language | English |
| LinkModel | DirectLink |
| OpenAccessLink | https://www.proquest.com/publiccontent/docview/3171008084?pq-origsite=%requestingapplication% |
| PQID | 3171008084 |
| PQPubID | 2032404 |
| ParticipantIDs | proquest_journals_3171008084 crossref_primary_10_3390_electronics14040681 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-02-01 |
| PublicationDateYYYYMMDD | 2025-02-01 |
| PublicationDate_xml | – month: 02 year: 2025 text: 2025-02-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | Basel |
| PublicationPlace_xml | – name: Basel |
| PublicationTitle | Electronics (Basel) |
| PublicationYear | 2025 |
| Publisher | MDPI AG |
| Publisher_xml | – name: MDPI AG |
| Snippet | This paper presents an improved speech enhancement algorithm based on microphone arrays to improve speech enhancement performance in complex settings. The... |
| SourceID | proquest crossref |
| SourceType | Aggregation Database Index Database |
| StartPage | 681 |
| SubjectTerms | Algorithms; Arrays; Artificial neural networks; Deep learning; Feature extraction; Feature selection; Fourier transforms; Microphones; Modules; Neural networks; Parameter estimation; Signal processing; Spatial data; Speech; Speech processing |
| Title | Speech Enhancement Algorithm Based on Microphone Array and Multi-Channel Parallel GRU-CNN Network |
| URI | https://www.proquest.com/docview/3171008084 |
| Volume | 14 |
| hasFullText | 1 |
| inHoldings | 1 |
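
The abstract describes an attention step that uses global hybrid pooling to assign a weight to each microphone channel. This record does not give the paper's actual formulation, so the following is only a minimal sketch under stated assumptions: "hybrid pooling" is taken to mean a blend of global average and global max pooling, the blend factor `alpha` is hypothetical, and a plain softmax stands in for whatever learned attention layers the authors use. All function names are illustrative.

```python
import math


def hybrid_pool(feature_map, alpha=0.5):
    """Blend global average and global max pooling over one channel's feature map.

    `alpha` is a hypothetical mixing factor, not a value taken from the paper.
    """
    values = [v for row in feature_map for v in row]
    avg = sum(values) / len(values)
    return alpha * avg + (1.0 - alpha) * max(values)


def channel_weights(feature_maps, alpha=0.5):
    """Turn one pooled descriptor per channel into weights that sum to 1.

    A softmax is an assumption here; a trained model would normally pass the
    descriptors through learned layers before normalizing.
    """
    scores = [hybrid_pool(fm, alpha) for fm in feature_maps]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight(feature_maps, alpha=0.5):
    """Scale every channel's feature map by its attention weight."""
    weights = channel_weights(feature_maps, alpha)
    return [
        [[w * v for v in row] for row in fm]
        for w, fm in zip(weights, feature_maps)
    ]
```

Calling `reweight` on a list of per-channel time-frequency feature maps returns maps of the same shape, with channels that have larger pooled descriptors scaled up relative to the others.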