Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution
In this article, we propose a new blind speech extraction (BSE) method that robustly extracts a directional speech from background diffuse noise by combining independent low-rank matrix analysis (ILRMA) and efficient rank-constrained spatial covariance matrix (SCM) estimation. To achieve more accura...
| Published in: | IEEE/ACM transactions on audio, speech, and language processing Vol. 28; pp. 1948-1963 |
|---|---|
| Main authors: | Kubo, Yuki; Takamune, Norihiro; Kitamura, Daichi; Saruwatari, Hiroshi |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2020 |
| Subjects: | |
| ISSN: | 2329-9290, 2329-9304 |
| Online access: | Get full text |
| Abstract | In this article, we propose a new blind speech extraction (BSE) method that robustly extracts a directional speech from background diffuse noise by combining independent low-rank matrix analysis (ILRMA) and efficient rank-constrained spatial covariance matrix (SCM) estimation. To achieve more accurate BSE than ILRMA, which assumes each source to be a point source (rank-1 spatial model), the proposed method restores the lost spatial basis for the full-rank SCM of diffuse noise. We adopt the multivariate complex generalized Gaussian distribution (GGD) as the statistical generative model to express various types of observed signal. To estimate the model parameters for an arbitrary shape parameter of the multivariate GGD, we derive a new inequality for rank-constrained SCMs. Also, we propose new acceleration methods to accomplish much faster extraction than conventional blind source separation methods. In BSE experiments using simulated and real recorded data, we confirm that the proposed method achieves more accurate and faster speech extraction than conventional methods. |
|---|---|
| Author | Takamune, Norihiro; Kubo, Yuki; Kitamura, Daichi; Saruwatari, Hiroshi |
| Author_xml | 1. Kubo, Yuki (ORCID: 0000-0002-6877-0605; email: yuuki.initial.yk@gmail.com; University of Tokyo, Tokyo, Japan); 2. Takamune, Norihiro (email: norihiro_takamune@ipc.i.u-tokyo.ac.jp; University of Tokyo, Tokyo, Japan); 3. Kitamura, Daichi (ORCID: 0000-0003-1117-7939; email: kitamura-d@t.kagawa-nct.ac.jp; National Institute of Technology, Kagawa College, Kagawa, Japan); 4. Saruwatari, Hiroshi (ORCID: 0000-0003-0876-5617; email: hiroshi_saruwatari@ipc.i.u-tokyo.ac.jp; University of Tokyo, Tokyo, Japan) |
| CODEN | ITASD8 |
| CitedBy_id | crossref_primary_10_1109_ACCESS_2025_3569590 crossref_primary_10_3390_jimaging9090179 crossref_primary_10_1007_s11042_023_16480_w crossref_primary_10_1016_j_apacoust_2025_111019 crossref_primary_10_1049_pel2_12370 crossref_primary_10_1109_TASLP_2024_3407676 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| DBID | 97E ESBDL RIA RIE AAYXX CITATION 7SC 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/TASLP.2020.3003165 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present; IEEE Xplore Open Access Journals; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts – Academic; Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef; Computer and Information Systems Abstracts; Technology Research Database; Computer and Information Systems Abstracts – Academic; Advanced Technologies Database with Aerospace; ProQuest Computer Science Collection; Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Computer and Information Systems Abstracts |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 2329-9304 |
| EndPage | 1963 |
| ExternalDocumentID | 10_1109_TASLP_2020_3003165 9127801 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: JSPS KAKENHI grantid: 17H06101; 19H01116; 19H04131; 19K20306 – fundername: SECOM Science and Technology Foundation funderid: 10.13039/501100004298 – fundername: JSPS-CAS Joint Research Program grantid: JPJSBP120197203 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 14 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000545417500005&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 2329-9290 |
| IngestDate | Sun Nov 09 08:28:44 EST 2025; Tue Nov 18 22:16:38 EST 2025; Sat Nov 29 02:43:53 EST 2025; Wed Aug 27 02:32:41 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | https://creativecommons.org/licenses/by/4.0/legalcode |
| LinkModel | DirectLink |
| ORCID | 0000-0003-1117-7939 0000-0003-0876-5617 0000-0002-6877-0605 |
| OpenAccessLink | https://ieeexplore.ieee.org/document/9127801 |
| PQID | 2419494856 |
| PQPubID | 85426 |
| PageCount | 16 |
| ParticipantIDs | proquest_journals_2419494856 crossref_citationtrail_10_1109_TASLP_2020_3003165 crossref_primary_10_1109_TASLP_2020_3003165 ieee_primary_9127801 |
| PublicationCentury | 2000 |
| PublicationDate | 20200000 2020-00-00 20200101 |
| PublicationDateYYYYMMDD | 2020-01-01 |
| PublicationDate_xml | – year: 2020 text: 20200000 |
| PublicationDecade | 2020 |
| PublicationPlace | Piscataway |
| PublicationPlace_xml | – name: Piscataway |
| PublicationTitle | IEEE/ACM transactions on audio, speech, and language processing |
| PublicationTitleAbbrev | TASLP |
| PublicationYear | 2020 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0001079974 |
| Score | 2.2146132 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database; Enrichment Source; Index Database; Publisher |
| StartPage | 1948 |
| SubjectTerms | Acceleration; Background noise; Blind speech extraction; Computational modeling; Computer simulation; Covariance matrices; Covariance matrix; diffuse noise; Estimation; Gaussian distribution; Mathematical models; Matrix methods; Methods; Multivariate analysis; multivariate complex generalized Gaussian distribution; Normal distribution; Optimization; Parameter estimation; Signal processing; spatial covariance matrix; Speech processing; Speech recognition; Statistical analysis |
| Title | Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution |
| URI | https://ieeexplore.ieee.org/document/9127801 https://www.proquest.com/docview/2419494856 |
| Volume | 28 |
| WOSCitedRecordID | wos000545417500005&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 2329-9304 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0001079974 issn: 2329-9290 databaseCode: RIE dateStart: 20140101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |