Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution

Bibliographic details
Published in: IEEE/ACM transactions on audio, speech, and language processing, Volume 28; pp. 1948-1963
Main authors: Kubo, Yuki; Takamune, Norihiro; Kitamura, Daichi; Saruwatari, Hiroshi
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2329-9290, 2329-9304
Abstract In this article, we propose a new blind speech extraction (BSE) method that robustly extracts directional speech from background diffuse noise by combining independent low-rank matrix analysis (ILRMA) and efficient rank-constrained spatial covariance matrix (SCM) estimation. To achieve more accurate BSE than ILRMA, which assumes each source to be a point source (rank-1 spatial model), the proposed method restores the lost spatial basis for the full-rank SCM of diffuse noise. We adopt the multivariate complex generalized Gaussian distribution (GGD) as the statistical generative model to express various types of observed signals. To estimate the model parameters for an arbitrary shape parameter of the multivariate GGD, we derive a new inequality for rank-constrained SCMs. Also, we propose new acceleration methods to accomplish much faster extraction than conventional blind source separation methods. In BSE experiments using simulated and real recorded data, we confirm that the proposed method achieves more accurate and faster speech extraction than conventional methods.
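The abstract's idea of restoring a lost spatial basis can be illustrated with a small linear-algebra sketch. This is not the authors' estimation algorithm: it only shows, under assumptions, how a rank-deficient Hermitian SCM becomes full rank once a scaled rank-1 term along its null direction is added. The function name `restore_full_rank`, the fixed `scale` value, and the randomly generated matrix are all hypothetical; in the article the corresponding parameters are estimated from data.

```python
import numpy as np


def restore_full_rank(scm_deficient, scale):
    """Add a scaled rank-1 term along the null direction of a rank-(M-1) SCM.

    Hypothetical helper for illustration only; `scale` is a fixed number here,
    whereas the article estimates its model parameters from the observations.
    """
    # For Hermitian input, eigh returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(scm_deficient)
    b = eigvecs[:, 0]  # unit eigenvector of the smallest (near-zero) eigenvalue
    return scm_deficient + scale * np.outer(b, b.conj())


rng = np.random.default_rng(0)
M = 4  # assumed number of microphones

# Assumed rank-(M-1) noise SCM, built from M-1 random complex vectors.
A = rng.standard_normal((M, M - 1)) + 1j * rng.standard_normal((M, M - 1))
scm_deficient = A @ A.conj().T

scm_full = restore_full_rank(scm_deficient, scale=0.5)

rank_before = np.linalg.matrix_rank(scm_deficient)
rank_after = np.linalg.matrix_rank(scm_full)
print(rank_before, rank_after)
```

The rank-1 correction leaves the matrix Hermitian and changes only the direction that the rank-deficient model could not represent, which is why a single extra basis vector and one scale are enough in this toy setting.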
Author Takamune, Norihiro
Kubo, Yuki
Kitamura, Daichi
Saruwatari, Hiroshi
Author_xml – sequence: 1
  givenname: Yuki
  orcidid: 0000-0002-6877-0605
  surname: Kubo
  fullname: Kubo, Yuki
  email: yuuki.initial.yk@gmail.com
  organization: University of Tokyo, Tokyo, Japan
– sequence: 2
  givenname: Norihiro
  surname: Takamune
  fullname: Takamune, Norihiro
  email: norihiro_takamune@ipc.i.u-tokyo.ac.jp
  organization: University of Tokyo, Tokyo, Japan
– sequence: 3
  givenname: Daichi
  orcidid: 0000-0003-1117-7939
  surname: Kitamura
  fullname: Kitamura, Daichi
  email: kitamura-d@t.kagawa-nct.ac.jp
  organization: National Institute of Technology, Kagawa College, Kagawa, Japan
– sequence: 4
  givenname: Hiroshi
  orcidid: 0000-0003-0876-5617
  surname: Saruwatari
  fullname: Saruwatari, Hiroshi
  email: hiroshi_saruwatari@ipc.i.u-tokyo.ac.jp
  organization: University of Tokyo, Tokyo, Japan
CODEN ITASD8
CitedBy_id crossref_primary_10_1109_ACCESS_2025_3569590
crossref_primary_10_3390_jimaging9090179
crossref_primary_10_1007_s11042_023_16480_w
crossref_primary_10_1016_j_apacoust_2025_111019
crossref_primary_10_1049_pel2_12370
crossref_primary_10_1109_TASLP_2024_3407676
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DOI 10.1109/TASLP.2020.3003165
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE Xplore Open Access Journals
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Discipline Engineering
EISSN 2329-9304
EndPage 1963
ExternalDocumentID 10_1109_TASLP_2020_3003165
9127801
Genre orig-research
GrantInformation_xml – fundername: JSPS KAKENHI
  grantid: 17H06101; 19H01116; 19H04131; 19K20306
– fundername: SECOM Science and Technology Foundation
  funderid: 10.13039/501100004298
– fundername: JSPS-CAS Joint Research Program
  grantid: JPJSBP120197203
ISICitedReferencesCount 14
ISSN 2329-9290
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by/4.0/legalcode
ORCID 0000-0003-1117-7939
0000-0003-0876-5617
0000-0002-6877-0605
OpenAccessLink https://ieeexplore.ieee.org/document/9127801
PageCount 16
PublicationCentury 2000
PublicationDate 20200000
2020-00-00
20200101
PublicationDateYYYYMMDD 2020-01-01
PublicationDate_xml – year: 2020
  text: 20200000
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE/ACM transactions on audio, speech, and language processing
PublicationTitleAbbrev TASLP
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 1948
SubjectTerms Acceleration
Background noise
Blind speech extraction
Computational modeling
Computer simulation
Covariance matrices
Covariance matrix
diffuse noise
Estimation
Gaussian distribution
Mathematical models
Matrix methods
Methods
Multivariate analysis
multivariate complex generalized Gaussian distribution
Normal distribution
Optimization
Parameter estimation
Signal processing
spatial covariance matrix
Speech processing
Speech recognition
Statistical analysis
Title Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution
URI https://ieeexplore.ieee.org/document/9127801
https://www.proquest.com/docview/2419494856
Volume 28