Speech Enhancement with Variational Autoencoders and Alpha-stable Distributions

This paper focuses on single-channel semi-supervised speech enhancement. We learn a speaker-independent deep generative speech model using the framework of variational autoencoders. The noise model remains unsupervised because we do not assume prior knowledge of the noisy recording environment. In t...

Bibliographic details
Published in:	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 541-545
Main authors:	Leglaive, Simon; Simsekli, Umut; Liutkus, Antoine; Girin, Laurent; Horaud, Radu
Format:	Conference paper
Language:	English
Published:	IEEE 2019-05-01
Subjects:	alpha-stable distribution; Context modeling; Deep learning; Expectation-maximization algorithms; Monte Carlo expectation-maximization; Monte Carlo methods; Noise measurement; Speech enhancement; Time-frequency analysis; variational autoencoders
ISSN:	2379-190X

Abstract	This paper focuses on single-channel semi-supervised speech enhancement. We learn a speaker-independent deep generative speech model using the framework of variational autoencoders. The noise model remains unsupervised because we do not assume prior knowledge of the noisy recording environment. In this context, our contribution is to propose a noise model based on alpha-stable distributions, instead of the more conventional Gaussian non-negative matrix factorization approach found in previous studies. We develop a Monte Carlo expectation-maximization algorithm for estimating the model parameters at test time. Experimental results show the superiority of the proposed approach both in terms of perceptual quality and intelligibility of the enhanced speech signal.
Author	Liutkus, Antoine; Horaud, Radu; Leglaive, Simon; Simsekli, Umut; Girin, Laurent
Author_xml	– sequence: 1 givenname: Simon surname: Leglaive fullname: Leglaive, Simon organization: Inria Grenoble Rhône-Alpes, France – sequence: 2 givenname: Umut surname: Simsekli fullname: Simsekli, Umut organization: LTCI, Télécom ParisTech, Université Paris-Saclay, France – sequence: 3 givenname: Antoine surname: Liutkus fullname: Liutkus, Antoine organization: Inria and LIRMM, France – sequence: 4 givenname: Laurent surname: Girin fullname: Girin, Laurent organization: Inria Grenoble Rhône-Alpes, France – sequence: 5 givenname: Radu surname: Horaud fullname: Horaud, Radu organization: Inria Grenoble Rhône-Alpes, France
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/ICASSP.2019.8682546
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering
EISBN	9781479981311 1479981311
EISSN	2379-190X
EndPage	545
ExternalDocumentID	8682546
Genre	orig-research
IEDL.DBID	RIE
ISICitedReferencesCount	27
IngestDate	Wed Aug 27 05:56:09 EDT 2025
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
OpenAccessLink	https://inria.hal.science/hal-02005106
PageCount	5
ParticipantIDs	ieee_primary_8682546
PublicationCentury	2000
PublicationDate	2019-May
PublicationDateYYYYMMDD	2019-05-01
PublicationDate_xml	– month: 05 year: 2019 text: 2019-May
PublicationDecade	2010
PublicationTitle	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev	ICASSP
PublicationYear	2019
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0008748
SourceID	ieee
SourceType	Publisher
StartPage	541
SubjectTerms	alpha-stable distribution; Context modeling; Deep learning; Expectation-maximization algorithms; Monte Carlo expectation-maximization; Monte Carlo methods; Noise measurement; Speech enhancement; Time-frequency analysis; variational autoencoders
Title	Speech Enhancement with Variational Autoencoders and Alpha-stable Distributions
URI	https://ieeexplore.ieee.org/document/8682546
hasFullText	1
inHoldings	1