Audio-Visual Speech Enhancement Using Conditional Variational Auto-Encoders


Full Description

Saved in:
Bibliographic Details
Published in: IEEE/ACM transactions on audio, speech, and language processing Vol. 28; pp. 1788-1800
Main Authors: Sadeghi, Mostafa; Leglaive, Simon; Alameda-Pineda, Xavier; Girin, Laurent; Horaud, Radu
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 01.01.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Institute of Electrical and Electronics Engineers
Subjects:
ISSN:2329-9290, 2329-9304
Online Access: Full text
Abstract Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. One advantage of this generative approach is that it does not require pairs of clean and noisy speech signals at training. In this article, we propose audio-visual variants of VAEs for single-channel and speaker-independent speech enhancement. We develop a conditional VAE (CVAE) where the audio speech generative process is conditioned on visual information of the lip region. At test time, the audio-visual speech generative model is combined with a noise model based on nonnegative matrix factorization, and speech enhancement relies on a Monte Carlo expectation-maximization algorithm. Experiments are conducted with the recently published NTCD-TIMIT dataset as well as the GRID corpus. The results confirm that the proposed audio-visual CVAE effectively fuses audio and visual information, and it improves the speech enhancement performance compared with the audio-only VAE model, especially when the speech signal is highly corrupted by noise. We also show that the proposed unsupervised audio-visual speech enhancement approach outperforms a state-of-the-art supervised deep learning method.
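The generative step described in the abstract (a decoder conditioned on both a latent sample and a visual embedding of the lip region, producing the variance of complex speech STFT coefficients) can be sketched as follows. This is a minimal illustration, not the paper's trained network: the layer sizes, weights, and function names are hypothetical, and a real system would learn the decoder parameters from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions, not the paper's settings.
F, L, V = 257, 32, 64   # STFT frequency bins, latent dim, visual-embedding dim

# Hypothetical decoder weights (a trained CVAE would supply these).
W1 = rng.standard_normal((128, L + V)) * 0.05
b1 = np.zeros(128)
W2 = rng.standard_normal((F, 128)) * 0.05
b2 = np.zeros(F)

def decode_variance(z, v):
    """Map a latent sample z and a visual embedding v (lip region) to the
    per-frequency variance of one complex speech STFT frame: p(s | z, v)."""
    h = np.tanh(W1 @ np.concatenate([z, v]) + b1)
    return np.exp(W2 @ h + b2)          # exp keeps variances positive

def sample_speech_frame(z, v):
    """Draw a zero-mean circular complex Gaussian frame with that variance."""
    var = decode_variance(z, v)
    re = rng.standard_normal(F) * np.sqrt(var / 2)
    im = rng.standard_normal(F) * np.sqrt(var / 2)
    return re + 1j * im

z = rng.standard_normal(L)              # latent prior sample, z ~ N(0, I)
v = rng.standard_normal(V)              # visual embedding of the lip region
s = sample_speech_frame(z, v)
```

Conditioning the decoder on `v` is what distinguishes the audio-visual CVAE from the audio-only VAE: the same latent code yields different speech statistics depending on the observed lip movements.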
AbstractList Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. One advantage of this generative approach is that it does not require pairs of clean and noisy speech signals at training. In this article, we propose audio-visual variants of VAEs for single-channel and speaker-independent speech enhancement. We develop a conditional VAE (CVAE) where the audio speech generative process is conditioned on visual information of the lip region. At test time, the audio-visual speech generative model is combined with a noise model based on nonnegative matrix factorization, and speech enhancement relies on a Monte Carlo expectation-maximization algorithm. Experiments are conducted with the recently published NTCD-TIMIT dataset as well as the GRID corpus. The results confirm that the proposed audio-visual CVAE effectively fuses audio and visual information, and it improves the speech enhancement performance compared with the audio-only VAE model, especially when the speech signal is highly corrupted by noise. We also show that the proposed unsupervised audio-visual speech enhancement approach outperforms a state-of-the-art supervised deep learning method.
Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. One advantage of this generative approach is that it does not require pairs of clean and noisy speech signals at training. In this paper, we propose audio-visual variants of VAEs for single-channel and speaker-independent speech enhancement. We develop a conditional VAE (CVAE) where the audio speech generative process is conditioned on visual information of the lip region. At test time, the audio-visual speech generative model is combined with a noise model based on nonnegative matrix factorization, and speech enhancement relies on a Monte Carlo expectation-maximization algorithm. Experiments are conducted with the recently published NTCD-TIMIT dataset. The results confirm that the proposed audio-visual CVAE effectively fuses audio and visual information, and it improves the speech enhancement performance compared with the audio-only VAE model, especially when the speech signal is highly corrupted by noise. We also show that the proposed unsupervised audio-visual speech enhancement approach outperforms a state-of-the-art supervised deep learning method.
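The other ingredient named in the abstract, the nonnegative matrix factorization (NMF) noise model, approximates the noise power spectrogram as a low-rank product of nonnegative factors. The sketch below fits such a model with the classic multiplicative updates for the Euclidean cost, for brevity; speech NMF models of this kind are usually fit under the Itakura-Saito divergence, and the sizes, data, and iteration count here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes, not the paper's settings.
F, N, K = 257, 50, 10           # frequency bins, time frames, NMF rank

# Stand-in noisy power spectrogram; a real pipeline would use |STFT|^2.
X = rng.random((F, N)) + 1e-3

# Nonnegative factors of the noise PSD model: X ~ W @ H.
W = rng.random((F, K)) + 0.1
H = rng.random((K, N)) + 0.1

err0 = np.linalg.norm(X - W @ H)
# Multiplicative updates (Lee-Seung, Euclidean cost): each step keeps
# W and H nonnegative and never increases the reconstruction error.
for _ in range(200):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-12)
err = np.linalg.norm(X - W @ H)
```

At test time the paper combines this low-rank noise variance with the CVAE speech variance, and a Monte Carlo EM procedure alternates between sampling the speech latent variables and updating the noise parameters.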
Author Horaud, Radu
Leglaive, Simon
Alameda-Pineda, Xavier
Sadeghi, Mostafa
Girin, Laurent
Author_xml – sequence: 1
  givenname: Mostafa
  orcidid: 0000-0002-0272-8017
  surname: Sadeghi
  fullname: Sadeghi, Mostafa
  email: mostafa.sadeghi@inria.fr
  organization: Inria Grenoble Rhône-Alpes, Montbonnot Saint-Martin, France
– sequence: 2
  givenname: Simon
  orcidid: 0000-0002-8219-1298
  surname: Leglaive
  fullname: Leglaive, Simon
  email: simon.leglaive@centralesupelec.fr
  organization: Inria Grenoble Rhône-Alpes, Montbonnot Saint-Martin, France
– sequence: 3
  givenname: Xavier
  orcidid: 0000-0002-5354-1084
  surname: Alameda-Pineda
  fullname: Alameda-Pineda, Xavier
  email: xavier.alameda-pineda@inria.fr
  organization: Inria Grenoble Rhône-Alpes, Montbonnot Saint-Martin, France
– sequence: 4
  givenname: Laurent
  orcidid: 0000-0002-9214-8760
  surname: Girin
  fullname: Girin, Laurent
  email: laurent.girin@grenoble-inp.fr
  organization: Inria Grenoble Rhône-Alpes, Montbonnot Saint-Martin, France
– sequence: 5
  givenname: Radu
  orcidid: 0000-0001-5232-024X
  surname: Horaud
  fullname: Horaud, Radu
  email: radu.horaud@inria.fr
  organization: Inria Grenoble Rhône-Alpes, Montbonnot Saint-Martin, France
BackLink https://inria.hal.science/hal-02364900$$DView record in HAL
BookMark eNp9kMtKAzEUhoMoqNUX0E3BlYupJ5e5ZDmUasWCgrbbkMmcsZGa1GRG8O2dXnThwlUO4fsO__lPyaHzDgm5oDCiFOTNS_k8exoxYDDiAJBKfkBOGGcykRzE4c_MJByT8xjfeoZCLmUuTshD2dXWJwsbO70aPq8RzXI4cUvtDL6ja4fzaN3rcOxdbVvrXQ8tdLB6P5dd65OJM77GEM_IUaNXEc_374DMbycv42kye7y7H5ezxPQ52iQTTQ1YMcoKzI3JqqrWQucZGN5QqLCoWQUZFHmBjEKKOhUsLRqZgayEriUfkOvd3qVeqXWw7zp8Ka-tmpYztfkDxjMhAT55z17t2HXwHx3GVr35LvTRo2KC5oIWnKc9VewoE3yMARtlbLu9sQ3arhQFtWlabZtWm6bVvuleZX_Un0T_Spc7ySLiryB7PM9S_g0DHoot
CODEN ITASD8
CitedBy_id crossref_primary_10_1109_TASLP_2022_3153265
crossref_primary_10_1109_TCYB_2022_3163811
crossref_primary_10_1016_j_csl_2025_101887
crossref_primary_10_1038_s41597_024_02918_9
crossref_primary_10_1016_j_dsp_2022_103897
crossref_primary_10_1007_s10772_023_10018_z
crossref_primary_10_1007_s11277_022_09852_2
crossref_primary_10_1109_TMM_2024_3352388
crossref_primary_10_1109_TSP_2021_3066038
crossref_primary_10_1109_TAI_2022_3220190
crossref_primary_10_1080_01691864_2022_2035253
crossref_primary_10_1145_3696445
crossref_primary_10_1007_s11263_022_01742_1
crossref_primary_10_1155_2021_9979606
crossref_primary_10_1109_TASLP_2022_3207349
crossref_primary_10_1109_TASLP_2021_3095656
crossref_primary_10_1109_TAI_2022_3169995
crossref_primary_10_3390_e24010055
crossref_primary_10_1109_ACCESS_2023_3253719
crossref_primary_10_3390_app13169217
crossref_primary_10_1016_j_eswa_2024_124852
crossref_primary_10_3390_en13174291
crossref_primary_10_1109_TASLP_2021_3066303
crossref_primary_10_1007_s11760_024_03245_7
crossref_primary_10_1109_TASLP_2021_3126925
crossref_primary_10_1016_j_compositesb_2024_111353
crossref_primary_10_1109_MSP_2023_3240008
crossref_primary_10_1007_s11042_022_13302_3
crossref_primary_10_1145_3635706
crossref_primary_10_1250_ast_e24_95
crossref_primary_10_1002_qre_3394
crossref_primary_10_1109_TIA_2021_3065194
crossref_primary_10_1109_TASLP_2023_3265202
Cites_doi 10.1016/S0165-1684(01)00128-1
10.1109/ICASSP.2018.8461326
10.1109/TASL.2013.2270369
10.1109/WASPAA.2019.8937218
10.1121/1.1907309
10.1109/ICASSP.2008.4518538
10.1121/1.1358887
10.1109/TASL.2010.2096212
10.21437/Interspeech.2016-211
10.1109/MLSP.2018.8516711
10.1109/ICASSP.2019.8682623
10.1109/TASL.2007.899233
10.21437/Interspeech.2019-1398
10.1109/TASLP.2014.2364452
10.1162/NECO_a_00168
10.1121/1.2229005
10.3109/03005368709077786
10.21437/Interspeech.2018-1955
10.1121/1.4799597
10.21236/ADA073139
10.1109/TSA.2005.851927
10.1023/A:1007665907178
10.1080/01621459.2017.1285773
10.1109/ICASSP.2013.6638354
10.1109/TASSP.1985.1164550
10.1109/TASSP.1979.1163209
10.21437/Interspeech.2018-2516
10.1109/ICASSP.2019.8683704
10.1007/978-3-319-22482-4_11
10.1109/SAM.2002.1191001
10.1109/ICASSP.2018.8461530
10.1080/01621459.1990.10474930
10.1109/ICASSP.2019.8683497
10.1162/neco.2008.04-08-771
10.23919/APSIPA.2018.8659591
10.1109/TASL.2013.2250961
10.1109/TASL.2011.2114881
10.1109/TETCI.2017.2784878
10.1162/neco_a_01217
10.1109/TSA.2005.858005
10.21437/Interspeech.2018-1400
10.1109/ICASSP.2001.941023
10.1109/ICASSP.2019.8682546
10.1201/9781420015836
10.1109/ICASSP.2011.5946317
10.1109/ICASSP.2019.8682797
10.1109/TASLP.2018.2842159
10.1109/TASSP.1984.1164453
10.1044/jshd.4004.481
10.1111/j.2517-6161.1977.tb01600.x
10.21437/Interspeech.2017-860
10.1109/TASLP.2014.2352935
10.1007/978-3-540-74494-8_52
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
Distributed under a Creative Commons Attribution 4.0 International License
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
– notice: Distributed under a Creative Commons Attribution 4.0 International License
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
8FD
JQ2
L7M
L~C
L~D
1XC
VOOES
DOI 10.1109/TASLP.2020.3000593
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE/IET Electronic Library (IEL) (UW System Shared)
CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Hyper Article en Ligne (HAL)
Hyper Article en Ligne (HAL) (Open Access)
DatabaseTitle CrossRef
Computer and Information Systems Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
Advanced Technologies Database with Aerospace
ProQuest Computer Science Collection
Computer and Information Systems Abstracts Professional
DatabaseTitleList

Computer and Information Systems Abstracts
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library (IEL) (UW System Shared)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 2329-9304
EndPage 1800
ExternalDocumentID oai:HAL:hal-02364900v3
10_1109_TASLP_2020_3000593
9110765
Genre orig-research
GrantInformation_xml – fundername: Multidisciplinary Institute in Artificial Intelligence
GroupedDBID 0R~
4.4
6IK
97E
AAJGR
AAKMM
AALFJ
AARMG
AASAJ
AAWTH
AAWTV
ABAZT
ABQJQ
ABVLG
ACIWK
ACM
ADBCU
AEBYY
AEFXT
AEJOY
AENSD
AFWIH
AFWXC
AGQYO
AGSQL
AHBIQ
AIKLT
AKJIK
AKQYR
AKRVB
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CCLIF
EBS
EJD
GUFHI
HGAVV
IFIPE
IPLJI
JAVBF
LHSKQ
M43
OCL
PQQKQ
RIA
RIE
RNS
ROL
AAYXX
CITATION
7SC
8FD
JQ2
L7M
L~C
L~D
1XC
VOOES
ID FETCH-LOGICAL-c329t-64fd0eb2128e7cc6bbda4a760c3f10be8d2b060878e2105ea54258f9609b4ad93
IEDL.DBID RIE
ISICitedReferencesCount 45
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000543714200003&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 2329-9290
IngestDate Tue Oct 14 20:54:41 EDT 2025
Sun Nov 09 06:47:49 EST 2025
Sat Nov 29 02:43:53 EST 2025
Tue Nov 18 22:38:10 EST 2025
Wed Aug 27 02:38:22 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Training
Speech enhancement
Visualization
Lips
Noise measurement
Computer architecture
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c329t-64fd0eb2128e7cc6bbda4a760c3f10be8d2b060878e2105ea54258f9609b4ad93
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0002-0272-8017
0000-0001-5232-024X
0000-0002-9214-8760
0000-0002-8219-1298
0000-0002-5354-1084
OpenAccessLink https://inria.hal.science/hal-02364900
PQID 2417418335
PQPubID 85426
PageCount 13
ParticipantIDs ieee_primary_9110765
crossref_citationtrail_10_1109_TASLP_2020_3000593
crossref_primary_10_1109_TASLP_2020_3000593
hal_primary_oai_HAL_hal_02364900v3
proquest_journals_2417418335
PublicationCentury 2000
PublicationDate 2020-01-01
PublicationDateYYYYMMDD 2020-01-01
PublicationDate_xml – month: 01
  year: 2020
  text: 2020-01-01
  day: 01
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE/ACM transactions on audio, speech, and language processing
PublicationTitleAbbrev TASLP
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Institute of Electrical and Electronics Engineers
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
– name: Institute of Electrical and Electronics Engineers
References ref13
garofolo (ref60) 1993
ref59
ref15
ref14
ref53
ref52
ref55
ref54
ref10
ref17
ref19
lu (ref41) 0
ref18
benesty (ref2) 2006
ref51
ref50
dempster (ref57) 1977; 39
hirsch (ref61) 2005
fisher iii (ref9) 0
ref46
ref45
ref48
ref42
ref44
ref43
ref49
ref8
ref4
ref3
ref6
ref5
kingma (ref65) 0
ref40
ref35
ref34
lim (ref1) 1983
ref37
goecke (ref11) 0
ref36
ref31
ref30
ref33
ref32
hershey (ref12) 0
ref39
higgins (ref56) 0
sohn (ref25) 0
kingma (ref47) 0
girin (ref7) 0
ref68
ref24
ref67
ref23
ref26
ref69
ref64
ref20
ref63
ref66
ref22
ref21
raj (ref38) 0
robert (ref58) 2005
ref28
ref27
ref29
ref62
gabbay (ref16) 0
References_xml – ident: ref35
  doi: 10.1016/S0165-1684(01)00128-1
– ident: ref55
  doi: 10.1109/ICASSP.2018.8461326
– ident: ref39
  doi: 10.1109/TASL.2013.2270369
– ident: ref26
  doi: 10.1109/WASPAA.2019.8937218
– start-page: II-2025
  year: 0
  ident: ref11
  article-title: Noisy audio feature enhancement using audio-visual speech data
  publication-title: Proc IEEE Int Conf Acoust Speech Signal Process
– ident: ref4
  doi: 10.1121/1.1907309
– ident: ref37
  doi: 10.1109/ICASSP.2008.4518538
– ident: ref8
  doi: 10.1121/1.1358887
– ident: ref51
  doi: 10.1109/TASL.2010.2096212
– ident: ref43
  doi: 10.21437/Interspeech.2016-211
– ident: ref20
  doi: 10.1109/MLSP.2018.8516711
– ident: ref49
  doi: 10.1109/ICASSP.2019.8682623
– ident: ref33
  doi: 10.1109/TASL.2007.899233
– ident: ref24
  doi: 10.21437/Interspeech.2019-1398
– ident: ref42
  doi: 10.1109/TASLP.2014.2364452
– ident: ref59
  doi: 10.1162/NECO_a_00168
– start-page: 772
  year: 0
  ident: ref9
  article-title: Learning joint statistical models for audio-visual fusion and segregation
  publication-title: Proc Adv Neural Inf Process Syst
– ident: ref28
  doi: 10.1121/1.2229005
– ident: ref6
  doi: 10.3109/03005368709077786
– ident: ref15
  doi: 10.21437/Interspeech.2018-1955
– year: 2005
  ident: ref61
  article-title: FaNT-Filtering and noise adding tool
– ident: ref62
  doi: 10.1121/1.4799597
– year: 2006
  ident: ref2
  publication-title: Speech Enhancement
– ident: ref30
  doi: 10.21236/ADA073139
– ident: ref32
  doi: 10.1109/TSA.2005.851927
– year: 1993
  ident: ref60
  article-title: DARPA TIMIT acoustic phonetic continuous speech corpus CDROM
– year: 0
  ident: ref56
  publication-title: Proc Int Conf Learn Representations
– ident: ref53
  doi: 10.1023/A:1007665907178
– ident: ref54
  doi: 10.1080/01621459.2017.1285773
– ident: ref13
  doi: 10.1109/ICASSP.2013.6638354
– ident: ref34
  doi: 10.1109/TASSP.1985.1164550
– ident: ref29
  doi: 10.1109/TASSP.1979.1163209
– ident: ref18
  doi: 10.21437/Interspeech.2018-2516
– start-page: 1173
  year: 0
  ident: ref12
  article-title: Audio-visual sound separation via hidden Markov models
  publication-title: Proc Adv Neural Inf Process Syst
– start-page: 1559
  year: 0
  ident: ref7
  article-title: Noisy speech enhancement with filters estimated from the speaker's lips
  publication-title: Proc Eur Conf Speech Commun Technol
– ident: ref22
  doi: 10.1109/ICASSP.2019.8683704
– ident: ref46
  doi: 10.1007/978-3-319-22482-4_11
– ident: ref10
  doi: 10.1109/SAM.2002.1191001
– ident: ref19
  doi: 10.1109/ICASSP.2018.8461530
– start-page: 3581
  year: 0
  ident: ref47
  article-title: Semi-supervised learning with deep generative models
  publication-title: Proc Adv Neural Inf Process Syst
– ident: ref52
  doi: 10.1080/01621459.1990.10474930
– year: 0
  ident: ref65
  article-title: Adam: A method for stochastic optimization
  publication-title: Proc 3rd Int Conf Learn Representations
– ident: ref50
  doi: 10.1109/ICASSP.2019.8683497
– ident: ref36
  doi: 10.1162/neco.2008.04-08-771
– start-page: 3483
  year: 0
  ident: ref25
  article-title: Learning structured output representation using deep conditional generative models
  publication-title: Proc Adv Neural Inf Process Syst
– ident: ref21
  doi: 10.23919/APSIPA.2018.8659591
– ident: ref44
  doi: 10.1109/TASL.2013.2250961
– ident: ref68
  doi: 10.1109/TASL.2011.2114881
– ident: ref17
  doi: 10.1109/TETCI.2017.2784878
– ident: ref48
  doi: 10.1162/neco_a_01217
– year: 1983
  ident: ref1
  publication-title: Speech Enhancement
– ident: ref66
  doi: 10.1109/TSA.2005.858005
– year: 2005
  ident: ref58
  publication-title: Monte Carlo Statistical Methods
– ident: ref14
  doi: 10.21437/Interspeech.2018-1400
– ident: ref67
  doi: 10.1109/ICASSP.2001.941023
– ident: ref23
  doi: 10.1109/ICASSP.2019.8682546
– ident: ref3
  doi: 10.1201/9781420015836
– ident: ref64
  doi: 10.1109/ICASSP.2011.5946317
– ident: ref69
  doi: 10.1109/ICASSP.2019.8682797
– ident: ref40
  doi: 10.1109/TASLP.2018.2842159
– ident: ref31
  doi: 10.1109/TASSP.1984.1164453
– ident: ref5
  doi: 10.1044/jshd.4004.481
– volume: 39
  start-page: 1
  year: 1977
  ident: ref57
  article-title: Maximum likelihood from incomplete data via the EM algorithm
  publication-title: J Roy Statist Soc Ser B
  doi: 10.1111/j.2517-6161.1977.tb01600.x
– ident: ref27
  doi: 10.21437/Interspeech.2017-860
– ident: ref45
  doi: 10.1109/TASLP.2014.2352935
– start-page: 1217
  year: 0
  ident: ref38
  article-title: Phoneme-dependent NMF for speech enhancement in monaural mixtures
  publication-title: Proc Conf Int Speech Commun Assoc
– start-page: 3051
  year: 0
  ident: ref16
  article-title: Seeing through noise: Speaker separation and enhancement using visually-derived speech
  publication-title: Proc IEEE Int Conf Acoust Speech Signal Process
– start-page: 436
  year: 0
  ident: ref41
  article-title: Speech enhancement based on deep denoising autoencoder
  publication-title: Proc Conf Int Speech Commun Assoc
– ident: ref63
  doi: 10.1007/978-3-540-74494-8_52
SSID ssj0001079974
Score 2.4560304
Snippet Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been...
SourceID hal
proquest
crossref
ieee
SourceType Open Access Repository
Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 1788
SubjectTerms Algorithms
Audio data
Audio equipment
Audio-visual speech enhancement
Coders
Computer architecture
Computer Science
Computer simulation
Computer Vision and Pattern Recognition
deep generative models
Lips
Machine Learning
Monte Carlo expectation-maximization
Noise measurement
nonnegative matrix factorization
Signal and Image Processing
Sound
Speech
Speech enhancement
Speech processing
Training
variational auto-encoders
Visual signals
Visualization
Title Audio-Visual Speech Enhancement Using Conditional Variational Auto-Encoders
URI https://ieeexplore.ieee.org/document/9110765
https://www.proquest.com/docview/2417418335
https://inria.hal.science/hal-02364900
Volume 28
WOSCitedRecordID wos000543714200003&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE/IET Electronic Library (IEL) (UW System Shared)
  customDbUrl:
  eissn: 2329-9304
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0001079974
  issn: 2329-9290
  databaseCode: RIE
  dateStart: 20140101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3dS8MwED828UEf_BbnF0V802i2rvl4LDIZOERQh2-lTa5sIK1sq3-_l6ybiiL41pZLCbkkd79L7ncA5zpNESOeszzXBFBkR7FUoWTWZLRfihSVr58yHMj7e_Xyoh8acLnMhUFEf_kMr9yjP8u3palcqOxaO7AioiY0pRTzXK3PeAqXWnvSZfIRNCOrzxc5MlxfP8WPgwdCgx0CqZ6TJPxmh5ojdwvSl1f5sSd7Q3O7-b8ubsFG7VAG8XwGbEMDix1Y_0IzuAt3cWXHJRuOpxVJPr4hmlHQK0ZO4e53gb83ENyU7vjahwaDIUHoOkwYxNWsZL3CJb9PpnvwfNt7uumzuogCMzQOMya6ueUEn8kOoTRGZJlNu6kU3IR5m2eobCfjgiupkNBfhGlEq1jljogu66ZWh_uwUpQFHkCQdzMeZcqSC0kgxBL4EcJEoaC3tjUGW9BeDGliaoZxV-jiNfFIg-vEqyFxakhqNbTgYtnmbc6v8af0GWlqKeiosfvxIHHfPBO-5vydhHadXpZStUpacLxQbFKv02lC_ouj7wnD6PD3Vkew5jowD7ocw8psUuEJrJr32Xg6OfVT8AOY39aM
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1Ra9swED7SbLDuYevalWbrNjP61qlRYluWHk1JyagbCk1D34QtnUlg2CWJ-_t7UpxsY2OwN9ucjNBJuvtOuu8AzlSeI8a8ZGWpCKAkQ8lyiQmzpqD9UuQoff2UWZZMJvLhQd124NsuFwYR_eUzvHCP_izf1qZxobK-cmBFxHvwIo6iId9ka_2MqPBEKU-7TF6CYmT3-TZLhqv-NL3LbgkPDgmmelaS8DdLtDd39yB9gZU_dmVvaq7e_l8nD-BN61IG6WYOvIMOVofw-heiwSO4Thu7qNlssWpI8u4R0cyDUTV3Kne_C_zNgeCydgfYPjgYzAhEt4HCIG3WNRtVLv19uXoP91ej6eWYtWUUmKFxWDMRlZYTgCZLhIkxoihsHuWJ4CYsB7xAaYcFF1wmEgn_xZjHtI5l6ajoiii3KjyGblVXeAJBGRU8LqQlJ5JgiCX4I4SJQ0FvA2sM9mCwHVJtWo5xV-rih_ZYgyvt1aCdGnSrhh6c79o8bhg2_in9lTS1E3Tk2OM00-6b58JXnD-R0JHTy06qVUkPTreK1e1KXWnyYByBTxjGH_7e6gu8Gk9vMp19n1x_hH3XmU0I5hS662WDn-CleVovVsvPfjo-A-0Z2dM
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Audio-Visual+Speech+Enhancement+Using+Conditional+Variational+Auto-Encoders&rft.jtitle=IEEE%2FACM+transactions+on+audio%2C+speech%2C+and+language+processing&rft.au=Sadeghi%2C+Mostafa&rft.au=Leglaive%2C+Simon&rft.au=Alameda-Pineda%2C+Xavier&rft.au=Girin%2C+Laurent&rft.date=2020-01-01&rft.pub=IEEE&rft.issn=2329-9290&rft.volume=28&rft.spage=1788&rft.epage=1800&rft_id=info:doi/10.1109%2FTASLP.2020.3000593&rft.externalDocID=9110765
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=2329-9290&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=2329-9290&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=2329-9290&client=summon