AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation

Published in: IEEE Access, Volume 12, pp. 57288-57301
Main authors: Sun, Yasheng; Chu, Wenqing; Zhou, Hang; Wang, Kaisiyuan; Koike, Hideki
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 2024
ISSN: 2169-3536
Online access: full text available
Abstract While considerable progress has been made in achieving accurate lip synchronization for 3D speech-driven talking face generation, synthesizing expressive facial details aligned with the speaker's speaking status remains challenging. Existing efforts either focus on learning dynamic talking head poses synchronized with the speech rhythm or aim for stylized facial movements guided by an external reference such as an emotion label or a reference video clip. The former often yields only coarse alignment and neglects the emotional nuances present in the audio content, while the latter leads to unnatural applications that require users to manually select a style source. Our goal is to directly leverage the style information inherent in human speech to generate an expressive talking face that aligns with the speaking status. In this paper, we propose AVI-Talking, an Audio-Visual Instruction system for expressive Talking face generation. The system harnesses the robust contextual reasoning and hallucination capability of Large Language Models (LLMs) to instruct the realistic synthesis of 3D talking faces. Instead of learning facial movements directly from human speech, our two-stage strategy first has the LLMs comprehend the audio information and generate instructions describing expressive facial details that seamlessly correspond to the speech; a diffusion-based generative network then executes these instructions. This two-stage process, coupled with the incorporation of LLMs, enhances model interpretability and gives users the flexibility to read the instructions and specify desired operations or modifications. Specifically, given a speech clip, we first employ a Q-Former for contrastive alignment of the speech features with visual instructions; the aligned features are then projected into the input text embedding space of the LLM. This functions as a prompting strategy, steering the LLM to generate plausible visual instructions that encompass diverse facial details. To execute the predicted instructions, we derive a language-guided talking face generation system with a disentangled latent space, in which speech-content-related lip movements and emotion-correlated facial expressions are represented separately in a speech content space and a content-irrelevant space. Additionally, we introduce a contrastive instruction-style alignment and a diffusion technique within the content-irrelevant space to fully exploit the talking prior network for diverse instruction-following synthesis. Extensive experiments showcase the effectiveness of our approach in producing vivid talking faces with expressive facial movements and a consistent emotional status.
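The first stage described above pairs a Q-Former-style speech encoder with instruction text via contrastive alignment. As a rough illustration of that idea only, and not the authors' code, the following PyTorch sketch aligns pooled speech features with instruction text embeddings using a symmetric InfoNCE (CLIP-style) loss; every module name, dimension, and hyperparameter here is a hypothetical placeholder.

# A minimal, illustrative sketch of the first stage -- not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechInstructionAligner(nn.Module):
    def __init__(self, speech_dim=768, text_dim=768, embed_dim=256, num_queries=32):
        super().__init__()
        # Learnable query tokens that attend to the speech features, loosely
        # mimicking a Q-Former's query mechanism (sizes are assumptions).
        self.queries = nn.Parameter(torch.randn(num_queries, speech_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(speech_dim, num_heads=8, batch_first=True)
        self.speech_proj = nn.Linear(speech_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # approx. log(1/0.07)

    def forward(self, speech_feats, text_feats):
        # speech_feats: (B, T, speech_dim) frame-level features, e.g. from wav2vec 2.0
        # text_feats:   (B, text_dim) pooled embeddings of the paired instructions
        B = speech_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        attended, _ = self.cross_attn(q, speech_feats, speech_feats)
        s = F.normalize(self.speech_proj(attended.mean(dim=1)), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        logits = self.logit_scale.exp() * s @ t.t()  # (B, B); matched pairs on the diagonal
        labels = torch.arange(B, device=logits.device)
        # Symmetric InfoNCE over speech-to-text and text-to-speech directions.
        return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

aligner = SpeechInstructionAligner()
loss = aligner(torch.randn(4, 100, 768), torch.randn(4, 768))  # random stand-in features
loss.backward()

In the described pipeline, the aligned speech queries would further be projected into the LLM's input text embedding space to act as a soft prompt; this sketch stops at the alignment objective.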
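The second stage executes the predicted instructions with a diffusion model in the content-irrelevant (expression) latent space. The sketch below shows only a generic epsilon-prediction DDPM training step such a conditional denoiser might use, with the instruction embedding as conditioning; the network shape, latent dimensions, and noise schedule are assumptions, not details from the paper.

# A minimal sketch, under assumed shapes and a linear noise schedule.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class InstructionConditionedDenoiser(nn.Module):
    def __init__(self, latent_dim=128, cond_dim=256, hidden=512):
        super().__init__()
        self.time_embed = nn.Sequential(nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, latent_dim),          # predicts the added noise
        )

    def forward(self, x_t, t, cond):
        temb = self.time_embed(t.float().unsqueeze(-1) / T)
        return self.net(torch.cat([x_t, cond, temb], dim=-1))

def diffusion_training_step(model, x0, cond):
    # x0: (B, latent_dim) clean expression latents; cond: (B, cond_dim) instruction embedding
    t = torch.randint(0, T, (x0.size(0),))
    a = alphas_cumprod[t].unsqueeze(-1)
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise  # forward process q(x_t | x_0)
    return F.mse_loss(model(x_t, t, cond), noise)   # epsilon-prediction objective

model = InstructionConditionedDenoiser()
loss = diffusion_training_step(model, torch.randn(8, 128), torch.randn(8, 256))
loss.backward()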
Author_xml – sequence: 1
  givenname: Yasheng
  orcidid: 0000-0002-0589-4424
  surname: Sun
  fullname: Sun, Yasheng
  email: sun.y.aj@m.titech.ac.jp
  organization: Tokyo Institute of Technology, Tokyo, Japan
– sequence: 2
  givenname: Wenqing
  orcidid: 0000-0003-0816-7975
  surname: Chu
  fullname: Chu, Wenqing
  organization: Baidu Inc., Beijing, China
– sequence: 3
  givenname: Hang
  surname: Zhou
  fullname: Zhou, Hang
  organization: Baidu Inc., Beijing, China
– sequence: 4
  givenname: Kaisiyuan
  surname: Wang
  fullname: Wang, Kaisiyuan
  organization: School of Electrical and Computer Engineering, The University of Sydney, Darlington, NSW, Australia
– sequence: 5
  givenname: Hideki
  orcidid: 0000-0002-8989-6434
  surname: Koike
  fullname: Koike, Hideki
  organization: Tokyo Institute of Technology, Tokyo, Japan
CODEN IAECCG
CitedBy_id 10.1007/s44267-025-00081-2
10.1016/j.commtr.2025.100172
10.1109/ACCESS.2025.3555297
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI 10.1109/ACCESS.2024.3390182
Discipline Engineering
EISSN 2169-3536
EndPage 57301
Genre orig-research
ISICitedReferencesCount 3
ISSN 2169-3536
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by-nc-nd/4.0
ORCID 0000-0002-8989-6434
0000-0002-0589-4424
0000-0003-0816-7975
OpenAccessLink https://doaj.org/article/2de39d95a4964ebbbb868fca10a6dfda
PageCount 14
PublicationDate 2024
PublicationPlace Piscataway
PublicationTitle IEEE Access
PublicationTitleAbbrev Access
PublicationYear 2024
Publisher IEEE
Institute of Electrical and Electronics Engineers (IEEE)
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 57288
SubjectTerms Alignment
Audio data
audio-visual instruction
Computer Science - Computer Vision and Pattern Recognition
contrastive learning
diffusion model
Diffusion processes
Electrical engineering. Electronics. Nuclear engineering
expressive talking face generation
Harnesses
Large language models
Speaking
Speech
Speech processing
Synchronism
Synchronization
Synthesis
Talking
Task analysis
Three-dimensional displays
TK1-9971
Visualization
URI https://ieeexplore.ieee.org/document/10504116
https://cir.nii.ac.jp/crid/1873399491342033664
https://www.proquest.com/docview/3046910880
https://doaj.org/article/2de39d95a4964ebbbb868fca10a6dfda
Volume 12