AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
| Published in: | IEEE Access, Volume 12, pp. 57288-57301 |
|---|---|
| Main authors: | Yasheng Sun (Tokyo Institute of Technology, Tokyo, Japan; ORCID 0000-0002-0589-4424; sun.y.aj@m.titech.ac.jp); Wenqing Chu (Baidu Inc., Beijing, China; ORCID 0000-0003-0816-7975); Hang Zhou (Baidu Inc., Beijing, China); Kaisiyuan Wang (School of Electrical and Computer Engineering, The University of Sydney, Darlington, NSW, Australia); Hideki Koike (Tokyo Institute of Technology, Tokyo, Japan; ORCID 0000-0002-8989-6434) |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE, 2024 |
| Subjects: | Alignment; Audio data; audio-visual instruction; Computer Science - Computer Vision and Pattern Recognition; contrastive learning; diffusion model; Diffusion processes; Electrical engineering. Electronics. Nuclear engineering; expressive talking face generation; Large language models; Speech processing; Synchronization; Synthesis; Three-dimensional displays; TK1-9971; Visualization |
| ISSN: | 2169-3536 (print and electronic) |
| CODEN: | IAECCG |
| DOI: | 10.1109/ACCESS.2024.3390182 |
| License: | CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0) |
| Copyright: | The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024 |
| Online access: | https://ieeexplore.ieee.org/document/10504116 ; https://doaj.org/article/2de39d95a4964ebbbb868fca10a6dfda |
| Abstract | While considerable progress has been made in achieving accurate lip synchronization for 3D speech-driven talking face generation, synthesizing expressive facial detail aligned with the speaker's speaking status remains challenging. Existing efforts either learn a dynamic talking head pose synchronized with speech rhythm or aim for stylized facial movements guided by an external reference such as an emotion label or a reference video clip. The former often yields only coarse alignment, neglecting the emotional nuances present in the audio, while the latter results in unnatural workflows that require users to manually select a style source. Our goal is to directly leverage the style information inherent in human speech to generate an expressive talking face that aligns with the speaking status. In this paper, we propose AVI-Talking, an Audio-Visual Instruction system for expressive Talking face generation. The system harnesses the robust contextual reasoning and hallucination capability offered by Large Language Models (LLMs) to instruct the realistic synthesis of 3D talking faces. Instead of learning facial movements directly from speech, our two-stage strategy first has the LLM comprehend the audio and generate instructions describing expressive facial details that seamlessly correspond to the speech; a diffusion-based generative network then executes these instructions. This two-stage process, coupled with the incorporation of LLMs, improves model interpretability and gives users the flexibility to inspect the instructions and specify desired operations or modifications. Specifically, given a speech clip, we first employ a Q-Former to contrastively align the speech features with visual instructions; the aligned representation is then projected into the input text-embedding space of the LLM. This acts as a prompting strategy, steering the LLM to generate plausible visual instructions that encompass diverse facial details. To execute the predicted instructions, we carefully design a language-guided talking face generation system with a disentangled latent space, in which speech-content-related lip movements and emotion-correlated facial expressions are represented separately in a speech-content space and a content-irrelevant space. Additionally, we introduce a contrastive instruction-style alignment and diffusion technique within the content-irrelevant space to fully exploit the talking prior network for diverse instruction-following synthesis. Extensive experiments demonstrate the effectiveness of our approach in producing vivid talking faces with expressive facial movements and consistent emotional status. |
|---|---|
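
The first stage described in the abstract pairs a Q-Former with a frozen LLM: learnable queries summarize the speech features, and their projection into the LLM's text-embedding space serves as a soft prompt. The following is a minimal PyTorch-style sketch of that general pattern; all module names (`SpeechQFormer`, `InstructionPrompter`), dimensions, and the choice of `nn.TransformerDecoder` as the cross-attention stack are assumptions for illustration, not the authors' implementation.

```python
# Stage 1 (sketch): distill speech features into query tokens, then project
# them into the LLM embedding space as a soft prompt. Hypothetical names
# and dimensions; not the paper's released code.
import torch
import torch.nn as nn


class SpeechQFormer(nn.Module):
    """A fixed set of learnable queries cross-attends to frame-level
    speech features (e.g., wav2vec 2.0 outputs) and summarizes them."""

    def __init__(self, num_queries=32, dim=768, num_layers=4, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(dim, num_heads, batch_first=True)
        self.blocks = nn.TransformerDecoder(layer, num_layers)

    def forward(self, speech_feats):                     # (B, T, dim)
        q = self.queries.expand(speech_feats.size(0), -1, -1)
        return self.blocks(q, speech_feats)              # (B, num_queries, dim)


class InstructionPrompter(nn.Module):
    """Projects Q-Former outputs into the LLM embedding space and prepends
    them to the tokenized text prompt, so a frozen LLM can decode a visual
    instruction conditioned on the speech."""

    def __init__(self, qformer_dim=768, llm_dim=4096):
        super().__init__()
        self.qformer = SpeechQFormer(dim=qformer_dim)
        self.proj = nn.Linear(qformer_dim, llm_dim)

    def forward(self, speech_feats, text_embeds):        # text_embeds: (B, L, llm_dim)
        soft_prompt = self.proj(self.qformer(speech_feats))
        return torch.cat([soft_prompt, text_embeds], dim=1)
```

In this reading, only the Q-Former and projection are trained (via the contrastive alignment with visual instructions), while the LLM stays frozen and simply continues decoding from the concatenated embedding sequence.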
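
The second stage executes the generated instruction with a diffusion model operating in the content-irrelevant (expression/style) latent space, while lip motion stays tied to the speech-content space. Below is a hedged sketch of a standard DDPM training objective with instruction conditioning, plus an InfoNCE-style loss for the contrastive instruction-style alignment; the `denoiser` callable, the latent and embedding names, and the plain noise schedule are assumptions, and the paper's exact objectives may differ.

```python
# Stage 2 (sketch): instruction-conditioned denoising diffusion in the
# content-irrelevant latent space, plus contrastive instruction-style
# alignment. Hypothetical names; illustrative only.
import torch
import torch.nn.functional as F


def ddpm_loss(denoiser, style_latent, instr_embed, alphas_cumprod):
    """One DDPM training objective: the network predicts the Gaussian noise
    mixed into the style latent at a random timestep, conditioned on the
    LLM-generated instruction embedding."""
    b = style_latent.size(0)
    t = torch.randint(0, alphas_cumprod.size(0), (b,), device=style_latent.device)
    noise = torch.randn_like(style_latent)
    a_bar = alphas_cumprod[t].view(b, *([1] * (style_latent.dim() - 1)))
    noisy = a_bar.sqrt() * style_latent + (1.0 - a_bar).sqrt() * noise
    return F.mse_loss(denoiser(noisy, t, cond=instr_embed), noise)


def instruction_style_nce(instr_embed, style_embed, temperature=0.07):
    """Contrastive instruction-style alignment: matched (instruction, style)
    pairs sit on the diagonal of the similarity matrix."""
    instr = F.normalize(instr_embed, dim=-1)
    style = F.normalize(style_embed, dim=-1)
    logits = instr @ style.t() / temperature
    targets = torch.arange(instr.size(0), device=instr.device)
    return F.cross_entropy(logits, targets)
```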