Lip-Synchronized 3D Facial Animation Using Audio-Driven Graph Convolutional Autoencoder

Bibliographic Details
Published in: Proceedings of the ... IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems (Online), Vol. 1, pp. 346-351
Authors: Bozhilov, Ivaylo; Tonchev, Krasimir; Neshov, Nikolay; Manolova, Agata
Format: Conference paper
Language: English
Published: IEEE, 7 September 2023
ISSN: 2770-4254
Online access: Full text
Description
Abstract: The majority of state-of-the-art audio-driven facial animation methods implement a differentiable rendering phase within their models, and as such their output is a 2D raster image. However, existing development pipelines for MR (Mixed Reality) applications use platform-specific render engines optimized for specific HMDs (head-mounted displays), which necessitates a technique that works directly on the facial mesh geometry. This work proposes a lip-synchronized, audio-driven 3D face animation method built on a graph convolutional autoencoder that learns the detailed facial deformations of a talking subject while generating a compact latent representation of the 3D model. This representation is then conditioned on the processed audio data to achieve synchronized lip and jaw movement while retaining the subject's facial features. The audio processing involves the extraction of semantic features that strongly correlate with facial deformation and expression. Qualitative and quantitative experiments demonstrate the method's potential for use in MR applications and shed light on some of the disadvantages of current approaches.
DOI: 10.1109/IDAACS58523.2023.10348935
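
As a rough illustration of the architecture the abstract describes, the sketch below shows an audio-conditioned graph convolutional autoencoder in PyTorch: graph convolutions encode the face mesh into a compact latent code, that code is fused with per-frame audio features, and a mirrored decoder produces the deformed vertex positions. This is an assumption-based sketch, not the authors' implementation: the class names (GraphConv, AudioConditionedMeshAE), the layer sizes, the concatenation-based fusion, and the identity-matrix adjacency are all placeholders chosen for illustration.

```python
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """Simple spectral-style graph convolution: H' = act(A_hat @ H @ W)."""

    def __init__(self, in_dim: int, out_dim: int, act: bool = True):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.act = act

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_vertices, in_dim); adj: normalized (V, V) adjacency
        h = self.linear(torch.matmul(adj, x))
        return torch.relu(h) if self.act else h


class AudioConditionedMeshAE(nn.Module):
    """Mesh autoencoder whose latent code is conditioned on audio features."""

    def __init__(self, num_vertices: int, latent_dim: int = 128,
                 audio_dim: int = 64):
        super().__init__()
        self.enc1 = GraphConv(3, 16)              # per-vertex xyz -> features
        self.enc2 = GraphConv(16, 32)
        self.to_latent = nn.Linear(num_vertices * 32, latent_dim)
        # Fuse the compact mesh latent with the processed audio features
        self.fuse = nn.Linear(latent_dim + audio_dim, latent_dim)
        self.from_latent = nn.Linear(latent_dim, num_vertices * 32)
        self.dec1 = GraphConv(32, 16)
        self.dec2 = GraphConv(16, 3, act=False)   # back to vertex positions

    def forward(self, verts, adj, audio_feat):
        b, v, _ = verts.shape
        h = self.enc2(self.enc1(verts, adj), adj)
        z = self.to_latent(h.reshape(b, -1))
        # Condition the latent code on audio to drive lip and jaw motion
        z = self.fuse(torch.cat([z, audio_feat], dim=-1))
        h = self.from_latent(z).reshape(b, v, 32)
        return self.dec2(self.dec1(h, adj), adj)


# Toy usage: a 500-vertex mesh, a batch of 2 frames, 64-dim audio features
V = 500
adj = torch.eye(V)              # stand-in for a normalized mesh adjacency
model = AudioConditionedMeshAE(num_vertices=V)
verts = torch.randn(2, V, 3)
audio = torch.randn(2, 64)
pred = model(verts, adj, audio)  # (2, 500, 3) predicted vertex positions
print(pred.shape)
```

In a real pipeline, adj would be built from the face mesh connectivity rather than an identity matrix, and audio_feat would come from a speech feature extractor producing the kind of semantic features the abstract mentions.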