FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-Pose, and Facial Expression Features

The task of face reenactment is to transfer the head motion and facial expressions from a driving video to the appearance of a source image, which may be of a different person (cross-reenactment). Most existing methods are CNN-based and estimate optical flow from the source image to the current driv...

Detailed description

Bibliographic details
Published in:	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 7716-7726
Main authors:	Rochow, Andre; Schwarz, Max; Behnke, Sven
Format:	Conference proceeding
Language:	English
Published:	IEEE, 16 June 2024
Subjects:	Animation; Face recognition; Face Reenactment; Facial Animation; Rendering (computer graphics); Shape; Training; Transformers; Vectors
ISSN:	1063-6919
Online access:	Full text

Abstract	The task of face reenactment is to transfer the head motion and facial expressions from a driving video to the appearance of a source image, which may be of a different person (cross-reenactment). Most existing methods are CNN-based and estimate optical flow from the source image to the current driving frame, which is then inpainted and refined to produce the output animation. We propose a transformer-based encoder for computing a set-latent representation of the source image(s). We then predict the output color of a query pixel using a transformer-based decoder, which is conditioned with keypoints and a facial expression vector extracted from the driving frame. Latent representations of the source person are learned in a self-supervised manner that factorize their appearance, head pose, and facial expressions. Thus, they are perfectly suited for cross-reenactment. In contrast to most related work, our method naturally extends to multiple source images and can thus adapt to person-specific facial dynamics. We also propose data augmentation and regularization schemes that are necessary to prevent overfitting and support generalizability of the learned representations. We evaluated our approach in a randomized user study. The results indicate superior performance compared to the state-of-the-art in terms of motion transfer quality and temporal consistency. Code & Video: https://andrerochow.github.io/fsrt
Author details	1. Andre Rochow (rochow@ais.uni-bonn.de), University of Bonn; 2. Max Schwarz (schwarz@ais.uni-bonn.de), University of Bonn; 3. Sven Behnke (behnke@cs.uni-bonn.de), University of Bonn
CODEN	IEEPAD
ContentType	Conference Proceeding
DOI	10.1109/CVPR52733.2024.00737
Discipline	Applied Sciences
EISBN	9798350353006
EISSN	1063-6919
EndPage	7726
ExternalDocumentID	10655357
Genre	orig-research
ISICitedReferencesCount	12
IsPeerReviewed	false
IsScholarly	true
Language	English
PageCount	11
PublicationCentury	2000
PublicationDate	2024-June-16
PublicationDateYYYYMMDD	2024-06-16
PublicationDecade	2020
PublicationTitle	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev	CVPR
PublicationYear	2024
Publisher	IEEE
StartPage	7716
URI	https://ieeexplore.ieee.org/document/10655357