A vector quantized masked autoencoder for audiovisual speech emotion recognition

An important challenge in emotion recognition is to develop methods that can leverage unlabeled training data. In this paper, we propose the VQ-MAE-AV model, a self-supervised multimodal model that leverages masked autoencoders to learn representations of audiovisual speech without labels. The model...

Bibliographic Details
Published in: Computer Vision and Image Understanding, Vol. 257, p. 104362
Main Authors: Sadok, Samir; Leglaive, Simon; Séguier, Renaud
Format: Journal Article
Language: English
Published: Elsevier Inc 01.06.2025
Elsevier
ISSN: 1077-3142, 1090-235X