A vector quantized masked autoencoder for audiovisual speech emotion recognition
An important challenge in emotion recognition is to develop methods that can leverage unlabeled training data. In this paper, we propose the VQ-MAE-AV model, a self-supervised multimodal model that leverages masked autoencoders to learn representations of audiovisual speech without labels. The model...
| Published in: | Computer vision and image understanding Vol. 257; p. 104362 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Inc, 01.06.2025 |
| ISSN: | 1077-3142, 1090-235X |