InFER++: Real-World Indian Facial Expression Dataset

Detecting facial expressions is a challenging task in the field of computer vision. Several datasets and algorithms have been proposed over the past two decades; however, deploying them in real-world, in-the-wild scenarios hampers the overall performance. This is because the training data does not c...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:IEEE open journal of the Computer Society Ročník 5; s. 406 - 417
Hlavní autori: Rizvi, Syed Sameen Ahmad, Seth, Aryan, Challa, Jagat Sesh, Narang, Pratik
Médium: Journal Article
Jazyk:English
Vydavateľské údaje: New York IEEE 2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Predmet:
ISSN:2644-1268, 2644-1268
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:Detecting facial expressions is a challenging task in the field of computer vision. Several datasets and algorithms have been proposed over the past two decades; however, deploying them in real-world, in-the-wild scenarios hampers the overall performance. This is because the training data does not completely represent socio-cultural and ethnic diversity; the majority of the datasets consist of American and Caucasian populations. On the contrary, in a diverse and heterogeneous population distribution like the Indian subcontinent, the need for a significantly large enough dataset representing all the ethnic groups is even more critical. To address this, we present InFER++, an India-specific, multi-ethnic, real-world, in-the-wild facial expression dataset consisting of seven basic expressions. To the best of our knowledge, this is the largest India-specific facial expression dataset. Our cross-dataset analysis of RAF-DB vs InFER++ shows that models trained on RAF-DB were not generalizable to ethnic datasets like InFER++. This is because the facial expressions change with respect to ethnic and socio-cultural factors. We also present LiteXpressionNet, a lightweight deep facial expression network that outperforms many existing lightweight models with considerably fewer FLOPs and parameters. The proposed model is inspired by MobileViTv2 architecture, which utilizes GhostNetv2 blocks to increase parametrization while reducing latency and FLOP requirements. The model is trained with a novel objective function that combines early learning regularization and symmetric cross-entropy loss to mitigate human uncertainties and annotation bias in most real-world facial expression datasets.
Bibliografia:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2644-1268
2644-1268
DOI:10.1109/OJCS.2024.3443511