A multimodal educational robots driven via dynamic attention

Bibliographic details
Published in:	Frontiers in neurorobotics Volume 18; p. 1453061
Main author:	Jianliang, An
Format:	Journal Article
Language:	English
Published:	Switzerland Frontiers Media S.A. 31 October 2024
Subjects:	ALBEF; dynamic attention mechanism; educational; multimodal robot; Neuroscience; VGG19
ISSN:	1662-5218

Description
Summary:	With the development of artificial intelligence and robotics technology, the application of educational robots in teaching is becoming increasingly popular. However, effectively evaluating and optimizing multimodal educational robots remains a challenge. This study introduces Res-ALBEF, a multimodal educational robot framework driven by dynamic attention. Res-ALBEF enhances the ALBEF (Align Before Fuse) method by incorporating residual connections to align visual and textual data more effectively before fusion. In addition, the model integrates a VGG19-based convolutional network for image feature extraction and utilizes a dynamic attention mechanism to dynamically focus on relevant parts of multimodal inputs. Our model was trained using a diverse dataset consisting of 50,000 multimodal educational instances, covering a variety of subjects and instructional content. The evaluation on an independent validation set of 10,000 samples demonstrated significant performance improvements: the model achieved an overall accuracy of 97.38% in educational content recognition. These results highlight the model's ability to improve alignment and fusion of multimodal information, making it a robust solution for multimodal educational robots.
Bibliography:	Reviewed by: Tao Han, Hubei Normal University, China; Fei Yan, Changchun University of Science and Technology, China. Edited by: Xianmin Wang, Guangzhou University, China
DOI:	10.3389/fnbot.2024.1453061