Action matters for object representation learning from sequences
| Title: | Action matters for object representation learning from sequences |
|---|---|
| Authors: | Sene, Mohamed Massamba; Quinton, Jean-Charles; Armetta, Frédéric; Lefort, Mathieu |
| Contributors: | Statistique pour le Vivant et l’Homme (SVH), Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA), Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), ANR-23-CE23-0021,MeSMRise,Apprentissage profond de représentations sensorimotrices multimodales(2023) |
| Source: | European Conference On Artificial Intelligence (ECAI), Artificial Intelligence and Cognition workshop (AIC) ; https://hal.science/hal-05324845 ; European Conference On Artificial Intelligence (ECAI), Artificial Intelligence and Cognition workshop (AIC), Oct 2025, Bologna, Italy ; https://ecai2025.org/ |
| Publisher Information: | CCSD |
| Publication Year: | 2025 |
| Collection: | Portail HAL de l'Université Lumière Lyon 2 |
| Subject Terms: | Tetris, Sequence modeling, Self-supervised learning, Representation learning, Sensorimotor theory, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [SCCO.COMP]Cognitive science/Computer science |
| Subject Geographic: | Bologna, Italy |
| Description: | International audience ; According to the sensorimotor contingencies theory, which builds on developments in cognitive science, action, and especially the rules governing the changes in sensations caused by action, is essential to construct meaningful representations of the environment. However, most current approaches to self-supervised learning learn static representations from large datasets. In this article, we study how state-of-the-art deep learning methods can learn sensorimotor representations from sequences of interactions in a simplified, deterministic, Tetris-like environment that is only partially observable by the model. In particular, we emphasize the role of action in facilitating the emergence of representations related to the objects present in the environment (here, manipulable Tetris shapes). To that end, several model configurations were compared: a multi-task Transformer, a multi-task State Space Model, and ablated variants omitting action from the input and/or from the pretext tasks. All models were trained in a self-supervised setting, with pretext tasks that predict a masked subpart of the input sequence. Results show that explicitly including actions in the input and in the pretext tasks significantly improves the quality of the learned representations, particularly in ambiguous situations. These findings support the relevance of the sensorimotor framework for structuring representation learning. (A minimal, illustrative sketch of the masked-prediction setup is given after this record.) |
| Document Type: | conference object |
| Language: | English |
| Availability: | https://hal.science/hal-05324845 https://hal.science/hal-05324845v1/document https://hal.science/hal-05324845v1/file/Article_ECAI-AIC.pdf |
| Rights: | http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess |
| Accession Number: | edsbas.5A1B3E59 |
| Database: | BASE |
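The training setup described in the abstract, masked prediction over sequences of interleaved observations and actions, can be illustrated with a short sketch. The code below is not the authors' implementation: the module names, dimensions, interleaving scheme, masking rate, loss, and toy data are all assumptions chosen for illustration. The ablated variants mentioned in the abstract would correspond to dropping `act_proj` (no action in the input) and/or `act_head` (no action in the pretext tasks).

```python
# Minimal illustrative sketch (assumed, not the paper's code): a Transformer
# encoder trained to reconstruct masked tokens from (observation, action) sequences.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, D_MODEL, SEQ_LEN = 64, 8, 128, 32  # placeholder sizes

class SensorimotorEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.obs_proj = nn.Linear(OBS_DIM, D_MODEL)   # embed partial observations
        self.act_proj = nn.Linear(ACT_DIM, D_MODEL)   # embed actions (ablation: omit)
        self.pos = nn.Parameter(torch.zeros(2 * SEQ_LEN, D_MODEL))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.obs_head = nn.Linear(D_MODEL, OBS_DIM)   # pretext: reconstruct masked observations
        self.act_head = nn.Linear(D_MODEL, ACT_DIM)   # pretext: reconstruct masked actions (ablation: omit)

    def forward(self, obs, act, mask):
        # Interleave observation and action tokens: o_1, a_1, o_2, a_2, ...
        tokens = torch.stack([self.obs_proj(obs), self.act_proj(act)], dim=2)
        tokens = tokens.flatten(1, 2) + self.pos
        # Zero out masked tokens so they must be predicted from the surrounding context.
        tokens = tokens * (~mask).unsqueeze(-1)
        h = self.encoder(tokens)
        return self.obs_head(h[:, 0::2]), self.act_head(h[:, 1::2])

# Toy self-supervised step: mask a random subpart of the sequence and
# reconstruct only the masked positions (MSE used here for simplicity).
obs = torch.randn(4, SEQ_LEN, OBS_DIM)
act = torch.randn(4, SEQ_LEN, ACT_DIM)
mask = torch.rand(4, 2 * SEQ_LEN) < 0.25
model = SensorimotorEncoder()
obs_pred, act_pred = model(obs, act, mask)
obs_mask, act_mask = mask[:, 0::2], mask[:, 1::2]
loss = nn.functional.mse_loss(obs_pred[obs_mask], obs[obs_mask]) \
     + nn.functional.mse_loss(act_pred[act_mask], act[act_mask])
loss.backward()
```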