CM-AVAE: Cross-Modal Adversarial Variational Autoencoder for Visual-to-Tactile Data Generation


Full Description

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 9, No. 6, pp. 5214-5221
Main Authors: Xi, Qiyuan; Wang, Fei; Tao, Liangze; Zhang, Hanjing; Jiang, Xun; Wu, Juan
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2024
ISSN: 2377-3766
Online Access: Full Text
Description
Abstract: Vibration acceleration signals allow humans to perceive the surface characteristics of textures during tool-surface interactions. However, acquiring acceleration signals requires a specialized and relatively expensive measurement system. Conversely, visual images are more accessible than acceleration signals, and generative models can convert visual images into vibration acceleration signals, circumventing the need for time-consuming physical measurements. This approach can also be applied to robot-related tasks. This letter presents a cross-modal adversarial variational autoencoder (CM-AVAE) for visual-to-tactile data generation. The model incorporates latent-space learning from variational autoencoders (VAEs) into generative adversarial networks (GANs) and maps the feature vectors of the generator's decoder to the discriminator. A public dataset is chosen to train the model, and relevant evaluation metrics are established to assess the generated results. In objective experiments, CM-AVAE shows significant improvement over the baseline models, and subjective experimental outcomes also surpass those of the baselines. An ablation study confirms that introducing latent-space learning and mapping the decoder feature vectors in the generator to the discriminator significantly improves the quality of cross-modal data generation.
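The hybrid architecture the abstract describes (VAE-style latent-space learning inside a GAN, with the generator decoder's feature vectors also fed to the discriminator) can be sketched in miniature as a forward pass. All layer sizes, weight initializations, and names below are illustrative assumptions for this sketch, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # Affine layer followed by a tanh nonlinearity.
    return np.tanh(x @ w + b)

# Hypothetical dimensions, chosen only for illustration.
IMG_DIM, LATENT_DIM, FEAT_DIM, SIG_LEN = 64, 8, 16, 32

# Encoder: image features -> (mu, logvar) of the latent distribution.
W_enc = rng.standard_normal((IMG_DIM, 2 * LATENT_DIM)) * 0.1
b_enc = np.zeros(2 * LATENT_DIM)

# Generator decoder: latent z -> intermediate features -> vibration signal.
W_dec1 = rng.standard_normal((LATENT_DIM, FEAT_DIM)) * 0.1
b_dec1 = np.zeros(FEAT_DIM)
W_dec2 = rng.standard_normal((FEAT_DIM, SIG_LEN)) * 0.1
b_dec2 = np.zeros(SIG_LEN)

# Discriminator: scores the signal AND the decoder's feature vector.
W_disc = rng.standard_normal((SIG_LEN + FEAT_DIM, 1)) * 0.1
b_disc = np.zeros(1)

def forward(image_feat):
    # VAE step: encode, then reparameterize (z = mu + sigma * eps).
    stats = image_feat @ W_enc + b_enc
    mu, logvar = stats[:, :LATENT_DIM], stats[:, LATENT_DIM:]
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    # Decode, keeping the intermediate feature vector.
    feat = dense(z, W_dec1, b_dec1)
    signal = dense(feat, W_dec2, b_dec2)
    # Discriminator input concatenates the generated signal with the
    # decoder features, mirroring the feature-mapping idea in the abstract.
    logits = np.concatenate([signal, feat], axis=1) @ W_disc + b_disc
    score = 1.0 / (1.0 + np.exp(-logits))  # sigmoid real/fake score
    return signal, feat, score

batch = rng.standard_normal((4, IMG_DIM))  # 4 dummy image-feature vectors
signal, feat, score = forward(batch)
print(signal.shape, feat.shape, score.shape)
```

In a real training loop the VAE reconstruction and KL terms would be combined with the adversarial loss; this sketch only shows how one batch flows from image features through the latent space to a signal and a discriminator score.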
DOI: 10.1109/LRA.2024.3387146