Vision Transformer and Residual Network-Based Autoencoder for RGBD Data Processing in Robotic Grasping of Noodle-Like Objects
| Published in: | 2024 1st International Conference on Robotics, Engineering, Science, and Technology (RESTCON), pp. 85–89 |
|---|---|
| Main Authors: | , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 16.02.2024 |
| Online Access: | Get full text |
| Summary: | In this study, a Vision Transformer and Residual Network-based autoencoder is employed for the efficient encoding of RGBD data, aimed at enhancing robotic precision in grasping noodle-like objects. The approach compresses 50×50-pixel RGBD images into a 1024-element representation, optimizing data processing for robotic applications. Using a combination of vision transformers and residual networks, the autoencoder preserves critical data features through compression and decompression, which is essential for accurate robotic manipulation. The efficacy of the approach is evaluated with Relative Absolute Error (RAE), Relative Squared Error (RSE), Root Mean Square Error (RMSE), and accuracy thresholds (an illustrative architecture sketch and the standard metric definitions follow the record below). The results are promising, demonstrating high fidelity in the reconstructed data, with the accuracy threshold at 1.25 reaching 0.993 for RGB images and 0.989 for depth images. These findings confirm the autoencoder's effectiveness in representing the full data with excellent accuracy, underscoring its potential in robotic grasping applications, particularly for objects with complex shapes and textures. |
| DOI: | 10.1109/RESTCON60981.2024.10463551 |
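The record describes the architecture only at a high level: a Vision Transformer combined with a residual network, compressing a 50×50 RGBD input (4 channels) into a 1024-element code. The following PyTorch sketch is a minimal illustration of that idea under stated assumptions, not the authors' implementation; the patch size, embedding width, layer counts, and all module names are hypothetical.

```python
# Hypothetical sketch of a ViT + residual-network autoencoder that compresses
# a 50x50 RGBD image (4 channels) into a 1024-element latent code. All layer
# sizes and module names are illustrative assumptions; the record does not
# specify the paper's actual design.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Simple pre-activation residual convolution block."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class ViTResNetAutoencoder(nn.Module):
    def __init__(self, patch=5, dim=128, latent=1024):
        super().__init__()
        n_patches = (50 // patch) ** 2  # 100 patches for a 50x50 input
        # Encoder: patch embedding + transformer layers (the "ViT" part).
        self.patchify = nn.Conv2d(4, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.to_latent = nn.Linear(n_patches * dim, latent)
        # Decoder: expand the latent, then residual conv blocks (the "ResNet"
        # part) and upsampling back to 4 x 50 x 50.
        self.from_latent = nn.Linear(latent, 64 * 10 * 10)
        self.decoder = nn.Sequential(
            ResidualBlock(64),
            nn.Upsample(scale_factor=5, mode="nearest"),  # 10x10 -> 50x50
            ResidualBlock(64),
            nn.Conv2d(64, 4, 3, padding=1),
        )

    def forward(self, x):                      # x: (B, 4, 50, 50) RGBD
        tokens = self.patchify(x)              # (B, dim, 10, 10)
        tokens = tokens.flatten(2).transpose(1, 2) + self.pos
        tokens = self.transformer(tokens)      # (B, 100, dim)
        z = self.to_latent(tokens.flatten(1))  # (B, 1024) compressed code
        h = self.from_latent(z).view(-1, 64, 10, 10)
        return self.decoder(h), z


model = ViTResNetAutoencoder()
recon, code = model(torch.randn(2, 4, 50, 50))
print(recon.shape, code.shape)  # (2, 4, 50, 50) and (2, 1024)
```

The abstract also cites RAE, RSE, RMSE, and accuracy thresholds. The paper's exact formulations are not given in this record; the snippet below uses the standard definitions, where the 1.25 threshold is the usual δ < 1.25 accuracy metric from depth estimation.

```python
# Standard definitions of the metrics named in the abstract, computed per
# image. The authors' exact formulations may differ. The ratio-based
# threshold accuracy assumes strictly positive values (e.g. depth in metres).
import numpy as np

def metrics(pred: np.ndarray, target: np.ndarray) -> dict:
    err = pred - target
    dev = target - target.mean()
    return {
        "RAE":  np.abs(err).sum() / np.abs(dev).sum(),
        "RSE":  (err ** 2).sum() / (dev ** 2).sum(),
        "RMSE": np.sqrt((err ** 2).mean()),
        # Fraction of values whose ratio to ground truth is within 1.25.
        "acc@1.25": (np.maximum(pred / target, target / pred) < 1.25).mean(),
    }
```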