Vision Transformer and Residual Network-Based Autoencoder for RGBD Data Processing in Robotic Grasping of Noodle-Like Objects
| Published in: | 2024 1st International Conference on Robotics, Engineering, Science, and Technology (RESTCON), pp. 85–89 |
|---|---|
| Main Authors: | , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 16.02.2024 |
| Online Access: | Get full text |
| Summary: | In this study, a Vision Transformer and Residual Network-based autoencoder is employed for the efficient encoding of RGBD data, aimed at enhancing robotic precision in grasping noodle-like objects. The approach compresses 50×50-pixel RGBD images into a 1024-element representation, optimizing data processing for robotic applications. Using a combination of vision transformers and residual networks, the autoencoder preserves critical data features through compression and decompression, which is essential for accurate robotic manipulation. The efficacy of the approach is evaluated with Relative Absolute Error (RAE), Relative Squared Error (RSE), Root Mean Square Error (RMSE), and accuracy thresholds (an illustrative architecture sketch and the standard metric definitions follow the record below). The results are promising, demonstrating high fidelity in the reconstructed data, with the accuracy threshold at 1.25 reaching 0.993 for RGB images and 0.989 for depth images. These findings confirm the autoencoder's effectiveness in representing the full data with excellent accuracy, underscoring its potential in robotic grasping applications, particularly for objects with complex shapes and textures. |
| DOI: | 10.1109/RESTCON60981.2024.10463551 |
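The record describes the architecture only at a high level: a Vision Transformer combined with a residual network, compressing a 50×50 RGBD input (4 channels) into a 1024-element code. The following PyTorch sketch is a minimal illustration of that idea under stated assumptions, not the authors' implementation; the patch size, embedding width, layer counts, and all module names are hypothetical.

```python
# Hypothetical sketch of a ViT + residual-network autoencoder that compresses
# a 50x50 RGBD image (4 channels) into a 1024-element latent code. All layer
# sizes and module names are illustrative assumptions; the record does not
# specify the paper's actual design.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Simple pre-activation residual convolution block."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class ViTResNetAutoencoder(nn.Module):
    def __init__(self, patch=5, dim=128, latent=1024):
        super().__init__()
        n_patches = (50 // patch) ** 2  # 100 patches for a 50x50 input
        # Encoder: patch embedding + transformer layers (the "ViT" part).
        self.patchify = nn.Conv2d(4, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.to_latent = nn.Linear(n_patches * dim, latent)
        # Decoder: expand the latent, then residual conv blocks (the "ResNet"
        # part) and upsampling back to 4 x 50 x 50.
        self.from_latent = nn.Linear(latent, 64 * 10 * 10)
        self.decoder = nn.Sequential(
            ResidualBlock(64),
            nn.Upsample(scale_factor=5, mode="nearest"),  # 10x10 -> 50x50
            ResidualBlock(64),
            nn.Conv2d(64, 4, 3, padding=1),
        )

    def forward(self, x):                      # x: (B, 4, 50, 50) RGBD
        tokens = self.patchify(x)              # (B, dim, 10, 10)
        tokens = tokens.flatten(2).transpose(1, 2) + self.pos
        tokens = self.transformer(tokens)      # (B, 100, dim)
        z = self.to_latent(tokens.flatten(1))  # (B, 1024) compressed code
        h = self.from_latent(z).view(-1, 64, 10, 10)
        return self.decoder(h), z


model = ViTResNetAutoencoder()
recon, code = model(torch.randn(2, 4, 50, 50))
print(recon.shape, code.shape)  # (2, 4, 50, 50) and (2, 1024)
```

The abstract also cites RAE, RSE, RMSE, and accuracy thresholds. The paper's exact formulations are not given in this record; the snippet below uses the standard definitions, where the 1.25 threshold is the usual δ < 1.25 accuracy metric from depth estimation.

```python
# Standard definitions of the metrics named in the abstract, computed per
# image. The authors' exact formulations may differ. The ratio-based
# threshold accuracy assumes strictly positive values (e.g. depth in metres).
import numpy as np

def metrics(pred: np.ndarray, target: np.ndarray) -> dict:
    err = pred - target
    dev = target - target.mean()
    return {
        "RAE":  np.abs(err).sum() / np.abs(dev).sum(),
        "RSE":  (err ** 2).sum() / (dev ** 2).sum(),
        "RMSE": np.sqrt((err ** 2).mean()),
        # Fraction of values whose ratio to ground truth is within 1.25.
        "acc@1.25": (np.maximum(pred / target, target / pred) < 1.25).mean(),
    }
```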