Vision Transformer and Residual Network-Based Autoencoder for RGBD Data Processing in Robotic Grasping of Noodle-Like Objects
| Published in: | 2024 1st International Conference on Robotics, Engineering, Science, and Technology (RESTCON), pp. 85-89 |
|---|---|
| Main Authors: | |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 16.02.2024 |
| Summary: | In this study, an autoencoder based on a Vision Transformer and a Residual Network is employed for the efficient encoding of RGBD data, aimed at enhancing robotic precision in grasping noodle-like objects. The approach compresses 50×50-pixel RGBD images into a 1024-element latent representation, streamlining data processing for robotic applications. By combining vision transformers with residual networks, the autoencoder preserves critical image features through compression and decompression, which is essential for accurate robotic manipulation. Its efficacy is evaluated with Relative Absolute Error (RAE), Relative Squared Error (RSE), Root Mean Square Error (RMSE), and threshold accuracy. The results demonstrate high fidelity in the reconstructed data, with accuracy at the 1.25 threshold reaching 0.993 for RGB images and 0.989 for depth images. These findings confirm the autoencoder's ability to represent the full data with excellent accuracy, underscoring its potential in robotic grasping applications, particularly for objects with complex shapes and textures. Hedged sketches of the described architecture and metrics follow the record below. |
| DOI: | 10.1109/RESTCON60981.2024.10463551 |
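
The record gives only the input size (50×50 RGBD) and the latent width (1024 elements); the authors' exact layer configuration is not stated. The PyTorch code below is a minimal sketch of one plausible ViT-plus-ResNet autoencoder satisfying those two constraints. The patch size, channel width, transformer depth, and transposed-convolution decoder are all assumptions made for illustration, not the paper's reported design.

```python
# Hedged sketch of a ViT + ResNet hybrid autoencoder for 50x50 RGBD input.
# All hyperparameters (patch size, channel width, depths) are assumptions;
# the record does not specify the authors' exact architecture.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convs with a skip connection (assumed stem/decoder component)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))


class RGBDAutoencoder(nn.Module):
    """Compresses a 4-channel 50x50 RGBD image to a 1024-element latent code."""
    def __init__(self, patch: int = 5, dim: int = 128, latent: int = 1024):
        super().__init__()
        self.dim = dim
        self.side = 50 // patch                      # 10x10 grid of patches
        n_patches = self.side ** 2                   # 100 tokens
        # ResNet-style stem: widen channels, keep the 50x50 resolution.
        self.stem = nn.Sequential(
            nn.Conv2d(4, dim, 3, padding=1),
            ResidualBlock(dim),
        )
        # Non-overlapping patch embedding, as in a ViT.
        self.patchify = nn.Conv2d(dim, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.to_latent = nn.Linear(n_patches * dim, latent)
        # Decoder: latent -> 10x10 feature map -> 50x50 RGBD reconstruction.
        self.from_latent = nn.Linear(latent, n_patches * dim)
        self.decoder = nn.Sequential(
            ResidualBlock(dim),
            nn.ConvTranspose2d(dim, dim, kernel_size=patch, stride=patch),
            ResidualBlock(dim),
            nn.Conv2d(dim, 4, 3, padding=1),
        )

    def forward(self, x):                            # x: (B, 4, 50, 50)
        b = x.size(0)
        tokens = self.patchify(self.stem(x))         # (B, dim, 10, 10)
        tokens = tokens.flatten(2).transpose(1, 2)   # (B, 100, dim)
        tokens = self.encoder(tokens + self.pos)
        z = self.to_latent(tokens.flatten(1))        # (B, 1024) latent code
        h = self.from_latent(z).view(b, self.dim, self.side, self.side)
        return self.decoder(h), z


if __name__ == "__main__":
    model = RGBDAutoencoder()
    recon, z = model(torch.randn(2, 4, 50, 50))
    print(recon.shape, z.shape)  # (2, 4, 50, 50) and (2, 1024)
```

Training such a model against a pixelwise reconstruction loss (e.g. MSE over the RGBD channels) would be the natural objective, though the record does not state which loss the authors used.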
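The metrics named in the abstract are standard reconstruction measures, but the record does not spell out the formulas. The sketch below assumes the conventional definitions, including the depth-estimation-style threshold accuracy (fraction of elements with max(ŷ/y, y/ŷ) below 1.25), which matches the reported 0.993 (RGB) and 0.989 (depth) figures.

```python
# Hedged sketch of the evaluation metrics named in the abstract, assuming
# their standard definitions (the record itself does not give the formulas).
import numpy as np


def rae(pred: np.ndarray, target: np.ndarray) -> float:
    """Relative Absolute Error: total |error| normalized by the error of a
    mean predictor."""
    return float(np.abs(pred - target).sum()
                 / np.abs(target - target.mean()).sum())


def rse(pred: np.ndarray, target: np.ndarray) -> float:
    """Relative Squared Error: total squared error normalized by the target's
    total squared deviation from its mean."""
    return float(((pred - target) ** 2).sum()
                 / ((target - target.mean()) ** 2).sum())


def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root Mean Square Error."""
    return float(np.sqrt(((pred - target) ** 2).mean()))


def threshold_accuracy(pred: np.ndarray, target: np.ndarray,
                       delta: float = 1.25, eps: float = 1e-6) -> float:
    """Fraction of elements with max(pred/target, target/pred) < delta; at
    delta = 1.25 this is the accuracy reported as 0.993 (RGB) / 0.989 (depth)."""
    ratio = np.maximum((pred + eps) / (target + eps),
                       (target + eps) / (pred + eps))
    return float((ratio < delta).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.uniform(0.1, 1.0, size=(50, 50))         # stand-in ground truth
    y_hat = y + rng.normal(0, 0.01, size=y.shape)    # near-perfect reconstruction
    print(rae(y_hat, y), rse(y_hat, y), rmse(y_hat, y),
          threshold_accuracy(y_hat, y))
```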