GelSight dual-modal tactile data compression for machines


Detailed Description

Bibliographic Details
Published in: Digital Signal Processing, Vol. 168, p. 105696
Main authors: Zeng, Yaofeng; Lan, Chengdong; Xu, Yifeng
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.01.2026
Subjects:
ISSN: 1051-2004
Online access: Full text
Description
Abstract: Existing image coding for machines (ICM) methods typically optimize jointly with downstream tasks and transmit task-relevant information in a lossy manner, but they cannot meet the differentiated needs of the dual-modal information in GelSight tactile images. To this end, we propose an end-to-end dual-modal tactile data compression framework that integrates lossy and lossless strategies for differentiated transmission. For the shape-texture modality, we address the feature mismatch that arises when the task inference subnet changes during decoding by proposing a Feature-Semantics Preserving Multi-Branch Decoder (FSPMBD). This decoder reconstructs multi-level semantic features through combinations of different branches and aligns them with a pretrained tactile task model, thereby ensuring semantic consistency of the decoded features with downstream tasks. With this design, the framework can flexibly adapt to different task inference subnets without retraining or storing multiple models. To more effectively eliminate statistical redundancy in the force modality and achieve its lossless transmission at lower bitrates, we build a dual-branch entropy model that exploits the distinct distributions of background and force-marker pixels, achieving more accurate probability modeling. Experiments show that our method enables differentiated transmission of dual-modal information and, in material classification, reduces the bitrate to 8.3%-33.3% of that of existing methods while maintaining performance comparable to state-of-the-art approaches.
DOI:10.1016/j.dsp.2025.105696
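The abstract's dual-branch entropy model conditions probability estimates on whether a pixel belongs to the background or to a force marker. The following is a minimal sketch of why such conditioning lowers the achievable lossless bitrate, using synthetic data and empirical Shannon entropy; the data, distributions, and function names here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def entropy_bits(pixels, num_levels=256):
    """Empirical Shannon entropy (bits/pixel) of a pixel population."""
    counts = np.bincount(pixels.ravel(), minlength=num_levels).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# Synthetic force-modality image: bright background, dark force markers.
background = rng.normal(200, 5, size=9000).clip(0, 255).astype(np.uint8)
markers = rng.normal(40, 5, size=1000).clip(0, 255).astype(np.uint8)
image = np.concatenate([background, markers])
mask = np.concatenate([np.zeros(9000, bool), np.ones(1000, bool)])

# Single-branch model: one distribution fitted over all pixels.
single_bpp = entropy_bits(image)

# Dual-branch model: separate distributions conditioned on the marker mask,
# weighted by how often each branch is used.
w_bg, w_mk = (~mask).mean(), mask.mean()
dual_bpp = w_bg * entropy_bits(image[~mask]) + w_mk * entropy_bits(image[mask])

print(f"single-branch: {single_bpp:.2f} bpp, dual-branch: {dual_bpp:.2f} bpp")
```

Because conditional entropy never exceeds marginal entropy, the dual-branch estimate is always at most the single-branch one; the gap is the per-pixel cost the single model pays for mixing two unrelated pixel populations.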