Leveraging Visual Captions for Enhanced Zero-Shot HOI Detection

Zero-shot Human-Object Interaction (HOI) detection aims to identify both seen and unseen HOI categories in an image. Most existing methods rely on semantic knowledge distilled from CLIP to find novel interactions but fail to fully exploit the powerful generalization ability of vision-language models...

Bibliographic Details
Published in:	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998), pp. 1-5
Main Authors:	Zeng, Yanqing; Mao, Yunyao; Lu, Zhenbo; Zhou, Wengang; Li, Houqiang
Format:	Conference Proceeding
Language:	English
Published:	IEEE, 6 April 2025
Subjects:	Aggregates; Benchmark testing; Detectors; Human-object Interaction; Knowledge transfer; Multimodal fusion; Robustness; Semantics; Signal processing; Source coding; Speech processing; Vision-language Model; Visualization; Zero-shot
ISSN:	2379-190X
Online Access:	Full text