CVCPSG: Discovering Composite Visual Clues for Panoptic Scene Graph Generation
| Published in: | Journal of King Saud University. Computer and information sciences Vol. 37; no. 4; pp. 49 - 13 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Cham: Springer International Publishing, 01.06.2025; Springer Nature B.V; Springer |
| Subjects: | |
| ISSN: | 1319-1578, 2213-1248, 1319-1578 |
| Online access: | Get full text |
| Summary: | Panoptic Scene Graph Generation (PSG) aims to segment objects and predict the relation triplets <subject, relation, object> within an image. Despite the impressive achievements in PSG, current methods still struggle to capture fine-grained visual context, eschewing spatial and situational information in favor of visual features related to object identity. This limitation naturally impedes the model’s ability to distinguish subtle visual differences between relation triplets, such as “cat-on-person” and “cat-lying on-person”. To address this challenge, we propose CVCPSG, a novel DETR-based method that uncovers composite visual clues for PSG. Specifically, drawing inspiration from how humans capture visual context using diverse visual clues, we first construct a composite visual clues bank based on three key aspects: object, spatial, and situational. Then, we introduce a multi-level visual extractor to align visual features from objects, interactions, and image levels with the composite visual clues bank. Additionally, we incorporate a cross-modal learning module with a multitower architecture to seamlessly integrate visual clues into the relation decoder, thereby improving PSG detection. Extensive experiments on two PSG benchmarks confirm the effectiveness and interpretability of CVCPSG. |
|---|---|
| DOI: | 10.1007/s44443-025-00063-w |