CVCPSG: Discovering Composite Visual Clues for Panoptic Scene Graph Generation

Bibliographic Details
Published in: Journal of King Saud University. Computer and information sciences Vol. 37; no. 4; pp. 49 - 13
Main Authors: Liang, Nanhao, Yang, Xiaoyuan, Xia, Yingwei, Liu, Yong
Format: Journal Article
Language: English
Published: Cham Springer International Publishing 01.06.2025
Springer Nature B.V
Springer
ISSN: 1319-1578, 2213-1248
Description
Summary: Panoptic Scene Graph Generation (PSG) aims to segment objects and predict the relation triplets <subject, relation, object> within an image. Despite the impressive achievements in PSG, current methods still struggle to capture fine-grained visual context, eschewing spatial and situational information in favor of visual features related to object identity. This limitation naturally impedes the model’s ability to distinguish subtle visual differences between relation triplets, such as “cat-on-person” and “cat-lying on-person”. To address this challenge, we propose CVCPSG, a novel DETR-based method that uncovers composite visual clues for PSG. Specifically, drawing inspiration from how humans capture visual context using diverse visual clues, we first construct a composite visual clues bank based on three key aspects: object, spatial, and situational. Then, we introduce a multi-level visual extractor to align visual features from objects, interactions, and image levels with the composite visual clues bank. Additionally, we incorporate a cross-modal learning module with a multitower architecture to seamlessly integrate visual clues into the relation decoder, thereby improving PSG detection. Extensive experiments on two PSG benchmarks confirm the effectiveness and interpretability of CVCPSG.
DOI: 10.1007/s44443-025-00063-w