Visual Commonsense Causal Reasoning From a Still Image


Detailed Bibliography
Published in: IEEE Access, Vol. 13, pp. 85084-85097
Main Authors: Wu, Xiaojing; Guo, Rui; Li, Qin; Zhu, Ning
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
ISSN: 2169-3536
Description
Summary: Even from a still image, humans exhibit the ability to ratiocinate diverse visual cause-and-effect relationships of events preceding, succeeding, and extending beyond the given image scope. Previous work on commonsense causal reasoning (CCR) aimed at understanding general causal dependencies among common events in natural language descriptions. However, in real-world scenarios, CCR is fundamentally a multisensory task and is more susceptible to spurious correlations, given that commonsense causal relationships manifest in various modalities and involve multiple sources of confounders. In this work, to the best of our knowledge, we present the first comprehensive study focusing on visual commonsense causal reasoning (VCCR) within the potential outcomes framework. By drawing parallels between vision-language data and human subjects in an observational study, we tailor a foundational framework, VCC-Reasoner, for detecting implicit visual commonsense causation. It combines inverse propensity score weighting and outcome regression, offering doubly robust estimates of the average treatment effect. Empirical evidence underscores the efficacy and superiority of VCC-Reasoner, showcasing its outstanding VCCR capabilities.
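The abstract describes combining inverse propensity score weighting with outcome regression to obtain a doubly robust estimate of the average treatment effect (ATE). As a general illustration only (not the paper's VCC-Reasoner implementation, which operates on vision-language data), the classical augmented inverse propensity weighting (AIPW) estimator behind that idea can be sketched on simulated tabular data with a single confounder:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated observational data: the confounder x raises both the
# probability of treatment and the outcome, so a naive comparison
# of treated vs. untreated means would be biased.
x = rng.standard_normal(n)
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x)))
y = 2.0 * t + 1.5 * x + rng.standard_normal(n)  # true ATE = 2.0

X = np.column_stack([np.ones(n), x])

# 1) Propensity model e(x) = P(T=1 | x): logistic regression
#    fitted by Newton's method.
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (t - p)
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
e = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 0.01, 0.99)

# 2) Outcome regressions mu_1(x), mu_0(x): linear fits on the
#    treated and control subsets, respectively.
b1, *_ = np.linalg.lstsq(X[t == 1], y[t == 1], rcond=None)
b0, *_ = np.linalg.lstsq(X[t == 0], y[t == 0], rcond=None)
mu1, mu0 = X @ b1, X @ b0

# 3) AIPW estimate of the ATE: consistent if *either* the propensity
#    model or the outcome model is correctly specified ("doubly robust").
ate = np.mean(mu1 - mu0
              + t * (y - mu1) / e
              - (1 - t) * (y - mu0) / (1 - e))
print(f"AIPW ATE estimate: {ate:.3f} (true effect: 2.0)")
```

The three steps mirror the two ingredients named in the abstract: step 1 supplies the inverse propensity weights, step 2 the outcome regression, and step 3 combines them so that misspecification of one model can be corrected by the other.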
DOI: 10.1109/ACCESS.2025.3558429