Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources

Bibliographic details
Published in:	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 14920-14929
Main authors:	Abdelnabi, Sahar; Hasan, Rakibul; Fritz, Mario
Format:	Conference paper
Language:	English
Published:	IEEE, 01.06.2022
Subjects:	categorization; Cognition; Computer architecture; Computer vision; Computer vision for social good; Recognition: detection; Machine vision; Manuals; MIMICs; retrieval; Vision + language; Vision applications and systems; Visual reasoning; Visualization
ISSN:	1063-6919

Description
Summary:	Misinformation is now a major problem due to its potential high risks to our core democratic and societal values and orders. Out-of-context misinformation is one of the easiest and most effective ways used by adversaries to spread viral false stories. In this threat, a real image is re-purposed to support other narratives by misrepresenting its context and/or elements. The internet is being used as the go-to way to verify information using different sources and modalities. Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-caption pairing using Web evidence. To integrate evidence and cues from both modalities, we introduce the concept of 'multi-modal cycle-consistency check': starting from the image/caption, we gather textual/visual evidence, which will be compared against the other paired caption/image, respectively. Moreover, we propose a novel architecture, Consistency-Checking Network (CCN), that mimics the layered human reasoning across the same and different modalities: the caption vs. textual evidence, the image vs. visual evidence, and the image vs. caption. Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking, and significantly outperforms previous baselines that did not leverage external evidence. For code, checkpoints, and dataset, check: https://s-abdelnabi.github.io/OoC-multi-modal-fc/.
DOI:	10.1109/CVPR52688.2022.01452
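
Note:	The three comparisons named in the summary (caption vs. textual evidence, image vs. visual evidence, image vs. caption) can be illustrated with a minimal sketch. This is not the authors' Consistency-Checking Network, which is a learned architecture; it swaps in plain cosine similarity over precomputed vectors. It assumes, purely for illustration, that images, captions, and retrieved evidence have already been embedded into one shared vector space. The names `cosine`, `best_match`, `ConsistencyScores`, and `cycle_consistency_check` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query: Vector, evidence: Sequence[Vector]) -> float:
    """Highest similarity between the query and any evidence item (0.0 if none)."""
    return max((cosine(query, item) for item in evidence), default=0.0)


@dataclass
class ConsistencyScores:
    caption_vs_textual_evidence: float
    image_vs_visual_evidence: float
    image_vs_caption: float


def cycle_consistency_check(
    image_vec: Vector,
    caption_vec: Vector,
    textual_evidence: Sequence[Vector],  # assumed: text retrieved starting from the image
    visual_evidence: Sequence[Vector],   # assumed: images retrieved starting from the caption
) -> ConsistencyScores:
    """Score the three pairings listed in the summary with cosine similarity."""
    return ConsistencyScores(
        caption_vs_textual_evidence=best_match(caption_vec, textual_evidence),
        image_vs_visual_evidence=best_match(image_vec, visual_evidence),
        image_vs_caption=cosine(image_vec, caption_vec),
    )


# Toy vectors only; real inputs would come from an embedding model (assumption).
scores = cycle_consistency_check(
    image_vec=[1.0, 0.0],
    caption_vec=[1.0, 0.0],
    textual_evidence=[[1.0, 0.0], [0.0, 1.0]],
    visual_evidence=[[0.0, 1.0]],
)
print(scores)
```

In this toy run the caption agrees with one piece of textual evidence while the image matches none of the visual evidence, which is the kind of per-modality disagreement an inspectable checker would surface for review.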