Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Recent advancements in multimodal large language models (MLLMs) have significantly improved performance in visual question answering. However, they often suffer from hallucinations. In this work, hallucinations are categorized into two main types: initial hallucinations and snowball hallucinations....

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) Ročník 2025; s. 26147 - 26159
Hlavní autoři:	Tang, Feilong, Liu, Chengzhi, Xu, Zhongxing, Hu, Ming, Huang, Zile, Xue, Haochen, Chen, Ziyang, Peng, Zelin, Yang, Zhiwei, Zhou, Sijin, Li, Wenxue, Li, Yulong, Song, Wenxuan, Su, Shiyan, Feng, Wei, Su, Jionglong, Lin, Minquan, Peng, Yifan, Cheng, Xuelian, Razzak, Imran, Ge, Zongyuan
Médium:	Konferenční příspěvek Journal Article
Jazyk:	angličtina
Vydáno:	United States IEEE 01.06.2025
Témata:	Data mining Decoding Encoding FarSight demonstrates significant hallucination-mitigating performance across different MLLMs on both image and video benchmarks Heart Interference Large language models proving its effectiveness Question answering (information retrieval) Registers Video sequences Visualization With extensive experiments
ISSN:	1063-6919, 1063-6919
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Buďte první, kdo okomentuje tento záznam!