Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Recent advancements in multimodal large language models (MLLMs) have significantly improved performance in visual question answering. However, they often suffer from hallucinations. In this work, hallucinations are categorized into two main types: initial hallucinations and snowball hallucinations....

Bibliographic Details
Published in:	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) Vol. 2025; pp. 26147 - 26159
Main Authors:	Tang, Feilong; Liu, Chengzhi; Xu, Zhongxing; Hu, Ming; Huang, Zile; Xue, Haochen; Chen, Ziyang; Peng, Zelin; Yang, Zhiwei; Zhou, Sijin; Li, Wenxue; Li, Yulong; Song, Wenxuan; Su, Shiyan; Feng, Wei; Su, Jionglong; Lin, Minquan; Peng, Yifan; Cheng, Xuelian; Razzak, Imran; Ge, Zongyuan
Format:	Conference Proceeding; Journal Article
Language:	English
Published:	United States: IEEE, 01.06.2025
Subjects:	Data mining; Decoding; Encoding; Heart; Interference; Large language models; Question answering (information retrieval); Registers; Video sequences; Visualization
Summary:	With extensive experiments, FarSight demonstrates significant hallucination-mitigating performance across different MLLMs on both image and video benchmarks, proving its effectiveness.
ISSN:	1063-6919