Search results for "Multimodal Large Language Models"
1. GSVA: Generalized Segmentation via Multimodal Large Language Models
Conference Proceeding. ISSN: 1063-6919. Published: IEEE, 16.06.2024. Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) (16.06.2024).
“… Multimodal Large Language Models (MLLMs) have recently shown tremendous progress in these complicated vision-language tasks …”
2. Will multimodal large language models ever achieve deep understanding of the world?
Journal Article. ISSN: 1662-5137. Published: Frontiers Media S.A, 17.11.2025. Published in: Frontiers in systems neuroscience (17.11.2025).
“… Despite impressive performance in various tasks, large language models (LLMs) are subject to the symbol grounding problem, so from the cognitive science …”
3. Can Multimodal Large Language Models Diagnose Diabetic Retinopathy from Fundus Photos? A Quantitative Evaluation
Journal Article. ISSN: 2666-9145. Published: Elsevier Inc (Netherlands), 01.01.2026. Published in: Ophthalmology science (Online) (01.01.2026).
“… To evaluate the diagnostic accuracy of 4 multimodal large language models (MLLMs) in detecting and grading diabetic retinopathy …”
4. Analyzing the performance of multimodal large language models on visually-based questions in the Japanese National Examination for Dental Technicians
Journal Article. ISSN: 1991-7902, 2213-8862. Published: Elsevier B.V (Netherlands), 01.10.2025. Published in: Journal of dental sciences (01.10.2025).
“… Correct response rates were calculated overall, as well as by question type (text-only vs. visually-based) and subject …”
5. MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Conference Proceeding. ISSN: 1063-6919. Published: IEEE, 16.06.2024. Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) (16.06.2024).
“… We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes …”
6. Towards Zero-Shot Differential Morphing Attack Detection with Multimodal Large Language Models
Conference Proceeding. ISSN: 2770-8330. Published: IEEE, 26.05.2025. Published in: IEEE International Conference and Workshops on Automatic Face and Gesture Recognition: FG (26.05.2025).
“… Leveraging the power of multimodal large language models (LLMs) offers a promising approach to enhancing the accuracy and interpretability of morphing attack detection (MAD …”
7. GSVA: Generalized Segmentation via Multimodal Large Language Models
Paper. ISSN: 2331-8422. Published: Cornell University Library, arXiv.org (Ithaca), 21.03.2024. Published in: arXiv.org (21.03.2024).
“… Multimodal Large Language Models (MLLMs) have recently shown tremendous progress in these complicated vision-language tasks …”
8. Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Paper. ISSN: 2331-8422. Published: Cornell University Library, arXiv.org (Ithaca), 15.03.2024. Published in: arXiv.org (15.03.2024).
“… This paper presents Kosmos-G, a model that leverages the advanced multimodal perception capabilities of Multimodal Large Language Models (MLLMs …”
9. Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations
Paper. ISSN: 2331-8422. Published: Cornell University Library, arXiv.org (Ithaca), 10.02.2024. Published in: arXiv.org (10.02.2024).
“… Large language models have the potential to be valuable in the healthcare industry, but it's crucial to verify their safety and effectiveness through rigorous …”
10. An fMRI visual neural encoding method with multimodal large language model
Journal Article. ISSN: 0950-7051. Published: Elsevier B.V, 27.09.2025. Published in: Knowledge-based systems (27.09.2025).
“… In summary, our contributions are primarily threefold: … To our knowledge, we establish the first multimodal framework combining MLLM with fMRI visual neural …”
11. Coherent Interpretation of Entire Visual Field Test Reports Using a Multimodal Large Language Model (ChatGPT)
Journal Article. ISSN: 2411-5150. Published: MDPI AG (Switzerland), 11.04.2025. Published in: Vision (Basel) (11.04.2025).
“… Single-page anonymised VF test reports from 60 eyes of 60 subjects were analysed by an LLM (ChatGPT 4o …”
12. Art appreciation based on graph retrieval augmented generation and few-shot learning
Journal Article. ISSN: 2096-0271. Published: China InfoCom Media Group, 01.09.2025. Published in: 大数据 (01.09.2025).
“… In this case, using multimodal large language models to tutor students in art appreciation has become a potential alternative …”
13. Glaucoma Detection and Structured OCT Report Generation via a Fine-tuned Multimodal Large Language Model
Journal Article. ISSN: 2331-8422. Published: United States, 01.10.2025. Published in: ArXiv.org (01.10.2025).
“… To develop an explainable multimodal large language model (MM-LLM) that (1) screens optic nerve head (ONH …”
14. Art appreciation based on graph retrieval augmented generation and few-shot learning
Journal Article. ISSN: 2096-0271. Published: China InfoCom Media Group, 01.01.2025. Published in: 大数据 (01.01.2025).
“… In this case, using multimodal large language models to tutor students in art appreciation has become a potential alternative …”
15. Visual Commonsense Causal Reasoning From a Still Image
Journal Article. ISSN: 2169-3536. Published: IEEE (Piscataway), 2025. Published in: IEEE access (2025).
“… Even from a still image, humans exhibit the ability to ratiocinate diverse visual cause-and-effect relationships of events preceding, succeeding, and extending …”
16. Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis
Journal Article. ISSN: 2666-9145. Published: Elsevier Inc (Netherlands), 01.03.2025. Published in: Ophthalmology science (Online) (01.03.2025).
“… Evaluation of multimodal large language models for reviewing fundus images in glaucoma. A total of 300 fundus images from 3 public datasets …”
17. Deep Composer: Improving the String Quartet Music Generation Task
Conference Proceeding. ISSN: 2770-4319. Published: IEEE, 06.08.2025. Published in: Proceedings (IEEE Conference on Multimedia Information Processing and Retrieval. Online) (06.08.2025).
“… Large Language Models (LLMs) have the capacity to create new pieces of music, however, generating high quality samples in specific genres and instrumentation …”
18. Mv-Math: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Conference Proceeding. ISSN: 1063-6919. Published: IEEE, 10.06.2025. Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) (10.06.2025).
“… Multimodal Large Language Models (MLLMs) have shown promising capabilities in mathematical reasoning within visual contexts across various datasets …”
19. An autonomous AI agent for universal behavior analysis
Journal Article, Paper. ISSN: 2692-8205. Published: Cold Spring Harbor Laboratory (United States), 20.05.2025. Published in: bioRxiv (20.05.2025).
“… Unlike conventional methods that require manual behavior annotation, video segmentation, task-specific model training, BehaveAgent leverages the reasoning capabilities of multimodal large language models (LLM …”
20. Instruction-based Image Manipulation by Watching How Things Move
Conference Proceeding. ISSN: 1063-6919. Published: IEEE, 10.06.2025. Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) (10.06.2025).
“… This paper introduces a novel dataset construction pipeline that samples pairs of frames from videos and uses multimodal large language models (MLLMs …”