Search results - "Multimodal Large Language Models"

  1.

    GSVA: Generalized Segmentation via Multimodal Large Language Models by Xia, Zhuofan, Han, Dongchen, Han, Yizeng, Pan, Xuran, Song, Shiji, Huang, Gao

    ISSN: 1063-6919
    Published: IEEE 16.06.2024
    “… Multimodal Large Language Models (MLLMs) have recently shown tremendous progress in these complicated vision-language tasks …”
    Full text
    Conference proceedings
  2.

    Will multimodal large language models ever achieve deep understanding of the world? by Farkaš, Igor, Vavrečka, Michal, Wermter, Stefan

    ISSN: 1662-5137
    Published: Frontiers Media S.A 17.11.2025
    Published in Frontiers in systems neuroscience (17.11.2025)
    “… Despite impressive performance in various tasks, large language models (LLMs) are subject to the symbol grounding problem, so from the cognitive science …”
    Full text
    Journal Article
  3.

    Can Multimodal Large Language Models Diagnose Diabetic Retinopathy from Fundus Photos? A Quantitative Evaluation by Most, Jesse A., Walker, Evan H., Mehta, Nehal N., Nagel, Ines D., Chen, Jimmy S., Russell, Jonathan F., Scott, Nathan L., Borooah, Shyamanga

    ISSN: 2666-9145
    Published: Netherlands Elsevier Inc 01.01.2026
    Published in Ophthalmology science (Online) (01.01.2026)
    “… To evaluate the diagnostic accuracy of 4 multimodal large language models (MLLMs) in detecting and grading diabetic retinopathy …”
    Full text
    Journal Article
  4.

    Analyzing the performance of multimodal large language models on visually-based questions in the Japanese National Examination for Dental Technicians by Mine, Yuichi, Taji, Tsuyoshi, Okazaki, Shota, Takeda, Saori, Peng, Tzu-Yu, Shimoe, Saiji, Kaku, Masato, Nikawa, Hiroki, Kakimoto, Naoya, Murayama, Takeshi

    ISSN: 1991-7902, 2213-8862
    Published: Netherlands Elsevier B.V 01.10.2025
    Published in Journal of dental sciences (01.10.2025)
    “… Correct response rates were calculated overall, as well as by question type (text-only vs. visually-based) and subject …”
    Full text
    Journal Article
  5.

    MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI by Yue, Xiang, Ni, Yuansheng, Zheng, Tianyu, Zhang, Kai, Liu, Ruoqi, Zhang, Ge, Stevens, Samuel, Jiang, Dongfu, Ren, Weiming, Sun, Yuxuan, Wei, Cong, Yu, Botao, Yuan, Ruibin, Sun, Renliang, Yin, Ming, Zheng, Boyuan, Yang, Zhenzhu, Liu, Yibo, Huang, Wenhao, Sun, Huan, Su, Yu, Chen, Wenhu

    ISSN: 1063-6919
    Published: IEEE 16.06.2024
    “… We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes …”
    Full text
    Conference proceedings
  6.

    Towards Zero-Shot Differential Morphing Attack Detection with Multimodal Large Language Models by Shekhawat, Ria, Li, Hailin, Ramachandra, Raghavendra, Venkatesh, Sushma

    ISSN: 2770-8330
    Published: IEEE 26.05.2025
    “… Leveraging the power of multimodal large language models (LLMs) offers a promising approach to enhancing the accuracy and interpretability of morphing attack detection (MAD …”
    Full text
    Conference proceedings
  7.

    GSVA: Generalized Segmentation via Multimodal Large Language Models by Xia, Zhuofan, Han, Dongchen, Han, Yizeng, Pan, Xuran, Song, Shiji, Huang, Gao

    ISSN: 2331-8422
    Published: Ithaca Cornell University Library, arXiv.org 21.03.2024
    Published in arXiv.org (21.03.2024)
    “… Multimodal Large Language Models (MLLMs) have recently shown tremendous progress in these complicated vision-language tasks …”
    Full text
    Paper
  8.

    Kosmos-G: Generating Images in Context with Multimodal Large Language Models by Pan, Xichen, Li, Dong, Huang, Shaohan, Peng, Zhiliang, Chen, Wenhu, Wei, Furu

    ISSN: 2331-8422
    Published: Ithaca Cornell University Library, arXiv.org 15.03.2024
    Published in arXiv.org (15.03.2024)
    “… This paper presents Kosmos-G, a model that leverages the advanced multimodal perception capabilities of Multimodal Large Language Models (MLLMs …”
    Full text
    Paper
  9.

    Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations by Pal, Ankit, Sankarasubbu, Malaikannan

    ISSN: 2331-8422
    Published: Ithaca Cornell University Library, arXiv.org 10.02.2024
    Published in arXiv.org (10.02.2024)
    “… Large language models have the potential to be valuable in the healthcare industry, but it's crucial to verify their safety and effectiveness through rigorous …”
    Full text
    Paper
  10.

    An fMRI visual neural encoding method with multimodal large language model by Ma, Shuxiao, Wang, Linyuan, Hou, Libin, Hou, Senbao, Yan, Bin

    ISSN: 0950-7051
    Published: Elsevier B.V 27.09.2025
    Published in Knowledge-based systems (27.09.2025)
    “… In summary, our contributions are primarily threefold: To our knowledge, we establish the first multimodal framework combining MLLM with fMRI visual neural …”
    Full text
    Journal Article
  11.

    Coherent Interpretation of Entire Visual Field Test Reports Using a Multimodal Large Language Model (ChatGPT) by Tan, Jeremy C. K.

    ISSN: 2411-5150
    Published: Switzerland MDPI AG 11.04.2025
    Published in Vision (Basel) (11.04.2025)
    “… Single-page anonymised VF test reports from 60 eyes of 60 subjects were analysed by an LLM (ChatGPT 4o …”
    Full text
    Journal Article
  12.

    Art appreciation based on graph retrieval augmented generation and few-shot learning by LIU Tianyang, KOU Sijia, JIN Xu, WANG Wenjing, LU Xuesong

    ISSN: 2096-0271
    Published: China InfoCom Media Group 01.09.2025
    Published in 大数据 (01.09.2025)
    “… In this case, using multimodal large language models to tutor students in art appreciation has become a potential alternative …”
    Full text
    Journal Article
  13.

    Glaucoma Detection and Structured OCT Report Generation via a Fine-tuned Multimodal Large Language Model by Jalili, Jalil, Gavhane, Yashraj, Walker, Evan, Heinke, Anna, Bowd, Christopher, Belghith, Akram, Fazio, Massimo A, Girkin, Christopher A, De Moraes, C Gustavo, Liebmann, Jeffrey M, Baxter, Sally L, Weinreb, Robert N, Zangwill, Linda M, Christopher, Mark

    ISSN: 2331-8422
    Published: United States 01.10.2025
    Published in ArXiv.org (01.10.2025)
    “… To develop an explainable multimodal large language model (MM-LLM) that (1) screens optic nerve head (ONH …”
    Full text
    Journal Article
  14.

    Art appreciation based on graph retrieval augmented generation and few-shot learning by LIU Tianyang, KOU Sijia, JIN Xu, WANG Wenjing, LU Xuesong

    ISSN: 2096-0271
    Published: China InfoCom Media Group 01.01.2025
    Published in 大数据 (01.01.2025)
    “… In this case, using multimodal large language models to tutor students in art appreciation has become a potential alternative …”
    Full text
    Journal Article
  15.

    Visual Commonsense Causal Reasoning From a Still Image by Wu, Xiaojing, Guo, Rui, Li, Qin, Zhu, Ning

    ISSN: 2169-3536
    Published: Piscataway IEEE 2025
    Published in IEEE access (2025)
    “… Even from a still image, humans exhibit the ability to ratiocinate diverse visual cause-and-effect relationships of events preceding, succeeding, and extending …”
    Full text
    Journal Article
  16.

    Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis by Jalili, Jalil, Jiravarnsirikul, Anuwat, Bowd, Christopher, Chuter, Benton, Belghith, Akram, Goldbaum, Michael H., Baxter, Sally L., Weinreb, Robert N., Zangwill, Linda M., Christopher, Mark

    ISSN: 2666-9145
    Published: Netherlands Elsevier Inc 01.03.2025
    Published in Ophthalmology science (Online) (01.03.2025)
    “… Evaluation of multimodal large language models for reviewing fundus images in glaucoma. A total of 300 fundus images from 3 public datasets …”
    Full text
    Journal Article
  17.

    Deep Composer: Improving the String Quartet Music Generation Task by Galajda, Jacob Edward, Bach, Van Hoang Bao, Hua, Kien

    ISSN: 2770-4319
    Published: IEEE 06.08.2025
    “… Large Language Models (LLMs) have the capacity to create new pieces of music, however, generating high quality samples in specific genres and instrumentation …”
    Full text
    Conference proceedings
  18.

    Mv-Math: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts by Wang, Peijie, Li, Zhong-Zhi, Yin, Fei, Ran, Dekang, Liu, Cheng-Lin

    ISSN: 1063-6919
    Published: IEEE 10.06.2025
    “… Multimodal Large Language Models (MLLMs) have shown promising capabilities in mathematical reasoning within visual contexts across various datasets …”
    Full text
    Conference proceedings
  19.

    An autonomous AI agent for universal behavior analysis by Aljović, Almir, Lin, Zuwan, Wang, Wenbo, Zhang, Xinhe, Marin-Llobet, Arnau, Liang, Ningyue, Canales, Bradley, Lee, Jaeyong, Baek, Jongmin, Liu, Ren, Li, Catherine, Li, Na, Liu, Jia

    ISSN: 2692-8205
    Published: United States Cold Spring Harbor Laboratory 20.05.2025
    Published in bioRxiv (20.05.2025)
    “… Unlike conventional methods that require manual behavior annotation, video segmentation, task-specific model training, BehaveAgent leverages the reasoning capabilities of multimodal large language models (LLM …”
    Full text
    Journal Article / Paper
  20.

    Instruction-based Image Manipulation by Watching How Things Move by Cao, Mingdeng, Zhang, Xuaner, Zheng, Yinqiang, Xia, Zhihao

    ISSN: 1063-6919
    Published: IEEE 10.06.2025
    “… This paper introduces a novel dataset construction pipeline that samples pairs of frames from videos and uses multimodal large language models (MLLMs …”
    Full text
    Conference proceedings