On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study

Bibliographic Details
Published in: IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), pp. 172–176
Main Authors: Van, Minh-Hao; Verma, Prateek; Wu, Xintao
Format: Conference paper
Language: English
Published: IEEE, June 19, 2024
ISSN: 2832-2975
Description
Summary: Recently, large language models (LLMs) have taken the spotlight in natural language processing. Integrating LLMs with vision further enables users to explore emergent abilities on multimodal data. Visual language models (VLMs) such as LLaVA, Flamingo, and CLIP have demonstrated impressive performance on various vision-language tasks, suggesting enormous potential applications for large models in the biomedical imaging field. Yet there is little related work demonstrating the ability of large models to diagnose diseases. In this work, we study the zero-shot and few-shot robustness of VLMs on medical imaging analysis tasks. Our comprehensive experiments demonstrate the effectiveness of VLMs in analyzing biomedical images such as brain MRIs, microscopic images of blood cells, and chest X-rays. While VLMs cannot outperform classic vision models such as CNNs or ResNet, they can serve as chat assistants that provide a pre-diagnosis before decisions are made, without any retraining or fine-tuning.
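To make the zero-shot setting concrete, the sketch below shows how a CLIP-style VLM can score a medical image against candidate text labels without any retraining or fine-tuning, using the public Hugging Face transformers API. The checkpoint, file name, and label prompts are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of zero-shot image classification with CLIP,
# assuming a public checkpoint and hypothetical label prompts.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical chest X-ray image and candidate label prompts.
image = Image.open("chest_xray.png").convert("RGB")
prompts = [
    "a chest X-ray of healthy lungs",
    "a chest X-ray showing pneumonia",
]

# Encode image and text together; CLIP scores their similarity.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores;
# softmax converts them into probabilities over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=-1)
for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{p:.3f}  {prompt}")
```

Because the labels are supplied as free-form text at inference time, swapping in a different diagnostic task only requires changing the prompt list, which is what makes the zero-shot setup attractive as a pre-diagnosis assistant.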
DOI: 10.1109/CHASE60773.2024.00029