On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study


Bibliographic details
Published in: IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (Online), pp. 172-176
Main authors: Van, Minh-Hao; Verma, Prateek; Wu, Xintao
Format: Conference paper
Language: English
Published: IEEE, 19.06.2024
ISSN: 2832-2975
Online access: Get full text
Abstract Recently, large language models (LLMs) have taken the spotlight in natural language processing. Integrating LLMs with vision further enables users to explore emergent abilities with multimodal data. Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks. Consequently, large models have numerous potential applications in the biomedical imaging field. However, little related work examines the ability of large models to diagnose diseases. In this work, we study the zero-shot and few-shot robustness of VLMs on medical imaging analysis tasks. Our comprehensive experiments demonstrate the effectiveness of VLMs in analyzing biomedical images such as brain MRIs, microscopic images of blood cells, and chest X-rays. Although VLMs cannot outperform classic vision models such as CNNs or ResNet, they can serve as chat assistants that provide a pre-diagnosis before decisions are made, without the need for retraining or fine-tuning stages.
Author Van, Minh-Hao
Verma, Prateek
Wu, Xintao
Author_xml – sequence: 1
  givenname: Minh-Hao
  surname: Van
  fullname: Van, Minh-Hao
  email: haovan@uark.edu
  organization: University of Arkansas,Fayetteville,AR,USA
– sequence: 2
  givenname: Prateek
  surname: Verma
  fullname: Verma, Prateek
  email: prateek@uark.edu
  organization: University of Arkansas,Fayetteville,AR,USA
– sequence: 3
  givenname: Xintao
  surname: Wu
  fullname: Wu, Xintao
  email: xintaowu@uark.edu
  organization: University of Arkansas,Fayetteville,AR,USA
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CHASE60773.2024.00029
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350345018
EISSN 2832-2975
EndPage 176
ExternalDocumentID 10614428
Genre orig-research
GrantInformation_xml – fundername: National Science Foundation
  grantid: 1946391
  funderid: 10.13039/100000001
– fundername: University of Arkansas
  funderid: 10.13039/100007756
GroupedDBID 6IE
6IF
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
OCL
RIE
RIL
ID FETCH-LOGICAL-a215t-cb7e8821e03cbe5cfecedb0cba72948c4899c81d120e8224e5d889750d4170733
IEDL.DBID RIE
ISICitedReferencesCount 19
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001294471900019&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:33:58 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a215t-cb7e8821e03cbe5cfecedb0cba72948c4899c81d120e8224e5d889750d4170733
PageCount 5
ParticipantIDs ieee_primary_10614428
PublicationCentury 2000
PublicationDate 2024-June-19
PublicationDateYYYYMMDD 2024-06-19
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-19
  day: 19
PublicationDecade 2020
PublicationTitle IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (Online)
PublicationTitleAbbrev CHASE
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib057255994
ssj0003204066
Score 2.089067
SourceID ieee
SourceType Publisher
StartPage 172
SubjectTerms Analytical models
Biological system modeling
Brain modeling
Magnetic resonance imaging
medical imaging analysis
Microscopy
Robustness
visual language model
Visualization
zero-shot learning
Title On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study
URI https://ieeexplore.ieee.org/document/10614428
WOSCitedRecordID wos001294471900019&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE