On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study
Recently, large language models (LLMs) have taken the spotlight in natural language processing. Further, integrating LLMs with vision enables the users to explore emergent abilities with multimodal data. Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive pe...
| Published in: | IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (Online), pp. 172-176 |
|---|---|
| Main authors: | Van, Minh-Hao; Verma, Prateek; Wu, Xintao |
| Format: | Conference paper |
| Jazyk: | English |
| Publication details: | IEEE, 19.06.2024 |
| Subject: | |
| ISSN: | 2832-2975 |
| Online access: | Get full text |
| Abstract | Recently, large language models (LLMs) have taken the spotlight in natural language processing. Further, integrating LLMs with vision enables the users to explore emergent abilities with multimodal data. Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks. Consequently, there are enormous applications of large models that could be potentially used in the biomedical imaging field. Along that direction, there is a lack of related work to show the ability of large models to diagnose the diseases. In this work, we study the zero-shot and few-shot robustness of VLMs on the medical imaging analysis tasks. Our comprehensive experiments demonstrate the effectiveness of VLMs in analyzing biomedical images such as brain MRIs, microscopic images of blood cells, and chest X-rays. While VLMs cannot outperform classic vision models like CNN or ResNet, it is worth noting that VLMs can serve as chat assistants to provide pre-diagnosis before making decisions without the need for retraining or finetuning stages. |
|---|---|
| Author | Van, Minh-Hao; Verma, Prateek; Wu, Xintao |
| Author_xml | 1. Van, Minh-Hao (haovan@uark.edu), University of Arkansas, Fayetteville, AR, USA; 2. Verma, Prateek (prateek@uark.edu), University of Arkansas, Fayetteville, AR, USA; 3. Wu, Xintao (xintaowu@uark.edu), University of Arkansas, Fayetteville, AR, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/CHASE60773.2024.00029 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798350345018 |
| EISSN | 2832-2975 |
| EndPage | 176 |
| ExternalDocumentID | 10614428 |
| Genre | orig-research |
| GrantInformation_xml | National Science Foundation (grant ID: 1946391; funder ID: 10.13039/100000001); University of Arkansas (funder ID: 10.13039/100007756) |
| GroupedDBID | 6IE 6IF 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK OCL RIE RIL |
| ID | FETCH-LOGICAL-a215t-cb7e8821e03cbe5cfecedb0cba72948c4899c81d120e8224e5d889750d4170733 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 19 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001294471900019&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:33:58 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a215t-cb7e8821e03cbe5cfecedb0cba72948c4899c81d120e8224e5d889750d4170733 |
| PageCount | 5 |
| ParticipantIDs | ieee_primary_10614428 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-June-19 |
| PublicationDateYYYYMMDD | 2024-06-19 |
| PublicationDate_xml | – month: 06 year: 2024 text: 2024-June-19 day: 19 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (Online) |
| PublicationTitleAbbrev | CHASE |
| PublicationYear | 2024 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib057255994 ssj0003204066 |
| Score | 2.089067 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 172 |
| SubjectTerms | Analytical models; Biological system modeling; Brain modeling; Magnetic resonance imaging; medical imaging analysis; Microscopy; Robustness; visual language model; Visualization; zero-shot learning |
| Title | On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study |
| URI | https://ieeexplore.ieee.org/document/10614428 |
| WOSCitedRecordID | wos001294471900019&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |