Leveraging Large Language Models to Annotate Activities of Daily Living Captured with Egocentric Vision
Saved in:
| Published in: | IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (Online), pp. 182-186 |
|---|---|
| Main authors: | Shrestha, Sloke; Thomaz, Edison |
| Format: | Conference Paper |
| Language: | English |
| Published: | IEEE, 19.06.2024 |
| Subjects: | Activities of Daily Living; Annotations; Data Annotation; Human Activity Recognition; Large language models; Vision Language Models |
| ISSN: | 2832-2975 |
| Online access: | Full text |
| Abstract | Developing a system that automatically and passively recognizes activities of daily living (ADLs) would be transformative for numerous health applications. However, engineering approaches for building such a classifier today require large, richly annotated datasets representing ADLs in a generalizable way. In this work, we evaluated state-of-the-art large language models (LLMs) for fully automated and assisted manual annotation of first-person images with ADLs. We performed automatic evaluations on four vision-language pipelines (VLPs): concept detector, concept detector + GPT-3.5, BLIP2, and GPT-4. Three of them were tested on 31,849 first-person images, and one of them, GPT-4, was tested on 3,446 first-person images. Among the four VLPs, BLIP2 scored the highest cosine similarity, 0.86. Furthermore, we evaluated assisted manual annotation with 20 participants who annotated 100 ADL images with labels recommended by three different VLPs. We show that annotation with BLIP2 assistance has the highest pick rate, 0.698, and a subjective workload (NASA Task Load Index) score of 39.41 on a scale of 100. Despite limitations, our work demonstrates how large language models can be leveraged to optimize the difficult task of data annotation for building ADL classifiers. |
|---|---|
| Authors | Sloke Shrestha (sloke@utexas.edu) and Edison Thomaz (ethomaz@utexas.edu), University of Texas at Austin, Austin, TX, USA 78712 |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CHASE60773.2024.00031 |
| EISBN | 9798350345018 |
| EISSN | 2832-2975 |
| EndPage | 186 |
| ExternalDocumentID | 10614411 |
| Genre | orig-research |
| ISICitedReferencesCount | 0 |
| Language | English |
| PageCount | 5 |
| PublicationDate | 2024-June-19 |
| PublicationTitle | IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (Online) |
| PublicationTitleAbbrev | CHASE |
| PublicationYear | 2024 |
| Publisher | IEEE |
| StartPage | 182 |
| SubjectTerms | Activities of Daily Living; Annotations; Buildings; Data Annotation; Detectors; Human Activity Recognition; Large language models; Manuals; NASA; Pipelines; Vision Language Models |
| Title | Leveraging Large Language Models to Annotate Activities of Daily Living Captured with Egocentric Vision |
| URI | https://ieeexplore.ieee.org/document/10614411 |
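The abstract reports cosine similarity between VLP-generated labels and reference ADL annotations (BLIP2 scoring highest at 0.86). The record does not say which text embedding the authors used, so the sketch below is only a minimal illustration of the metric itself, using plain bag-of-words count vectors in place of the paper's (unspecified) representation; the example phrases are hypothetical, not from the dataset.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts over bag-of-words count vectors.

    Illustrative only: the paper's actual embedding is not given in this
    record, so simple word counts stand in for it here.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Dot product over the words the two vectors share.
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Hypothetical example: a generated label vs. a reference ADL annotation.
print(round(cosine_similarity("washing dishes in the kitchen",
                              "washing dishes at the kitchen sink"), 2))  # → 0.73
```

Identical texts score 1.0, texts with no shared words score 0.0, so a corpus-level mean like the paper's 0.86 indicates generated labels that overlap strongly with the references.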