Leveraging Large Language Models to Annotate Activities of Daily Living Captured with Egocentric Vision


Detailed Description

Bibliographic Details
Published in: IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (Online), pp. 182 - 186
Main authors: Shrestha, Sloke; Thomaz, Edison
Format: Conference Proceeding
Language: English
Published: IEEE, 19 June 2024
ISSN: 2832-2975
Online access: Full text
Abstract Developing a system that automatically and passively recognizes activities of daily living (ADLs) would be transformative for numerous health applications. However, engineering approaches for building such a classifier today require large, richly annotated datasets that represent ADLs in a generalizable way. In this work, we evaluated state-of-the-art large language models (LLMs) for fully automated and assisted manual annotation of first-person images with ADLs. We performed automatic evaluations on four different vision language pipelines (VLPs): concept detector, concept detector + GPT-3.5, BLIP2, and GPT-4. Three of them were tested on 31,849 first-person images; the fourth, GPT-4, was tested on 3,446 first-person images. Among the four VLPs, BLIP2 scored the highest cosine similarity of 0.86. Furthermore, we evaluated assisted manual annotation with 20 participants who annotated 100 ADL images with recommended labels from three different VLPs. We show that annotation with BLIP2 assistance has the highest pick rate of 0.698 and a subjective workload (NASA Task Load Index) score of 39.41 on a scale of 100. Despite limitations, our work demonstrates how large language models can be leveraged to optimize the difficult task of data annotation for building ADL classifiers.
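The abstract reports cosine similarity between VLP-generated annotations and reference labels. The paper presumably compares learned sentence embeddings, but the metric itself can be sketched with a simple bag-of-words vectorization; the function name and example labels below are illustrative assumptions, not the authors' pipeline:

```python
from collections import Counter
from math import sqrt

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two label strings."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    # Dot product over the words the two labels share.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Comparing a model-suggested label against a hypothetical ground-truth annotation:
print(round(cosine_similarity("washing dishes at the kitchen sink",
                              "washing dishes in the kitchen"), 2))
```

In practice a score like the paper's 0.86 would come from embedding both strings with a sentence encoder, so semantically similar but differently worded labels still score highly; the bag-of-words version only credits exact word overlap.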
Authors
– Shrestha, Sloke (sloke@utexas.edu), University of Texas at Austin, Austin, TX, USA, 78712
– Thomaz, Edison (ethomaz@utexas.edu), University of Texas at Austin, Austin, TX, USA, 78712
CODEN IEEPAD
DOI 10.1109/CHASE60773.2024.00031
EISBN 9798350345018
EISSN 2832-2975
EndPage 186
ExternalDocumentID 10614411
Genre orig-research
PageCount 5
PublicationDate 2024-June-19
PublicationTitle IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (Online)
PublicationTitleAbbrev CHASE
PublicationYear 2024
Publisher IEEE
StartPage 182
SubjectTerms Activities of Daily Living
Annotations
Buildings
Data Annotation
Detectors
Human Activity Recognition
Large language models
Manuals
NASA
Pipelines
Vision Language Models
URI https://ieeexplore.ieee.org/document/10614411