HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos

Bibliographic details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 7061-7071
Main authors: Banerjee, Prithviraj, Shkodrani, Sindi, Moulon, Pierre, Hampali, Shreyas, Han, Shangchen, Zhang, Fan, Zhang, Linguang, Fountain, Jade, Miller, Edward, Basol, Selen, Newcombe, Richard, Wang, Robert, Engel, Jakob Julian, Hodan, Tomas
Format: Conference proceeding
Language: English
Published: IEEE, 10 June 2025
ISSN: 1063-6919
Online access: Full text
Abstract We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (3.7M+ images) of recordings that feature 19 subjects interacting with 33 diverse rigid objects. In addition to simple pickup, observe, and put-down actions, the subjects perform actions typical for a kitchen, office, and living room environment. The recordings include multiple synchronized data streams containing egocentric multi-view RGB/monochrome images, eye gaze signal, scene point clouds, and 3D poses of cameras, hands, and objects. The dataset is recorded with two headsets from Meta: Project Aria, which is a research prototype of AI glasses, and Quest 3, a virtual-reality headset that has shipped millions of units. Ground-truth poses were obtained by a motion-capture system using small optical markers attached to hands and objects. Hand annotations are provided in the UmeTrack and MANO formats, and objects are represented by 3D meshes with PBR materials obtained by an in-house scanner. In our experiments, we demonstrate the effectiveness of multi-view egocentric data for three popular tasks: 3D hand tracking, model-based 6DoF object pose estimation, and 3D lifting of unknown in-hand objects. The evaluated multi-view methods, whose benchmarking is uniquely enabled by HOT3D, significantly outperform their single-view counterparts.
Author_xml – sequence: 1
  givenname: Prithviraj
  surname: Banerjee
  fullname: Banerjee, Prithviraj
  organization: Meta Reality Labs
– sequence: 2
  givenname: Sindi
  surname: Shkodrani
  fullname: Shkodrani, Sindi
  organization: Meta Reality Labs
– sequence: 3
  givenname: Pierre
  surname: Moulon
  fullname: Moulon, Pierre
  organization: Meta Reality Labs
– sequence: 4
  givenname: Shreyas
  surname: Hampali
  fullname: Hampali, Shreyas
  organization: Meta Reality Labs
– sequence: 5
  givenname: Shangchen
  surname: Han
  fullname: Han, Shangchen
  organization: Meta Reality Labs
– sequence: 6
  givenname: Fan
  surname: Zhang
  fullname: Zhang, Fan
  organization: Meta Reality Labs
– sequence: 7
  givenname: Linguang
  surname: Zhang
  fullname: Zhang, Linguang
  organization: Meta Reality Labs
– sequence: 8
  givenname: Jade
  surname: Fountain
  fullname: Fountain, Jade
  organization: Meta Reality Labs
– sequence: 9
  givenname: Edward
  surname: Miller
  fullname: Miller, Edward
  organization: Meta Reality Labs
– sequence: 10
  givenname: Selen
  surname: Basol
  fullname: Basol, Selen
  organization: Meta Reality Labs
– sequence: 11
  givenname: Richard
  surname: Newcombe
  fullname: Newcombe, Richard
  organization: Meta Reality Labs
– sequence: 12
  givenname: Robert
  surname: Wang
  fullname: Wang, Robert
  organization: Meta Reality Labs
– sequence: 13
  givenname: Jakob Julian
  surname: Engel
  fullname: Engel, Jakob Julian
  organization: Meta Reality Labs
– sequence: 14
  givenname: Tomas
  surname: Hodan
  fullname: Hodan, Tomas
  organization: Meta Reality Labs
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR52734.2025.00662
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE
IEEE Proceedings Order Plans (POP) 1998-present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9798331543648
EISSN 1063-6919
EndPage 7071
ExternalDocumentID 11092663
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 11
PublicationCentury 2000
PublicationDate 2025-June-10
PublicationDateYYYYMMDD 2025-06-10
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-10
  day: 10
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 7061
SubjectTerms 3D lifting of in-hand objects
6DoF object pose
Aria
Artificial intelligence
augmented reality
Benchmark testing
contextual AI
dataset
hand tracking
hand-object interaction
Hands
Headphones
in-hand object segmentation
Object tracking
Pose estimation
smart glasses
Synchronization
Three-dimensional displays
Training
Videos
virtual reality
Title HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
URI https://ieeexplore.ieee.org/document/11092663