Masked Autoencoder for Self-Supervised Pre-Training on Lidar Point Clouds

Saved in:
Detailed bibliography
Title: Masked Autoencoder for Self-Supervised Pre-Training on Lidar Point Clouds
Authors: Hess, Georg (1996); Jaxing, Johan; Svensson, Elias; Hagerman Olzon, David (1987); Petersson, Christoffer (1979); Svensson, Lennart (1976)
Source: IEEE Workshop on Applications of Computer Vision (WACV), Waikoloa, USA, Proceedings - 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2023, pp. 350-359
Subject terms: 3d object detection, Self-supervised, Object detection, Voxel-MAE, Deep learning, Masked autoencoding
Description: Masked autoencoding has become a successful pretraining paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they generally are cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code is available at https://github.com/georghess/voxel-mae.
File description: electronic
Access URL: https://research.chalmers.se/publication/536005
https://research.chalmers.se/publication/535108
https://openaccess.thecvf.com/content/WACV2023W/Pretrain/papers/Hess_Masked_Autoencoder_for_Self-Supervised_Pre-Training_on_Lidar_Point_Clouds_WACVW_2023_paper.pdf
Database: SwePub
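
As a rough illustration of the pre-training objective summarized in the description above, the following minimal sketch shows how masked voxel autoencoding with a joint reconstruction and empty/non-empty classification loss could look in PyTorch. The module names (VoxelMAESketch, voxel_encoder, recon_head, occ_head), masking ratio, and loss weighting are illustrative assumptions, not the authors' implementation; the actual code is in the linked repository.

# Minimal, illustrative Voxel-MAE-style pre-training step (PyTorch).
# All hyperparameters and module names below are assumptions for
# illustration; they do not reproduce the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VoxelMAESketch(nn.Module):
    def __init__(self, feat_dim=128, num_heads=4, points_per_voxel=8):
        super().__init__()
        # Embed the points inside each voxel into one token.
        self.voxel_encoder = nn.Linear(points_per_voxel * 3, feat_dim)
        # Transformer backbone whose weights are the pre-training target.
        layer = nn.TransformerEncoderLayer(feat_dim, num_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        # Learnable token that replaces masked voxels.
        self.mask_token = nn.Parameter(torch.zeros(feat_dim))
        # Heads: reconstruct the points of masked voxels, and predict occupancy.
        self.recon_head = nn.Linear(feat_dim, points_per_voxel * 3)
        self.occ_head = nn.Linear(feat_dim, 1)

    def forward(self, voxel_points, occupancy, mask_ratio=0.7):
        # voxel_points: (B, N, points_per_voxel * 3) points grouped per voxel
        # occupancy:    (B, N) with 1.0 for non-empty voxels, 0.0 for empty ones
        B, N, _ = voxel_points.shape
        mask = torch.rand(B, N, device=voxel_points.device) < mask_ratio  # True = masked

        tokens = self.voxel_encoder(voxel_points)
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        feats = self.backbone(tokens)

        recon = self.recon_head(feats)                # predicted point coordinates
        occ_logit = self.occ_head(feats).squeeze(-1)  # predicted occupancy logit

        # Reconstruction loss only on masked, non-empty voxels.
        recon_w = (mask & (occupancy > 0)).unsqueeze(-1).float()
        loss_recon = ((recon - voxel_points) ** 2 * recon_w).sum() / recon_w.sum().clamp(min=1)
        # Occupancy loss: distinguish empty from non-empty masked voxels.
        loss_occ = F.binary_cross_entropy_with_logits(occ_logit[mask], occupancy[mask])
        return loss_recon + loss_occ


# Toy usage with random data, standing in for voxelized lidar sweeps.
model = VoxelMAESketch()
pts = torch.randn(2, 64, 8 * 3)
occ = (torch.rand(2, 64) > 0.5).float()
loss = model(pts, occ)
loss.backward()

The sketch only mirrors the two training signals named in the abstract, reconstructing masked voxels and classifying voxel occupancy; the paper's actual backbone, voxel features, and reconstruction target differ.
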
FullText Text:
  Availability: 0
CustomLinks:
  – Url: https://research.chalmers.se/publication/536005#
    Name: EDS - SwePub (s4221598)
    Category: fullText
    Text: View record in SwePub
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Hess%20G
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
    Icon: https://imagesrvr.epnet.com/ls/20docs.gif
    MouseOverText: Find this article in Web of Science
Header DbId: edsswe
DbLabel: SwePub
An: edsswe.oai.research.chalmers.se.29af9b79.4aec.4bd5.bc15.c7e443653987
RelevancyScore: 956
AccessLevel: 6
PubType: Conference
PubTypeId: conference
PreciseRelevancyScore: 955.779541015625
Items:
– Name: Title
  Label: Title
  Group: Ti
  Data: Masked Autoencoder for Self-Supervised Pre-Training on Lidar Point Clouds
– Name: Author
  Label: Authors
  Group: Au
  Data: Hess, Georg (1996); Jaxing, Johan; Svensson, Elias; Hagerman Olzon, David (1987); Petersson, Christoffer (1979); Svensson, Lennart (1976)
– Name: TitleSource
  Label: Source
  Group: Src
  Data: IEEE Workshop on Applications of Computer Vision (WACV), Waikoloa, USA, Proceedings - 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2023, pp. 350-359
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: 3d object detection; Self-supervised; Object detection; Voxel-MAE; Deep learning; Masked autoencoding
– Name: Abstract
  Label: Description
  Group: Ab
  Data: Masked autoencoding has become a successful pretraining paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they generally are cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code is available at https://github.com/georghess/voxel-mae.
– Name: Format
  Label: File Description
  Group: SrcInfo
  Data: electronic
– Name: URL
  Label: Access URL
  Group: URL
  Data: https://research.chalmers.se/publication/536005
        https://research.chalmers.se/publication/535108
        https://openaccess.thecvf.com/content/WACV2023W/Pretrain/papers/Hess_Masked_Autoencoder_for_Self-Supervised_Pre-Training_on_Lidar_Point_Clouds_WACVW_2023_paper.pdf
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsswe&AN=edsswe.oai.research.chalmers.se.29af9b79.4aec.4bd5.bc15.c7e443653987
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1109/WACVW58289.2023.00039
    Languages:
      – Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 10
        StartPage: 350
    Subjects:
      – SubjectFull: 3d object detection
        Type: general
      – SubjectFull: Self-supervised
        Type: general
      – SubjectFull: Object detection
        Type: general
      – SubjectFull: Voxel-MAE
        Type: general
      – SubjectFull: Deep learning
        Type: general
      – SubjectFull: Masked autoencoding
        Type: general
    Titles:
      – TitleFull: Masked Autoencoder for Self-Supervised Pre-Training on Lidar Point Clouds
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Hess, Georg
      – PersonEntity:
          Name:
            NameFull: Jaxing, Johan
      – PersonEntity:
          Name:
            NameFull: Svensson, Elias
      – PersonEntity:
          Name:
            NameFull: Hagerman Olzon, David
      – PersonEntity:
          Name:
            NameFull: Petersson, Christoffer
      – PersonEntity:
          Name:
            NameFull: Svensson, Lennart
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 01
              Type: published
              Y: 2023
          Identifiers:
            – Type: issn-locals
              Value: SWEPUB_FREE
            – Type: issn-locals
              Value: CTH_SWEPUB
          Titles:
            – TitleFull: IEEE Workshop on Applications of Computer Vision (WACV), Waikoloa, USA Proceedings - 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2023
              Type: main
ResultId 1