BoxeR: Box-Attention for 2D and 3D Transformers

Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 4763-4772
Main authors: Nguyen, Duy-Kien, Ju, Jihong, Booij, Olaf, Oswald, Martin R., Snoek, Cees G. M.
Format: Conference paper
Language: English
Published: IEEE, 01.06.2022
ISSN: 1063-6919
Abstract In this paper, we propose a simple attention mechanism that we call Box-Attention. It enables spatial interaction between grid features sampled from boxes of interest, and improves the learning capability of transformers for several vision tasks. Specifically, we present BoxeR, short for Box Transformer, which attends to a set of boxes by predicting their transformation from a reference window on an input feature map. BoxeR computes attention weights on these boxes by considering their grid structure. Notably, BoxeR-2D naturally reasons about box information within its attention module, making it suitable for end-to-end instance detection and segmentation tasks. By learning invariance to rotation in the box-attention module, BoxeR-3D is capable of generating discriminative information from a bird's-eye-view plane for end-to-end 3D object detection. Our experiments demonstrate that the proposed BoxeR-2D achieves state-of-the-art results on COCO detection and instance segmentation. Moreover, BoxeR-3D improves over the end-to-end 3D object detection baseline and already obtains compelling performance for the vehicle category of Waymo Open, without any class-specific optimization. Code is available at https://github.com/kienduynguyen/BoxeR.
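The box-attention described in the abstract can be pictured concretely: each query predicts a shift and rescaling of its reference window, a small grid of features is bilinearly sampled from the resulting box, and attention weights over the grid locations are predicted directly from the query. The PyTorch module below is a minimal, single-head sketch of this sample-then-attend pattern, not the authors' implementation; the class name, the grid_size parameter, and the box parameterization (cx, cy, w, h in normalized coordinates) are illustrative assumptions, and the released code at the repository above is the authoritative version.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoxAttentionSketch(nn.Module):
    """Single-head sketch of box-attention: a query transforms its reference
    window into a box, samples a grid of features from that box, and attends
    over the grid locations."""

    def __init__(self, dim, grid_size=2):
        super().__init__()
        self.grid_size = grid_size
        self.to_box = nn.Linear(dim, 4)                       # predicts (dx, dy, dw, dh)
        self.to_attn = nn.Linear(dim, grid_size * grid_size)  # one logit per grid location
        self.proj = nn.Linear(dim, dim)

    def forward(self, query, feature_map, ref_boxes):
        # query:       (B, Q, C)    query vectors
        # feature_map: (B, C, H, W) input feature map
        # ref_boxes:   (B, Q, 4)    reference windows (cx, cy, w, h) in [0, 1]
        B, Q, C = query.shape
        g = self.grid_size

        # Transform the reference window: shift the center, rescale the size.
        dxdy, dwdh = self.to_box(query).chunk(2, dim=-1)
        center = ref_boxes[..., :2] + dxdy * ref_boxes[..., 2:]
        size = ref_boxes[..., 2:] * dwdh.exp()

        # Build a g x g sampling grid inside each predicted box.
        lin = torch.linspace(-0.5, 0.5, g, device=query.device)
        gy, gx = torch.meshgrid(lin, lin, indexing="ij")
        offsets = torch.stack([gx, gy], dim=-1).view(1, 1, g * g, 2)
        points = center.unsqueeze(2) + offsets * size.unsqueeze(2)  # (B, Q, g*g, 2) in [0, 1]
        grid = points * 2.0 - 1.0                                   # grid_sample expects [-1, 1]

        # Bilinearly sample grid features: (B, C, Q, g*g) -> (B, Q, g*g, C).
        sampled = F.grid_sample(feature_map, grid, align_corners=False)
        sampled = sampled.permute(0, 2, 3, 1)

        # Attention weights over the g*g grid locations come from the query alone.
        attn = self.to_attn(query).softmax(dim=-1)                  # (B, Q, g*g)
        out = (attn.unsqueeze(-1) * sampled).sum(dim=2)             # (B, Q, C)
        return self.proj(out)
```

In the full model this kind of module sits inside the transformer encoder and decoder; BoxeR-2D exploits the box information within the attention module for end-to-end detection and segmentation, while BoxeR-3D additionally learns invariance to rotation in the bird's-eye-view plane. Neither refinement is modeled by the sketch above.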
Author Nguyen, Duy-Kien
Snoek, Cees G. M.
Oswald, Martin R.
Ju, Jihong
Booij, Olaf
Author_xml – sequence: 1
  givenname: Duy-Kien
  surname: Nguyen
  fullname: Nguyen, Duy-Kien
  email: d.k.nguyen@uva.nl
  organization: Atlas Lab-University of Amsterdam
– sequence: 2
  givenname: Jihong
  surname: Ju
  fullname: Ju, Jihong
  email: jihong.ju@tomtom.com
  organization: TomTom
– sequence: 3
  givenname: Olaf
  surname: Booij
  fullname: Booij, Olaf
  email: olaf.booij@tomtom.com
  organization: TomTom
– sequence: 4
  givenname: Martin R.
  surname: Oswald
  fullname: Oswald, Martin R.
  email: m.r.oswald@uva.nl
  organization: Atlas Lab-University of Amsterdam
– sequence: 5
  givenname: Cees G. M.
  surname: Snoek
  fullname: Snoek, Cees G. M.
  email: cgmsnoek@uva.nl
  organization: Atlas Lab-University of Amsterdam
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR52688.2022.00473
Discipline Applied Sciences
EISBN 1665469463
9781665469463
EISSN 1063-6919
EndPage 4772
ExternalDocumentID 9880132
Genre orig-research
GrantInformation_xml – fundername: Netherlands Ministry of Economic Affairs
  funderid: 10.13039/501100003195
Language English
PageCount 10
PublicationDate 2022-June
PublicationDateYYYYMMDD 2022-06-01
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2022
Publisher IEEE
StartPage 4763
SubjectTerms Codes
Computer vision
Deep learning architectures and techniques
Object detection
Pattern recognition
Recognition: detection, categorization, retrieval
Segmentation, grouping and shape analysis
Task analysis
Three-dimensional displays
Transformers
Title BoxeR: Box-Attention for 2D and 3D Transformers
URI https://ieeexplore.ieee.org/document/9880132