Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective

Bibliographic Details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 17416-17426
Main Authors: Zhao, Jinjing, Wei, Fangyun, Xu, Chang
Format: Conference Proceeding
Language: English
Published: IEEE 16.06.2024
ISSN: 1063-6919
Description
Summary: With the transformative impact of the Transformer, DETR pioneered the application of the encoder-decoder architecture to object detection. A collection of follow-up research, e.g., Deformable DETR, aims to enhance DETR while adhering to the encoder-decoder design. In this work, we revisit the DETR series through the lens of Faster R-CNN. We find that DETR resonates with the underlying principles of Faster R-CNN's RPN-refiner design but benefits from end-to-end detection owing to the incorporation of Hungarian matching. We systematically adapt Faster R-CNN towards Deformable DETR by integrating or repurposing each component of Deformable DETR, and note that Deformable DETR's improved performance over Faster R-CNN is attributed to the adoption of advanced modules such as a superior proposal refiner (e.g., deformable attention rather than RoI Align). When viewing DETR through the RPN-refiner paradigm, we delve into various proposal refinement techniques such as deformable attention, cross attention, and dynamic convolution. These proposal refiners cooperate well with each other; thus, we synergistically combine them to establish a Hybrid Proposal Refiner (HPR). Our HPR is versatile and can be incorporated into various DETR detectors. For instance, by integrating HPR into a strong DETR detector, we achieve an AP of 54.9 on the COCO benchmark, utilizing a ResNet-50 backbone and a 36-epoch training schedule. Code and models are available at https://github.com/ZhaoJingjing713IHPR.
DOI: 10.1109/CVPR52733.2024.01649