UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
DEtection TRansformer (DETR) for object detection reaches competitive performance compared with Faster R-CNN via a transformer encoder-decoder architecture. However, with transformers trained from scratch, DETR needs large-scale training data and an extremely long training schedule even on the COCO dataset. Inspired by the great success of pre-training transformers in natural language processing, we propose a novel pretext task named random query patch detection in Unsupervised Pre-training DETR (UP-DETR). Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder. The model is pre-trained to detect these query patches from the input image. During pre-training, we address two critical issues: multi-task learning and multi-query localization. (1) To trade off the classification and localization preferences in the pretext task, we find that freezing the CNN backbone is a prerequisite for successfully pre-training the transformers. (2) To perform multi-query localization, we develop UP-DETR with multi-query patch detection and an attention mask. Besides, UP-DETR also provides a unified perspective for fine-tuning object detection and one-shot detection tasks. In our experiments, UP-DETR significantly boosts the performance of DETR with faster convergence and higher average precision on object detection, one-shot detection and panoptic segmentation. Code and pre-trained models: https://github.com/dddzg/up-detr.
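The pretext task described in the abstract can be made concrete with a short sketch. The snippet below is a minimal, hypothetical illustration of how random query patches and their box targets might be generated from an unlabeled image; the function name `sample_query_patches` and the PyTorch-style interface are assumptions for illustration only, not the authors' implementation (see https://github.com/dddzg/up-detr for the official code).

```python
# Hypothetical sketch of the "random query patch detection" pretext task:
# crop random patches from an unlabeled image and use their locations as
# detection targets, so no human annotations are needed.
import torch


def sample_query_patches(image: torch.Tensor, num_queries: int = 10):
    """Randomly crop `num_queries` patches from `image` (C, H, W) and return
    the crops plus their normalized (cx, cy, w, h) boxes as targets."""
    _, H, W = image.shape
    patches, boxes = [], []
    for _ in range(num_queries):
        # Sample a crop size and location uniformly at random.
        h = torch.randint(H // 8, H // 2 + 1, (1,)).item()
        w = torch.randint(W // 8, W // 2 + 1, (1,)).item()
        y = torch.randint(0, H - h + 1, (1,)).item()
        x = torch.randint(0, W - w + 1, (1,)).item()
        patches.append(image[:, y:y + h, x:x + w])
        # The crop location itself is the localization target for the decoder.
        boxes.append(torch.tensor([(x + w / 2) / W, (y + h / 2) / H, w / W, h / H]))
    return patches, torch.stack(boxes)


if __name__ == "__main__":
    img = torch.rand(3, 480, 640)  # any unlabeled image
    patches, target_boxes = sample_query_patches(img)
    # During pre-training, each patch is embedded by the (frozen) CNN backbone,
    # added to a group of object queries fed to the DETR decoder, and the model
    # is trained to predict `target_boxes`, i.e. to re-detect its own crops.
    print(len(patches), target_boxes.shape)
```

This sketch only covers target generation; the abstract's other ingredients (freezing the CNN backbone and the attention mask that keeps different query patches from interfering with each other) belong to the pre-training loop and decoder, which are omitted here.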
| Published in: | arXiv.org |
|---|---|
| Main Authors: | Dai, Zhigang; Cai, Bolun; Lin, Yugeng; Chen, Junying |
| Format: | Paper |
| Language: | English |
| Published: | Ithaca: Cornell University Library, arXiv.org, 24.07.2023 |
| Subjects: | Coders; Encoders-Decoders; Learning; Natural language processing; Object recognition; Patches (structures); Queries; Training; Transformers |
| ISSN: | 2331-8422 |
| Online Access: | https://www.proquest.com/docview/2462295900 |
| Author | Dai, Zhigang; Cai, Bolun; Lin, Yugeng; Chen, Junying |
|---|---|
| ContentType | Paper |
| Copyright | 2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.48550/arxiv.2011.09094 |
| Discipline | Physics |
| EISSN | 2331-8422 |
| Genre | Working Paper/Pre-Print |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| OpenAccessLink | https://www.proquest.com/docview/2462295900 |
| PublicationDate | 2023-07-24 |
| PublicationPlace | Ithaca |
| PublicationTitle | arXiv.org |
| PublicationYear | 2023 |
| Publisher | Cornell University Library, arXiv.org |
| SecondaryResourceType | preprint |
| SubjectTerms | Coders; Encoders-Decoders; Learning; Natural language processing; Object recognition; Patches (structures); Queries; Training; Transformers |
| Title | UP-DETR: Unsupervised Pre-training for Object Detection with Transformers |
| URI | https://www.proquest.com/docview/2462295900 |