Multidirection and Multiscale Pyramid in Transformer for Video-Based Pedestrian Retrieval
| Published in: | IEEE transactions on industrial informatics Vol. 18; no. 12; pp. 8776 - 8785 |
|---|---|
| Main Authors: | Zang, Xianghao; Li, Ge; Gao, Wei |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway; IEEE; 01.12.2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 1551-3203, 1941-0050 |
| Abstract | In video surveillance, pedestrian retrieval (also called person reidentification) is a critical task. This task aims to retrieve the pedestrian of interest from nonoverlapping cameras. Recently, transformer-based models have achieved significant progress for this task. However, these models still suffer from ignoring fine-grained, part-informed information. This article proposes a multidirection and multiscale Pyramid in Transformer (PiT) to solve this problem. In transformer-based architecture, each pedestrian image is split into many patches. Then, these patches are fed to transformer layers to obtain the feature representation of this image. To explore the fine-grained information, this article proposes to apply vertical division and horizontal division on these patches to generate different-direction human parts. These parts provide more fine-grained information. To fuse multiscale feature representation, this article presents a pyramid structure containing global-level information and many pieces of local-level information from different scales. The feature pyramids of all the pedestrian images from the same video are fused to form the final multidirection and multiscale feature representation. Experimental results on two challenging video-based benchmarks, MARS and iLIDS-VID, show the proposed PiT achieves state-of-the-art performance. Extensive ablation studies demonstrate the superiority of the proposed pyramid structure. Data is available on-line at https://git.openi.org.cn/zangxh/PiT.git . |
|---|---|
| Author | Li, Ge; Gao, Wei; Zang, Xianghao |
| Author_xml | – sequence: 1 givenname: Xianghao orcidid: 0000-0001-8421-7167 surname: Zang fullname: Zang, Xianghao email: zangxh@pku.edu.cn organization: School of Electronic and Computer Engineering, Peking University, Shenzhen, China – sequence: 2 givenname: Ge orcidid: 0000-0003-0140-0949 surname: Li fullname: Li, Ge email: geli@ece.pku.edu.cn organization: School of Electronic and Computer Engineering, Peking University, Shenzhen, China – sequence: 3 givenname: Wei orcidid: 0000-0001-7429-5495 surname: Gao fullname: Gao, Wei email: gaowei262@pku.edu.cn organization: School of Electronic and Computer Engineering, Peking University, Shenzhen, China |
| CODEN | ITIICH |
| CitedBy_id | crossref_primary_10_1007_s10851_023_01166_7 crossref_primary_10_3390_app122312503 crossref_primary_10_1007_s40436_025_00569_6 crossref_primary_10_1007_s40747_024_01474_4 crossref_primary_10_1007_s11042_023_15116_3 crossref_primary_10_1109_TII_2023_3266372 crossref_primary_10_3390_s24237536 crossref_primary_10_1007_s00034_024_02808_w crossref_primary_10_1016_j_aei_2023_102238 crossref_primary_10_1109_TII_2024_3359432 crossref_primary_10_3390_s23062938 crossref_primary_10_3390_a17080352 crossref_primary_10_1109_TII_2023_3298473 crossref_primary_10_1109_TII_2023_3240733 crossref_primary_10_1109_TCSVT_2024_3362369 crossref_primary_10_1007_s11263_025_02350_5 crossref_primary_10_1016_j_eswa_2025_128123 crossref_primary_10_1109_TII_2024_3367043 crossref_primary_10_1007_s00521_025_11218_1 crossref_primary_10_1109_TITS_2024_3351841 crossref_primary_10_31857_S0005231023050057 crossref_primary_10_1016_j_patcog_2025_111813 crossref_primary_10_1016_j_cviu_2024_104030 crossref_primary_10_1109_TMM_2023_3276167 crossref_primary_10_1109_TII_2024_3453919 crossref_primary_10_1016_j_imavis_2024_105400 crossref_primary_10_1109_TCSVT_2023_3340428 crossref_primary_10_1134_S0005117923050041 crossref_primary_10_1186_s13634_024_01139_x crossref_primary_10_1016_j_autcon_2024_105726 crossref_primary_10_1016_j_neucom_2024_128479 crossref_primary_10_1049_ipr2_12913 crossref_primary_10_1109_TII_2023_3348838 crossref_primary_10_1016_j_knosys_2025_113461 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/TII.2022.3151766 |
| DatabaseName | IEEE Xplore (IEEE); IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef; Technology Research Database; Computer and Information Systems Abstracts – Academic; Electronics & Communications Abstracts; ProQuest Computer Science Collection; Computer and Information Systems Abstracts; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 1941-0050 |
| EndPage | 8785 |
| ExternalDocumentID | 10_1109_TII_2022_3151766 9714137 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Key R&D Program of China grantid: 2020AAA0103501 |
| GroupedDBID | 0R~ 29I 4.4 5GY 5VS 6IK 97E AAJGR AARMG AASAJ AAWTH ABAZT ABQJQ ABVLG ACGFS ACIWK AENEX AETIX AGQYO AGSQL AHBIQ AKJIK AKQYR ALMA_UNASSIGNED_HOLDINGS ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 EBS EJD HZ~ IFIPE IPLJI JAVBF LAI M43 O9- OCL P2P RIA RIE RNS AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| ID | FETCH-LOGICAL-c357t-9c985ed79c7d5a3f8095e2543aee3b55db7c0e99d5b64a80c338c4842aa58b513 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 43 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000862429800042&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1551-3203 |
| IngestDate | Mon Jun 30 10:07:08 EDT 2025; Tue Nov 18 22:35:39 EST 2025; Sat Nov 29 04:17:01 EST 2025; Wed Aug 27 02:14:19 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Issue | 12 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c357t-9c985ed79c7d5a3f8095e2543aee3b55db7c0e99d5b64a80c338c4842aa58b513 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0001-8421-7167 0000-0001-7429-5495 0000-0003-0140-0949 |
| PQID | 2719555928 |
| PQPubID | 85507 |
| PageCount | 10 |
| ParticipantIDs | crossref_citationtrail_10_1109_TII_2022_3151766 crossref_primary_10_1109_TII_2022_3151766 proquest_journals_2719555928 ieee_primary_9714137 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-12-01 |
| PublicationDateYYYYMMDD | 2022-12-01 |
| PublicationDate_xml | – month: 12 year: 2022 text: 2022-12-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | Piscataway |
| PublicationPlace_xml | – name: Piscataway |
| PublicationTitle | IEEE transactions on industrial informatics |
| PublicationTitleAbbrev | TII |
| PublicationYear | 2022 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0037039 |
| Score | 2.571372 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database; Enrichment Source; Index Database; Publisher |
| StartPage | 8776 |
| SubjectTerms | Ablation; Cameras; Convolution; Feature extraction; Kernel; Multidirection and multiscale pyramid; Natural language processing; Patches (structures); Pyramids; Representations; Retrieval; Task analysis; Transformers; video-based pedestrian retrieval; vision transformer |
| Title | Multidirection and Multiscale Pyramid in Transformer for Video-Based Pedestrian Retrieval |
| URI | https://ieeexplore.ieee.org/document/9714137 https://www.proquest.com/docview/2719555928 |
| Volume | 18 |
| WOSCitedRecordID | wos000862429800042&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 1941-0050 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0037039 issn: 1551-3203 databaseCode: RIE dateStart: 20050101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |
| linkProvider | IEEE |
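
The abstract describes the PiT pipeline only at a high level (patch tokens, vertical and horizontal division into parts, a global-plus-local pyramid over several scales, and fusion over all frames of a video). The following is a minimal NumPy sketch of one plausible reading of that description, not the authors' implementation, which the abstract points to at https://git.openi.org.cn/zangxh/PiT.git. Everything below is an assumption: the transformer's output patch tokens for each frame are taken as an H × W grid of C-dimensional vectors; every part is average-pooled; the global feature is the mean of all patch tokens (standing in for a class token); "horizontal division" is read as cutting the grid into horizontal stripes along its height and "vertical division" as cutting it into vertical stripes along its width; the part scales (2, 4) are illustrative; and per-frame pyramids are fused by temporal averaging. The function names `directional_parts`, `frame_pyramid`, and `video_representation` are likewise hypothetical.

```python
import numpy as np


def directional_parts(tokens: np.ndarray, num_parts: int, direction: str) -> np.ndarray:
    """Split an (H, W, C) grid of patch tokens into `num_parts` stripes and
    average-pool each stripe into a single C-dimensional vector.

    Assumed convention (hypothetical):
      "horizontal" -> horizontal stripes, cut along the height axis (rows)
      "vertical"   -> vertical stripes, cut along the width axis (columns)
    """
    if direction not in ("horizontal", "vertical"):
        raise ValueError(f"unknown direction: {direction!r}")
    axis = 0 if direction == "horizontal" else 1
    # array_split also handles grids whose size is not divisible by num_parts.
    stripes = np.array_split(tokens, num_parts, axis=axis)
    return np.stack([s.mean(axis=(0, 1)) for s in stripes])  # (num_parts, C)


def frame_pyramid(tokens: np.ndarray, part_scales=(2, 4)) -> np.ndarray:
    """Stack one global vector with horizontal and vertical parts at each scale."""
    # Global level: mean of all patch tokens (a stand-in for a class token).
    levels = [tokens.mean(axis=(0, 1))[None, :]]
    for k in part_scales:
        levels.append(directional_parts(tokens, k, "horizontal"))
        levels.append(directional_parts(tokens, k, "vertical"))
    return np.concatenate(levels, axis=0)  # (1 + 2 * sum(part_scales), C)


def video_representation(frames: np.ndarray, part_scales=(2, 4)) -> np.ndarray:
    """Fuse per-frame pyramids of a (T, H, W, C) clip by temporal averaging
    (an assumed fusion rule) and flatten them into one descriptor."""
    pyramids = np.stack([frame_pyramid(f, part_scales) for f in frames])  # (T, P, C)
    return pyramids.mean(axis=0).reshape(-1)  # (P * C,)


# Usage with random stand-in tokens: 8 frames, a 16 x 8 patch grid, C = 32.
rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 8, 32))
feature = video_representation(clip)
print(feature.shape)  # (416,)
```

With part scales (2, 4), each frame yields 1 + 2 × (2 + 4) = 13 vectors, so the fused descriptor has 13 × 32 = 416 entries. Keeping the pyramid levels as separate rows until the final flatten preserves global and part-level evidence side by side; the paper's actual pooling, scales, and fusion rule may differ.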