DeHi: A Decoupled Hierarchical Architecture for Unaligned Ground-to-Aerial Geo-Localization
| Published in: | IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, No. 3, pp. 1927-1940 |
|---|---|
| Main authors: | Wang, Teng; Li, Jiawen; Sun, Changyin |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 March 2024 |
| ISSN: | 1051-8215, 1558-2205 |
| Abstract | Ground-to-aerial (G2A) geo-localization remains extremely challenging due to the drastic appearance and geometry differences between ground and aerial views, especially when their relative orientation is unknown. In this paper, we focus on the challenging problem of unaligned G2A geo-localization, where the query ground-level image is not perfectly orientation-aligned with the reference aerial imagery. We cast this problem as a metric embedding task and propose a decoupled hierarchical (DeHi) architecture to progressively learn meaningful multi-grained features. Specifically, DeHi first leverages a CNN to extract high-level semantic features and then introduces a novel orthogonally factorized transformer model, consisting of part-level and global transformer encoders, to learn part-level and global feature descriptors sequentially. To enhance representational power, cross-level connections enrich the part-level and global descriptors with CNN features, and the pooled part-level descriptor is combined with the global descriptor to construct the final query representation. Furthermore, this decoupled hierarchical architecture allows multi-level deep supervision to be incorporated: we introduce two part-level losses and one cross-level loss to complement the widely used global retrieval loss. Extensive experiments on standard benchmark datasets show significant improvements in recall over the previous state of the art. Notably, under random orientation misalignment, DeHi improves top-1 recall from 78.59% to 82.38% (+3.79 percentage points) on CVUSA and from 72.91% to 77.94% (+5.03 percentage points) on CVACT. In addition, DeHi maintains competitive inference efficiency with fewer parameters than existing transformer-based methods. |
|---|---|
| Author | Sun, Changyin; Li, Jiawen; Wang, Teng |
| Author_xml | 1. Teng Wang (ORCID 0000-0002-1802-0435; wangteng@seu.edu.cn), School of Automation, Southeast University, Nanjing, China; 2. Jiawen Li (ORCID 0000-0002-5917-385X; lijiawen@seu.edu.cn), School of Automation, Southeast University, Nanjing, China; 3. Changyin Sun (ORCID 0000-0001-9269-334X; cysun@seu.edu.cn), School of Automation, Southeast University, Nanjing, China |
| CODEN | ITCTEM |
| CitedBy_id | crossref_primary_10_1109_TCSVT_2025_3533574 crossref_primary_10_1109_TGRS_2024_3517654 crossref_primary_10_1109_TCSVT_2024_3382717 crossref_primary_10_1109_TIV_2024_3411098 crossref_primary_10_1109_TGRS_2025_3588220 |
| Cites_doi | 10.1109/TCSVT.2021.3127149 10.1109/ICCV.2017.374 10.1109/ICCV.2019.00056 10.1109/TCSVT.2020.2988034 10.1109/TPAMI.2022.3181116 10.1109/ICCV48922.2021.00986 10.1609/aaai.v34i07.6875 10.1109/ICCV48922.2021.00061 10.1109/ICCV.2017.74 10.1007/978-3-319-46448-0_30 10.1109/WACV51458.2022.00051 10.1007/978-3-030-58595-2_16 10.1109/CVPR46437.2021.00504 10.1007/978-3-030-58565-5_43 10.1007/s00521-022-06990-3 10.1109/TCSVT.2021.3061265 10.1109/TPAMI.2018.2846566 10.1109/CVPR.2016.90 10.1109/ICCV.2019.00848 10.1631/FITEE.2100463 10.1109/ICCV.2015.451 10.1007/s11263-015-0816-y 10.1109/ICCV48922.2021.01156 10.1109/TPAMI.2021.3081597 10.1109/TGRS.2021.3121337 10.1109/TPAMI.2021.3127346 10.1109/CVPR52688.2022.00123 10.1109/CVPR.2017.440 10.1109/ICCV48922.2021.00009 10.1109/CVPR.2018.00758 10.1007/978-3-319-46466-4_15 10.1109/CVPRW.2015.7301385 10.1109/CVPR.2019.00577 10.1109/CVPR.2015.7299135 10.1109/CVPR.2017.106 10.1109/CVPR42600.2020.00412 10.1007/978-3-030-58452-8_27 10.1109/WACV56688.2023.00019 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
| DOI | 10.1109/TCSVT.2023.3293514 |
| DatabaseName | IEEE Xplore (IEEE); IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
| Discipline | Engineering |
| EISSN | 1558-2205 |
| EndPage | 1940 |
| ExternalDocumentID | 10_1109_TCSVT_2023_3293514 10177194 |
| Genre | orig-research |
| GrantInformation_xml | National Natural Science Foundation of China (grant IDs 62273093 and 61921004; funder ID 10.13039/501100001809); ZhiShan Scholar Program of Southeast University (funder ID 10.13039/501100008081) |
| ISICitedReferencesCount | 7 |
| ISSN | 1051-8215 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0002-5917-385X 0000-0002-1802-0435 0000-0001-9269-334X |
| PageCount | 14 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-03-01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationTitle | IEEE Transactions on Circuits and Systems for Video Technology |
| PublicationTitleAbbrev | TCSVT |
| PublicationYear | 2024 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 1927 |
| SubjectTerms | Computational modeling; Convolutional neural networks; Datasets; decoupled hierarchical architecture; factorized transformer model; Feature extraction; Hierarchical systems; Localization; Location awareness; multi-level deep supervision; Orientation; Recall; Representations; Transformers; Unaligned cross-view geo-localization |
| Title | DeHi: A Decoupled Hierarchical Architecture for Unaligned Ground-to-Aerial Geo-Localization |
| URI | https://ieeexplore.ieee.org/document/10177194 https://www.proquest.com/docview/2938023170 |
| Volume | 34 |
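
The abstract frames unaligned G2A geo-localization as a metric embedding task scored by top-1 recall under random orientation misalignment. The Python sketch below illustrates that evaluation setting on toy vectors; it is not the authors' implementation, and every detail in it is an assumption. The helper names (`l2_normalize`, `combine_descriptors`, `random_heading_shift`, `recall_at_k`) are hypothetical. Fusing the pooled part-level descriptor with the global descriptor by concatenation is only one possible reading of "combined". Modelling an unknown heading as a circular shift of 360-degree panorama columns is an assumed convention that this record does not state. Pairing query i with reference i as the ground-truth match is also assumed.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a descriptor to unit length so that dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def combine_descriptors(part_descriptors: Sequence[Sequence[float]],
                        global_descriptor: Sequence[float]) -> Vector:
    """Hypothetical fusion: mean-pool the part-level descriptors, concatenate the
    pooled vector with the global descriptor, then L2-normalize the result.
    (The abstract only says the two are "combined"; concatenation is an assumption.)"""
    num_parts = len(part_descriptors)
    dim = len(part_descriptors[0])
    pooled = [sum(p[d] for p in part_descriptors) / num_parts for d in range(dim)]
    return l2_normalize(pooled + list(global_descriptor))


def random_heading_shift(panorama_columns: Sequence, rng: random.Random) -> list:
    """Assumed misalignment model: an unknown heading of a 360-degree panorama
    corresponds to a circular shift of its image columns by a random offset."""
    shift = rng.randrange(len(panorama_columns))
    return list(panorama_columns[shift:]) + list(panorama_columns[:shift])


def recall_at_k(queries: Sequence[Vector], references: Sequence[Vector], k: int) -> float:
    """Fraction of queries whose true match is among the k most similar references.
    Assumes query i and reference i form the ground-truth ground/aerial pair."""
    hits = 0
    for i, q in enumerate(queries):
        sims = [sum(a * b for a, b in zip(q, r)) for r in references]
        # Rank of the true match = number of references scoring strictly higher
        # (ties are therefore resolved in favour of the true match).
        rank = sum(1 for s in sims if s > sims[i])
        if rank < k:
            hits += 1
    return hits / len(queries)


if __name__ == "__main__":
    # Toy 2-D embeddings, purely illustrative (not data from the paper).
    queries = [l2_normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
    references = [l2_normalize(v) for v in ([0.9, 0.1], [0.1, 0.9], [-1.0, 1.0])]
    print(f"R@1 = {recall_at_k(queries, references, 1):.3f}")
    print(f"R@3 = {recall_at_k(queries, references, 3):.3f}")
    print(random_heading_shift(list(range(8)), random.Random(0)))
```

Because a true match that ties with a distractor still counts as retrieved, this ranking is slightly optimistic. A stricter protocol would break ties against the true match.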