DeHi: A Decoupled Hierarchical Architecture for Unaligned Ground-to-Aerial Geo-Localization


Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Volume 34; Issue 3; pp. 1927-1940
Main Authors: Wang, Teng, Li, Jiawen, Sun, Changyin
Format: Journal Article
Language: English
Published: New York, IEEE, 2024-03-01
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:1051-8215, 1558-2205
Abstract Ground-to-aerial (G2A) geo-localization remains extremely challenging due to the drastic appearance and geometry differences between ground and aerial views, especially when their relative orientation is unknown. In this paper, we focus on unaligned G2A geo-localization, where the query ground-level image is not perfectly orientation-aligned with the reference aerial imagery. We cast this problem as a metric embedding task and propose a decoupled hierarchical (DeHi) architecture to progressively learn meaningful multi-grained features. Specifically, DeHi first uses a CNN to extract high-level semantic features, and then introduces a novel orthogonally factorized transformer model consisting of part-level and global transformer encoders to learn part-level and global feature descriptors sequentially. To enhance representation power, cross-level connections enrich the part-level and global descriptors with CNN features, and the pooled part-level descriptor is combined with the global descriptor to construct the final query representation. Furthermore, this decoupled hierarchical architecture allows multi-level deep supervision: we introduce two part-level losses and one cross-level loss to complement the widely used global retrieval loss. Extensive experiments on standard benchmark datasets show significant gains in recall rates over the previous state of the art. Notably, under random orientation misalignments, DeHi improves top-1 recall from 78.59% to 82.38% (+3.79 percentage points) on CVUSA and from 72.91% to 77.94% (+5.03 percentage points) on CVACT. In addition, DeHi maintains competitive inference efficiency with fewer parameters than existing transformer-based methods.
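The abstract reports results as top-1 recall over a metric embedding space. As a rough illustration of how such a number is usually computed, the following is a minimal sketch, not the authors' evaluation code. The function names `l2_normalize` and `recall_at_k` are hypothetical, and the use of cosine similarity is an assumption, since the abstract does not state the distance metric.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def recall_at_k(query_emb, ref_emb, gt_index, k=1):
    """Fraction of queries whose true reference is among the k most similar.

    query_emb: (Q, D) ground-view descriptors (assumed layout)
    ref_emb:   (R, D) aerial-view descriptors (assumed layout)
    gt_index:  (Q,) index of the matching aerial image for each query
    """
    q = l2_normalize(np.asarray(query_emb, dtype=np.float64))
    r = l2_normalize(np.asarray(ref_emb, dtype=np.float64))
    sim = q @ r.T                            # (Q, R) cosine similarities
    topk = np.argsort(-sim, axis=1)[:, :k]   # k most similar references per query
    hits = (topk == np.asarray(gt_index)[:, None]).any(axis=1)
    return float(hits.mean())

# Toy data: three aerial references and three ground queries.
refs = np.eye(3)
queries = np.array([[1.0, 0.1, 0.0],
                    [0.0, 1.0, 0.2],
                    [1.0, 0.0, 0.9]])  # third query is closer to reference 0 than to its match
truth = np.array([0, 1, 2])
r1 = recall_at_k(queries, refs, truth, k=1)
r2 = recall_at_k(queries, refs, truth, k=2)
```

In this toy setup two of the three queries retrieve their match first, and the third finds it in second place, so top-1 recall is lower than top-2 recall.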
Author_xml – sequence: 1
  givenname: Teng
  orcidid: 0000-0002-1802-0435
  surname: Wang
  fullname: Wang, Teng
  email: wangteng@seu.edu.cn
  organization: School of Automation, Southeast University, Nanjing, China
– sequence: 2
  givenname: Jiawen
  orcidid: 0000-0002-5917-385X
  surname: Li
  fullname: Li, Jiawen
  email: lijiawen@seu.edu.cn
  organization: School of Automation, Southeast University, Nanjing, China
– sequence: 3
  givenname: Changyin
  orcidid: 0000-0001-9269-334X
  surname: Sun
  fullname: Sun, Changyin
  email: cysun@seu.edu.cn
  organization: School of Automation, Southeast University, Nanjing, China
CODEN ITCTEM
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI 10.1109/TCSVT.2023.3293514
Discipline Engineering
EISSN 1558-2205
EndPage 1940
ExternalDocumentID 10_1109_TCSVT_2023_3293514
10177194
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62273093; 61921004
  funderid: 10.13039/501100001809
– fundername: ZhiShan Scholar Program of Southeast University
  funderid: 10.13039/501100008081
ISICitedReferencesCount 7
ISSN 1051-8215
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PageCount 14
PublicationDate 2024-03-01
PublicationPlace New York
PublicationTitle IEEE Transactions on Circuits and Systems for Video Technology
PublicationTitleAbbrev TCSVT
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 1927
SubjectTerms Computational modeling
Convolutional neural networks
Datasets
decoupled hierarchical architecture
factorized transformer model
Feature extraction
Hierarchical systems
Localization
Location awareness
multi-level deep supervision
Orientation
Recall
Representations
Transformers
Unaligned cross-view geo-localization
URI https://ieeexplore.ieee.org/document/10177194
https://www.proquest.com/docview/2938023170
Volume 34