DLI-Net: Dual Local Interaction Network for Fine-Grained Sketch-Based Image Retrieval

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, no. 10, pp. 7177-7189
Main Authors: Sun, Haifeng, Xu, Jiaqing, Wang, Jingyu, Qi, Qi, Ge, Ce, Liao, Jianxin
Format: Journal Article
Language: English
Published: New York IEEE 01.10.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:1051-8215, 1558-2205
Description
Summary: Fine-grained sketch-based image retrieval (FG-SBIR) is considered an ideal method of image retrieval because sketches are rich in detail and easily accessible. It aims to find the most similar photo in a photo gallery given an input sketch. Most previous works follow a paradigm of first extracting global features and then projecting the sketch and photo features into a unified embedding space using triplet loss. However, global features are not well suited to capturing the crucial fine-grained information. Based on this observation, we propose a Dual Local Interaction Network (DLI-Net). DLI-Net explores an effective and efficient way to utilize local features for FG-SBIR. Specifically, we first propose a Local Feature Extractor to extract mid-level local features. Then, to address the problems introduced by local features, we propose a Dual Interaction Module, which contains a Self Interaction Module and a Cross Interaction Module. The Self Interaction Module speeds up retrieval by eliminating redundant local features from the background. The Cross Interaction Module resolves spatial misalignment by letting sketches interact with photos. Extensive experiments on six commonly used datasets show that DLI-Net outperforms state-of-the-art competitors by a significant margin at a reasonable retrieval speed. Moreover, to the best of our knowledge, DLI-Net is the first model that beats humans on all six datasets. DLI-Net also performs best on the cross-category fine-grained sketch-based image retrieval task, which further demonstrates that local features are more appropriate for FG-SBIR.
DOI: 10.1109/TCSVT.2022.3171972
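
Illustration: The summary above notes that most earlier FG-SBIR work embeds sketches and photos in a shared space trained with triplet loss. The following is a minimal sketch of the standard triplet loss only, not of DLI-Net itself. The function names, the Euclidean distance, the margin value, and the toy two-dimensional embeddings are all assumptions chosen for illustration and do not come from the article.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor   -- embedding of the query sketch
    positive -- embedding of the matching photo
    negative -- embedding of a non-matching photo
    margin   -- assumed hyperparameter, not taken from the article
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy embeddings (hypothetical values, for illustration only).
sketch = [0.0, 0.0]
matching_photo = [3.0, 4.0]   # distance 5 from the sketch
other_photo = [6.0, 8.0]      # distance 10 from the sketch

# The matching photo is already closer than the other photo by more than the margin.
loss_satisfied = triplet_loss(sketch, matching_photo, other_photo, margin=1.0)

# Swapping the roles violates the margin, so the loss becomes positive.
loss_violated = triplet_loss(sketch, other_photo, matching_photo, margin=1.0)

print(loss_satisfied, loss_violated)
```

The loss is zero once the matching photo is closer to the sketch than the non-matching photo by at least the margin, and positive otherwise. In this formulation each image is represented by a single global vector, which is the setup the summary contrasts with DLI-Net's use of local features.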