Collaborative Content-Dependent Modeling: A Return to the Roots of Salient Object Detection

Bibliographic details
Published in:	IEEE Transactions on Image Processing, vol. 32, pp. 4237-4246
Main authors:	Jiao, Siyu; Goel, Vidit; Navasardyan, Shant; Yang, Zongxin; Khachatryan, Levon; Yang, Yi; Wei, Yunchao; Zhao, Yao; Shi, Humphrey
Format:	Journal Article
Language:	English
Published:	United States: IEEE, 2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Coders; Collaboration; content-dependent modeling; Context; Convolution; Decoding; Encoders-Decoders; Feature extraction; Head; Image segmentation; Modules; Object detection; Object recognition; Roots; Salience; Salient object detection; Task analysis; Transformers
ISSN:	1057-7149, 1941-0042

Description
Abstract:	Salient object detection (SOD) aims to identify the most visually distinctive object(s) in each given image. Most recent advances focus on either adding elaborate connections among different convolution blocks or introducing boundary-aware supervision to help achieve better segmentation, which actually moves away from the essence of SOD, i.e., distinctiveness/salience. This paper goes back to the roots of SOD and investigates the principles of how to identify distinctive object(s) in a more effective and efficient way. Intuitively, the salience of one object should largely depend on its global context within the input image. Based on this, we devise a clean yet effective architecture for SOD, named Collaborative Content-Dependent Networks (CCD-Net). In detail, we propose a collaborative content-dependent head whose parameters are conditioned on the input image's global context information. Within the content-dependent head, a hand-crafted multi-scale (HMS) module and a self-induced (SI) module are carefully designed to collaboratively generate content-aware convolution kernels for prediction. Benefiting from the content-dependent head, CCD-Net is capable of leveraging global context to detect distinctive object(s) while keeping a simple encoder-decoder design. Extensive experimental results demonstrate that our CCD-Net achieves state-of-the-art results on various benchmarks. Our architecture is simple and intuitive compared to previous solutions, resulting in competitive characteristics with respect to model complexity, operating efficiency, and segmentation accuracy.
DOI:	10.1109/TIP.2023.3293759
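
Illustrative note: The abstract describes a prediction head whose convolution kernels are generated from the input image's global context. The sketch below shows that general idea only, in plain Python, and is not the authors' CCD-Net: it does not implement the HMS or SI modules. The function names, the use of global average pooling as the context, the linear kernel generator, and all numbers are assumptions made for illustration.

```python
import math

def global_context(features):
    """Global average pooling: one scalar per channel of a C x H x W feature map."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]

def generate_kernel(context, gen_weights, gen_bias):
    """Hypothetical kernel generator: a linear map from the context vector
    to one 1x1 convolution weight per channel."""
    return [sum(w * c for w, c in zip(row, context)) + b
            for row, b in zip(gen_weights, gen_bias)]

def content_dependent_head(features, gen_weights, gen_bias):
    """Predict a saliency map with a 1x1 kernel conditioned on the input itself."""
    kernel = generate_kernel(global_context(features), gen_weights, gen_bias)
    height, width = len(features[0]), len(features[0][0])
    saliency = []
    for y in range(height):
        row = []
        for x in range(width):
            # 1x1 convolution: weighted sum over channels at this pixel.
            logit = sum(k * ch[y][x] for k, ch in zip(kernel, features))
            row.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
        saliency.append(row)
    return saliency

# Toy input: 2 channels on a 2x2 grid (made-up values, not real features).
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
gen_weights = [[4.0, 0.0], [0.0, -4.0]]  # illustrative, not learned
gen_bias = [0.0, 0.0]
saliency_map = content_dependent_head(features, gen_weights, gen_bias)
```

Because the kernel is recomputed from each image's pooled context, two images with different global statistics are scored with different weights, whereas an ordinary convolutional head applies the same fixed weights to every input.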