Scalable image coding with enhancement features for human and machine

Bibliographic details
Published in:	Multimedia Systems, Vol. 30, No. 2, p. 77
Main authors:	Wu, Ying; An, Ping; Yang, Chao; Huang, XinPeng
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.04.2024; Springer Nature B.V.
Subjects:	Codec; Coders; Collaboration; Computer Communication Networks; Computer Graphics; Computer Science; Computer vision; Consumption; Cost control; Cryptology; Data Storage Representation; Deep learning; Design; Entropy; Image coding; Image reconstruction; Machine vision; Methods; Multilayers; Multimedia Information Systems; Neural networks; Operating Systems; Optimization; Performance enhancement; Regular Paper; Semantics; Vision systems; Scalable coding; Image compression; Enhancement features
ISSN:	0942-4962, 1432-1882

Description
Summary:	The past decade has seen significant advancements in computer vision technologies, resulting in an increasing consumption of images and videos by both human and machine. Although machines are usually the primary consumers, there are many applications where human involvement is indispensable. In this paper, we propose a novel image coding technique that targets machines while ensuring compatibility with human consumption. The proposed codec generates two distinct bitstreams: the reconstruction feature bitstreams and the enhancement feature bitstreams. The former are designed to facilitate image reconstruction for human consumption and vision tasks for machine consumption, while the latter are optimized for high-quality vision tasks. To achieve this goal, we introduce the Mask Multilayer Fusion Encoder (MMFE), which integrates multi-scale visual prior masks into partial channel features of the encoder. Additionally, due to the significant distortion of features at low bitrates, we propose a Local Feature Fusion Module (LFFM) that aggregates semantic information from the reconstruction features to obtain enhancement features, so as to improve the performance of vision tasks. Our experimental results demonstrate that our scalable codec provides significant bitrate savings of 26–77% on machine vision tasks compared to state-of-the-art image codecs, while maintaining comparable performance in terms of image reconstruction. Our proposed codec represents a significant advancement in the field of image coding, with the potential to improve both human and machine consumption of visual media.
DOI:	10.1007/s00530-024-01279-y