Just Recognizable Distortion for Machine Vision Oriented Image and Video Coding

Bibliographic Details
Published in:	International Journal of Computer Vision, Vol. 129, No. 10, pp. 2889-2906
Main Authors:	Zhang, Qi; Wang, Shanshe; Zhang, Xinfeng; Ma, Siwei; Gao, Wen
Format:	Journal Article
Language:	English
Published:	New York: Springer US; Springer; Springer Nature B.V., 1 October 2021
Subjects:	Analysis; Artificial Intelligence; Computer Imaging; Computer Science; Data compression; Datasets; Deep learning; Distortion; Image coding; Image compression; Image Processing and Computer Vision; Image quality; Machine vision; Pattern Recognition; Pattern Recognition and Graphics; Special Issue on Deep Learning for Video Analysis and Compression; Video compression; Vision; Vision systems; Just noticeable distortion; Image and video coding
ISSN:	0920-5691, 1573-1405

Description
Summary:	Machine visual intelligence has exploded in recent years. Large-scale, high-quality image and video datasets significantly empower learning-based machine vision models, especially deep-learning models. However, images and videos are usually compressed before being analyzed in practical situations where transmission or storage is limited, leading to a noticeable performance loss of vision models. In this work, we broadly investigate the impact on the performance of machine vision from image and video coding. Based on the investigation, we propose Just Recognizable Distortion (JRD) to present the maximum distortion caused by data compression that will reduce the machine vision model performance to an unacceptable level. A large-scale JRD-annotated dataset containing over 340,000 images is built for various machine vision tasks, where the factors for different JRDs are studied. Furthermore, an ensemble-learning-based framework is established to predict the JRDs for diverse vision tasks under few- and non-reference conditions, which consists of multiple binary classifiers to improve the prediction accuracy. Experiments prove the effectiveness of the proposed JRD-guided image and video coding to significantly improve compression and machine vision performance. Applying predicted JRD is able to achieve remarkably better machine vision task accuracy and save a large number of bits.
DOI:	10.1007/s11263-021-01505-4