Multi-rate deep semantic image compression with quantized modulated autoencoder

Bibliographic details
Published in:	IEEE International Workshop on Multimedia Signal Processing, pp. 1-6
Main author:	Sebai, Dorsaf
Format:	Conference paper
Language:	English
Published:	IEEE, 6 October 2021
Subjects:	Computational modeling; Conditional autoencoder; Deep compression; Image coding; Memory management; Quantization (signal); Quantized autoencoder; Semantic analysis; Semantics; Training; Variable-rate compression; Visualization
ISSN:	2473-3628

Description
Summary:	Recently, deep learning has demonstrated impressive performance in image compression. Methods that match and even outperform conventional codecs are continually emerging. However, most of them need to train and deploy a separate network for each target rate. This is impractical and costly in terms of memory and power consumption, especially for broad bitrate ranges. Further, methods that consider the semantically important structure of the image are extremely scarce. This leads to non-optimized bit allocation for the eye-catching foreground details, which have to be preserved for almost all computer vision applications. To this end, we propose an end-to-end multi-rate deep semantic image compression method with a quantized conditional autoencoder. It includes two neural networks, for semantic analysis and image compression, respectively. The semantic analysis network extracts the essential semantic regions of the input image and calculates the Semantic-Important Structural SIMilarity (SI-SSIM) index for each of them. The compression network is then trained to optimize a multi-loss function based on SI-SSIM and conditioned on the activation bitwidths. The model is evaluated on the JPEG AI dataset using objective and perceptual quality metrics. The results show that our method outperforms the JPEG, JPEG 2000 and HEVC intra baselines and is competitive with VVC intra.
DOI:	10.1109/MMSP53017.2021.9733550