Accelerating Deep Learning Tasks with Optimized GPU-assisted Image Decoding

Bibliographic Details
Published in: Proceedings - International Conference on Parallel and Distributed Systems, pp. 274-281
Main Authors: Wang, Lipeng; Luo, Qiong; Yan, Shengen
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2020
ISSN: 2690-5965
Description
Summary: In computer vision deep learning (DL) tasks, most of the input image datasets are stored in the JPEG format. These JPEG datasets need to be decoded before DL tasks are performed on them. We observe two problems in the current JPEG decoding procedures for DL tasks: (1) the decoding of image entropy data in the decoder is performed sequentially, and this sequential decoding repeats with the DL iterations, which takes significant time; (2) current parallel decoding methods under-utilize the massive hardware threads on GPUs. To reduce the image decoding time, we introduce a pre-scan mechanism to avoid the repeated image scanning in DL tasks. Our pre-scan generates boundary markers for entropy data so that the decoding can be performed in parallel. To cooperate with the existing dataset storage and caching systems, we propose two modes of the pre-scan mechanism: a compatible mode and a fast mode. The compatible mode does not change the image file structure, so pre-scanned files can be stored back to disk for subsequent DL tasks. In comparison, the fast mode crafts a JPEG image into a binary format suitable for parallel decoding, which can be processed directly on the GPU. Since the GPU has thousands of hardware threads, we propose a fine-grained parallel decoding method on the pre-scanned dataset. The fine-grained parallelism utilizes the GPU effectively, and achieves speedups of around 1.5× over existing GPU-assisted image decoding libraries on real-world DL tasks.
DOI: 10.1109/ICPADS51040.2020.00045
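
The abstract describes a pre-scan that records boundary markers inside the entropy-coded data so that independent segments can later be decoded in parallel on the GPU. The paper defines its own marker format (with a compatible and a fast on-disk layout), which is not reproduced here; as a minimal, purely illustrative sketch of the general idea, the code below locates standard JPEG restart markers (RST0-RST7, bytes 0xFF 0xD0-0xD7) in an entropy-coded scan and records the offset at which each independent segment starts. The function name prescan_entropy_boundaries and the toy byte stream are hypothetical.

// Illustrative pre-scan: record the byte offset of each independent entropy
// segment, using JPEG restart markers as the boundary indicator. Segments
// between consecutive offsets have no decoding dependency on each other, so
// each one could be assigned to its own GPU thread or thread block.
#include <cstdint>
#include <cstdio>
#include <vector>

// Returns the byte offsets at which independent entropy segments start.
std::vector<size_t> prescan_entropy_boundaries(const uint8_t* scan, size_t len) {
    std::vector<size_t> boundaries{0};  // the first segment starts at offset 0
    for (size_t i = 0; i + 1 < len; ++i) {
        // In entropy-coded data, 0xFF is either byte-stuffed (followed by 0x00)
        // or introduces a marker; 0xD0-0xD7 are the restart markers RST0-RST7.
        if (scan[i] == 0xFF && scan[i + 1] >= 0xD0 && scan[i + 1] <= 0xD7) {
            boundaries.push_back(i + 2);  // next segment begins after the marker
            ++i;                          // skip the marker byte
        }
    }
    return boundaries;
}

int main() {
    // Toy entropy stream: two segments separated by an RST0 marker (0xFF 0xD0).
    const uint8_t scan[] = {0x12, 0x34, 0xFF, 0x00, 0x56,  // segment 0 (0xFF 0x00 is byte stuffing)
                            0xFF, 0xD0,                     // RST0 boundary
                            0x78, 0x9A};                    // segment 1
    for (size_t off : prescan_entropy_boundaries(scan, sizeof(scan)))
        std::printf("segment starts at byte %zu\n", off);
    return 0;
}

In a GPU decoder built on such offsets, each recorded segment would typically be mapped to its own thread or thread block, which is the kind of fine-grained parallelism the abstract refers to; persisting the offsets alongside (or inside) the image file is what lets the pre-scan cost be paid once rather than on every DL iteration.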