Accelerating Deep Learning Tasks with Optimized GPU-assisted Image Decoding

Published in: Proceedings - International Conference on Parallel and Distributed Systems, pp. 274-281
Main authors: Wang, Lipeng; Luo, Qiong; Yan, Shengen
Format: Conference paper
Language: English
Published: IEEE, 01.12.2020
ISSN: 2690-5965
Summary: In computer vision deep learning (DL) tasks, most of the input image datasets are stored in the JPEG format. These JPEG datasets need to be decoded before DL tasks are performed on them. We observe two problems in the current JPEG decoding procedures for DL tasks: (1) the decoding of image entropy data in the decoder is performed sequentially, and this sequential decoding repeats with the DL iterations, which takes significant time; (2) current parallel decoding methods under-utilize the massive hardware threads on GPUs. To reduce the image decoding time, we introduce a pre-scan mechanism to avoid the repeated image scanning in DL tasks. Our pre-scan generates boundary markers for entropy data so that the decoding can be performed in parallel. To cooperate with the existing dataset storage and caching systems, we propose two modes of the pre-scan mechanism: a compatible mode and a fast mode. The compatible mode does not change the image file structure, so pre-scanned files can be stored back to disk for subsequent DL tasks. In comparison, the fast mode crafts a JPEG image into a binary format suitable for parallel decoding, which can be processed directly on the GPU. Since the GPU has thousands of hardware threads, we propose a fine-grained parallel decoding method on the pre-scanned dataset. The fine-grained parallelism utilizes the GPU effectively and achieves speedups of around 1.5× over existing GPU-assisted image decoding libraries on real-world DL tasks.
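
As a rough illustration of the idea described in the summary, the following minimal CUDA sketch performs a host-side pre-scan that records boundary offsets in a toy entropy-coded stream and then assigns one GPU thread per segment for parallel processing. This is a hypothetical example, not the authors' decoder: it uses JPEG restart markers (0xFFD0-0xFFD7) as the boundary markers, substitutes a byte-counting placeholder for real Huffman decoding, and the names prescan_boundaries and decode_segments are invented for this sketch.

```cuda
// Illustrative sketch only: pre-scan for segment boundaries on the host,
// then decode the independent segments in parallel on the GPU.
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Host pre-scan: walk the entropy-coded scan once and record the byte offset
// after every restart marker (0xFFD0..0xFFD7). The paper's pre-scan generates
// its own boundary markers so that any JPEG can be split; restart markers are
// used here only to keep the example self-contained.
static std::vector<uint32_t> prescan_boundaries(const std::vector<uint8_t>& scan) {
    std::vector<uint32_t> starts{0};  // first segment starts at offset 0
    for (size_t i = 0; i + 1 < scan.size(); ++i)
        if (scan[i] == 0xFF && scan[i + 1] >= 0xD0 && scan[i + 1] <= 0xD7)
            starts.push_back(static_cast<uint32_t>(i + 2));  // segment begins after the marker
    return starts;
}

// Device kernel: each thread owns one independent entropy segment.
// The body is a placeholder (it counts non-zero bytes); a real decoder
// would expand the Huffman-coded coefficients of its segment instead.
__global__ void decode_segments(const uint8_t* scan, const uint32_t* start,
                                const uint32_t* end, int n, uint32_t* out) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= n) return;
    uint32_t count = 0;
    for (uint32_t i = start[s]; i < end[s]; ++i)
        if (scan[i] != 0x00) ++count;
    out[s] = count;
}

int main() {
    // Toy "entropy data" with two restart markers, i.e. three segments.
    std::vector<uint8_t> scan = {0x12, 0x34, 0xFF, 0xD0, 0x56, 0x78,
                                 0xFF, 0xD1, 0x9A, 0xBC, 0xDE};
    std::vector<uint32_t> starts = prescan_boundaries(scan);
    int n = static_cast<int>(starts.size());
    std::vector<uint32_t> ends(starts.begin() + 1, starts.end());
    ends.push_back(static_cast<uint32_t>(scan.size()));

    uint8_t* d_scan; uint32_t *d_start, *d_end, *d_out;
    cudaMalloc(&d_scan, scan.size());
    cudaMalloc(&d_start, n * sizeof(uint32_t));
    cudaMalloc(&d_end, n * sizeof(uint32_t));
    cudaMalloc(&d_out, n * sizeof(uint32_t));
    cudaMemcpy(d_scan, scan.data(), scan.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_start, starts.data(), n * sizeof(uint32_t), cudaMemcpyHostToDevice);
    cudaMemcpy(d_end, ends.data(), n * sizeof(uint32_t), cudaMemcpyHostToDevice);

    // One thread per segment; real datasets would use many blocks.
    decode_segments<<<1, 32>>>(d_scan, d_start, d_end, n, d_out);

    std::vector<uint32_t> out(n);
    cudaMemcpy(out.data(), d_out, n * sizeof(uint32_t), cudaMemcpyDeviceToHost);
    for (int s = 0; s < n; ++s)
        std::printf("segment %d: %u bytes processed\n", s, out[s]);
    cudaFree(d_scan); cudaFree(d_start); cudaFree(d_end); cudaFree(d_out);
    return 0;
}
```

The point of the sketch is the division of labor: once boundary offsets are known from a single pre-scan, every segment becomes an independent unit of work, which is what allows the fine-grained mapping of segments to GPU threads described in the summary.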
DOI: 10.1109/ICPADS51040.2020.00045