Accelerating Deep Learning Tasks with Optimized GPU-assisted Image Decoding

Bibliographic Details
Published in: Proceedings - International Conference on Parallel and Distributed Systems, pp. 274-281
Main authors: Wang, Lipeng; Luo, Qiong; Yan, Shengen
Format: Conference proceeding
Language: English
Published: IEEE, 01.12.2020
ISSN: 2690-5965
Online access: Full text
Abstract In computer vision deep learning (DL) tasks, most of the input image datasets are stored in the JPEG format. These JPEG datasets need to be decoded before DL tasks are performed on them. We observe two problems in the current JPEG decoding procedures for DL tasks: (1) the decoding of image entropy data in the decoder is performed sequentially, and this sequential decoding repeats with the DL iterations, which takes significant time; (2) Current parallel decoding methods under-utilize the massive hardware threads on GPUs. To reduce the image decoding time, we introduce a pre-scan mechanism to avoid the repeated image scanning in DL tasks. Our pre-scan generates boundary markers for entropy data so that the decoding can be performed in parallel. To cooperate with the existing dataset storage and caching systems, we propose two modes of the pre-scan mechanism: a compatible mode and a fast mode. The compatible mode does not change the image file structure so pre-scanned files can be stored back to disk for subsequent DL tasks. In comparison, the fast mode crafts a JPEG image into a binary format suitable for parallel decoding, which can be processed directly on the GPU. Since the GPU has thousands of hardware threads, we propose a fine-grained parallel decoding method on the pre-scanned dataset. The fine-grained parallelism utilizes the GPU effectively, and achieves speedups of around 1.5× over existing GPU-assisted image decoding libraries on real-world DL tasks.
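The following is a minimal sketch of the pre-scan idea described in the abstract, not the authors' implementation. It assumes, purely for illustration, that the entropy-coded scan data contains JPEG restart markers (bytes 0xFFD0 through 0xFFD7), so the stream between two markers is an independently decodable segment; the host records the byte offset of each segment (the boundary markers), and a CUDA kernel maps one thread block per segment. The per-segment Huffman decoding itself is left as a stub, and the names prescan_entropy_segments and decode_segments are hypothetical.

// Hypothetical sketch of a pre-scan plus parallel segment mapping for JPEG
// entropy data. Not the paper's implementation; it assumes the scan data
// contains restart markers so segments are independently decodable.
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Host-side pre-scan: record the start offset of every entropy segment.
// Restart markers 0xFFD0..0xFFD7 delimit independently decodable segments;
// 0xFF followed by 0x00 is a stuffed byte, not a marker, and is left alone.
static std::vector<size_t> prescan_entropy_segments(const uint8_t* scan, size_t len) {
    std::vector<size_t> offsets = {0};            // first segment starts at byte 0
    for (size_t i = 0; i + 1 < len; ++i) {
        if (scan[i] == 0xFF && scan[i + 1] >= 0xD0 && scan[i + 1] <= 0xD7) {
            offsets.push_back(i + 2);             // next segment begins after the marker
            ++i;                                  // step over the marker's second byte
        }
    }
    return offsets;
}

// Device kernel: one thread block per pre-scanned segment. A real decoder
// would Huffman-decode the MCUs of its segment here; this stub only reports
// the mapping to show how the boundary markers enable fine-grained parallelism.
__global__ void decode_segments(const uint8_t* scan, const size_t* offsets,
                                int num_segments, size_t scan_len) {
    int seg = blockIdx.x;
    if (seg >= num_segments || threadIdx.x != 0) return;
    size_t begin = offsets[seg];
    size_t end = (seg + 1 < num_segments) ? offsets[seg + 1] - 2 : scan_len;
    printf("block %d decodes bytes [%llu, %llu)\n", seg,
           (unsigned long long)begin, (unsigned long long)end);
}

int main() {
    // Toy "entropy data": three segments separated by RST0 and RST1 markers.
    std::vector<uint8_t> scan = {0x12, 0x34, 0x56, 0xFF, 0xD0,
                                 0x78, 0x9A, 0xFF, 0xD1,
                                 0xBC, 0xDE};
    std::vector<size_t> offsets = prescan_entropy_segments(scan.data(), scan.size());

    uint8_t* d_scan;
    size_t* d_offsets;
    cudaMalloc(&d_scan, scan.size());
    cudaMalloc(&d_offsets, offsets.size() * sizeof(size_t));
    cudaMemcpy(d_scan, scan.data(), scan.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, offsets.data(), offsets.size() * sizeof(size_t),
               cudaMemcpyHostToDevice);

    decode_segments<<<(int)offsets.size(), 32>>>(d_scan, d_offsets,
                                                 (int)offsets.size(), scan.size());
    cudaDeviceSynchronize();
    cudaFree(d_scan);
    cudaFree(d_offsets);
    return 0;
}

The paper's pre-scan generates such boundary information for JPEG entropy data in general (the compatible mode keeps the file structure intact, while the fast mode stores a GPU-friendly binary layout); the sketch only illustrates why explicit boundaries make fine-grained parallel decoding on thousands of GPU threads possible.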
Author Luo, Qiong
Yan, Shengen
Wang, Lipeng
Author_xml – sequence: 1
  givenname: Lipeng
  surname: Wang
  fullname: Wang, Lipeng
  email: lwangay@cse.ust.hk
  organization: HKUST, Department of Computer Science and Engineering, Hong Kong, China
– sequence: 2
  givenname: Qiong
  surname: Luo
  fullname: Luo, Qiong
  email: luo@cse.ust.hk
  organization: HKUST, Department of Computer Science and Engineering, Hong Kong, China
– sequence: 3
  givenname: Shengen
  surname: Yan
  fullname: Yan, Shengen
  email: yanshengen@sensetime.com
  organization: SenseTime Research, Shenzhen, China
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICPADS51040.2020.00045
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
Discipline Computer Science
EISBN 1728190746
9781728190747
EISSN 2690-5965
EndPage 281
ExternalDocumentID 9359201
Genre orig-research
ISICitedReferencesCount 3
IsPeerReviewed false
IsScholarly true
Language English
PageCount 8
PublicationCentury 2000
PublicationDate 2020-Dec.
PublicationDateYYYYMMDD 2020-12-01
PublicationDate_xml – month: 12
  year: 2020
  text: 2020-Dec.
PublicationDecade 2020
PublicationTitle Proceedings - International Conference on Parallel and Distributed Systems
PublicationTitleAbbrev ICPADS
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 274
SubjectTerms Decoding
Deep learning
GPU
Graphics processing units
Hardware
heterogeneous processing
image decoding
Iterative decoding
parallel processing
Task analysis
Transform coding
Title Accelerating Deep Learning Tasks with Optimized GPU-assisted Image Decoding
URI https://ieeexplore.ieee.org/document/9359201