DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters

Bibliographic Details
Published in:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 37, No. 11, pp. 2348-2359
Main authors:	Zhao, Zhuoran; Barijough, Kamyar Mirzazad; Gerstlauer, Andreas
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 1 Nov. 2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Accuracy; Artificial neural networks; Clusters; Constraints; Data sources; Deep learning; Distributed databases; Distributed inference; Dynamic scheduling; Edge computing; Electronic devices; Embedded systems; Footprints; Inference; Internet of Things; Logic gates; Neural networks; Parallel processing; Runtime; Scheduling; Task analysis
ISSN:	0278-0070, 1937-4151

Description
Summary:	Edge computing has emerged as a trend to improve scalability, overhead, and privacy by processing large-scale data, e.g., in deep learning applications, locally at the source. In IoT networks, edge devices are characterized by tight resource constraints and the often dynamic nature of data sources, where existing approaches for deploying Deep/Convolutional Neural Networks (DNNs/CNNs) can only meet IoT constraints when severely reducing accuracy or using a static distribution that cannot adapt to dynamic IoT environments. In this paper, we propose DeepThings, a framework for adaptively distributed execution of CNN-based inference applications on tightly resource-constrained IoT edge clusters. DeepThings employs a scalable Fused Tile Partitioning (FTP) of convolutional layers to minimize memory footprint while exposing parallelism. It further realizes a distributed work stealing approach to enable dynamic workload distribution and balancing at inference runtime. Finally, we employ a novel work scheduling process to improve data reuse and reduce overall execution latency. Results show that our proposed FTP method can reduce memory footprint by more than 68% without sacrificing accuracy. Furthermore, compared to existing work sharing methods, our distributed work stealing and work scheduling improve throughput by 1.7× to 2.2× with multiple dynamic data sources. When combined, DeepThings provides scalable CNN inference speedups of 1.7× to 3.5× on 2-6 edge devices with less than 23 MB memory each.
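
Illustration (not from the paper): the summary says that FTP partitions fused convolutional layers into tiles that can be processed independently. A minimal sketch of the general idea behind fusing layers per tile is shown below, in one dimension only. It computes which input rows a given output tile depends on by walking backwards through a stack of convolution layers. The names ConvLayer, input_range, and fused_tile_ranges are hypothetical, and the layer parameters are made up for the example; the actual DeepThings implementation may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical 1-D description of a convolution layer."""
    kernel: int   # kernel size along this dimension
    stride: int
    padding: int  # zero padding on each side


def input_range(out_lo: int, out_hi: int, layer: ConvLayer, in_size: int) -> Tuple[int, int]:
    """Map an inclusive output index range to the inclusive input range it reads."""
    lo = out_lo * layer.stride - layer.padding
    hi = out_hi * layer.stride - layer.padding + layer.kernel - 1
    # Indices outside the feature map fall into the zero padding, so clip them.
    return max(lo, 0), min(hi, in_size - 1)


def fused_tile_ranges(out_lo: int, out_hi: int,
                      layers: List[ConvLayer],
                      in_sizes: List[int]) -> List[Tuple[int, int]]:
    """Return the range needed at each layer's input (first layer first)
    to produce rows out_lo..out_hi of the last layer's output."""
    ranges = []
    lo, hi = out_lo, out_hi
    for layer, size in zip(reversed(layers), reversed(in_sizes)):
        lo, hi = input_range(lo, hi, layer, size)
        ranges.append((lo, hi))
    return list(reversed(ranges))


# Assumed example: two 3-wide, stride-1, padding-1 layers on an 8-row input,
# with the final output split into two tiles of 4 rows each.
layers = [ConvLayer(kernel=3, stride=1, padding=1)] * 2
in_sizes = [8, 8]
for tile in [(0, 3), (4, 7)]:
    print(tile, fused_tile_ranges(*tile, layers, in_sizes))
```

In this assumed setup the two tiles need overlapping rows of the original input, which is the price of letting each tile run through all fused layers without exchanging intermediate data; in exchange, each device only holds the intermediate data of its own tile rather than full feature maps.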
DOI:	10.1109/TCAD.2018.2858384