CD-MSA: Cooperative and Deadline-Aware Scheduling for Efficient Multi-Tenancy on DNN Accelerators

With DNN turning into the backbone of AI cloud services and propelling the emergence of INFerence-as-a-Service (INFaaS), DNN-specific accelerators have become the indispensable components of cloud inference systems. Due to the conservative "one-task-at-a-time" working mode and deadline bli...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on parallel and distributed systems Vol. 34; no. 7; pp. 1 - 17
Main Authors: Wang, Chunyang, Bai, Yuebin, Sun, Desen
Format: Journal Article
Language:English
Published: New York IEEE 01.07.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN:1045-9219, 1558-2183
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:With DNN turning into the backbone of AI cloud services and propelling the emergence of INFerence-as-a-Service (INFaaS), DNN-specific accelerators have become the indispensable components of cloud inference systems. Due to the conservative "one-task-at-a-time" working mode and deadline blindness of those accelerators, implementing multi-tenancy that aims to improve the cost-effectiveness and meet SLA requirements is intractable. Recent studies including the temporal and spatial approaches, employ manifold scheduling mechanisms and sophisticated architecture innovations to address the challenge. However, these researches either still neglect the deadline awareness or render inevitable and expensive hardware overheads such as switches and storage. In this paper, we present Cooperative and Deadline-aware Multi-Systolic-Array scheduling (CD-MSA), a low-cost solution for the cloud inference that utilizes the real time mechanism and task-level parallelism to enable efficient multi-tenancy. Based on our preemptive multi-systolic-array accelerator architecture supporting the simultaneous task co-location, we first construct a fine-grained DNN execution model to lay the groundwork for the lightweight preemption. Second, we design a cooperative, deadline- and laxity-aware scheduler in conjunction with an efficient schedulability test method for better QoS guarantee without introducing additional hardware cost. Finally, to further promote the overall throughput, we propose dynamic task fusion , a software approach that fuses different tasks into the logically "multi-threading" tasks at runtime. We compare CD-MSA with several state-of-the-art researches across three multi-DNN workloads. The evaluation results show CD-MSA improves the latency-bounded throughput, SLA satisfaction rate and weighted system throughput by up to 62%, 63% and 27%, respectively.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2023.3276759