A general and fast distributed system for large-scale dynamic programming applications
•Propose a DAG based Parallel DP Programming model, consisting of a DAG pattern library and a vertex compute method.•Develop a general and fast distributed framework called DPX10 based on proposed programming model.•Propose and implement a new comprehensive straggler mitigation strategy in DPX10 for...
Uložené v:
| Vydané v: | Parallel computing Ročník 60; s. 1 - 21 |
|---|---|
| Hlavní autori: | , , , , , |
| Médium: | Journal Article |
| Jazyk: | English |
| Vydavateľské údaje: |
Elsevier B.V
01.12.2016
|
| Predmet: | |
| ISSN: | 0167-8191, 1872-7336 |
| On-line prístup: | Získať plný text |
| Tagy: |
Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
|
| Shrnutí: | •Propose a DAG based Parallel DP Programming model, consisting of a DAG pattern library and a vertex compute method.•Develop a general and fast distributed framework called DPX10 based on proposed programming model.•Propose and implement a new comprehensive straggler mitigation strategy in DPX10 for DP computation.•Demonstrate and validate the simplicity, efficiency and scalability of DPX10 with four large scale DP applications experimentally.
Dynamic programming is an important technique widely used in many scientific applications. Due to the massive volume of applications’ data in practice, parallel and distributed DP is a must. However, writing a parallel and distributed DP program is difficult and error-prone because of its intrinsically strong data dependency. In this paper, we present DPX10, a DAG-based distributed X10 framework aiming at simplifying the parallel programming for DP applications. DPX10 enables users to write highly efficient parallel DP programs without much effort. For DPX10 programming, users only need to do two things: 1) Instantiating a DAG pattern by indicating the dependency between vertices of the DAG; 2) Implementing the DP application’s logic in the compute method of the vertices. DPX10 provides eight commonly used DAG patterns and a simple API to allow users to customize their own DAG patterns. All the tiresome work of DP parallelization including DAG distribution, tasks scheduling, and tasks communication are hidden from users and covered by DPX10. Moreover, DPX10 is fault-tolerant and has a mechanism to handle the problem of straggler tasks, which run much slower than other tasks due to unexpected resource contention. Finally, we use four DP applications with up to 2 billion vertices running on 240 cores to demonstrate the simplicity, efficiency, and scalability of our proposed framework. |
|---|---|
| ISSN: | 0167-8191 1872-7336 |
| DOI: | 10.1016/j.parco.2016.09.002 |