Reliable scalable symbolic computation: The design of SymGridPar2

Bibliographic details
Published in: Computer languages, systems & structures, Volume 40, Issue 1, pp. 19-35
Main authors: Maier, P., Stewart, R., Trinder, P.W.
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 April 2014
ISSN: 1477-8424, 1873-6866
Description
Summary: Symbolic computation is an important area of both Mathematics and Computer Science, with many large computations that would benefit from parallel execution. Symbolic computations are, however, challenging to parallelise as they have complex data and control structures, and both dynamic and highly irregular parallelism. The SymGridPar framework (SGP) has been developed to address these challenges on small-scale parallel architectures. However, the multicore revolution means that the number of cores and the number of failures are growing exponentially, and that the communication topology is becoming increasingly complex. Hence an improved parallel symbolic computation framework is required. This paper presents the design and initial evaluation of SymGridPar2 (SGP2), a successor to SymGridPar that is designed to provide scalability onto 10^5 cores, and hence also provide fault tolerance. We present the SGP2 design goals, principles and architecture. We describe how scalability is achieved using layering and by allowing the programmer to control task placement. We outline how fault tolerance is provided by supervising remote computations, and outline higher-level fault tolerance abstractions. We describe the SGP2 implementation status and development plans. We report the scalability and efficiency, including weak scaling to about 32,000 cores, and investigate the overheads of tolerating faults for simple symbolic computations.

• This paper presents the design and initial evaluation of SymGridPar2.
• SymGridPar2 is designed to provide scalability and fault tolerance.
• Scalability is achieved using layering and by allowing the programmer to control task placement.
• We report the scalability and efficiency, including weak scaling to about 32k cores.
• We present a fault tolerant work stealing protocol and measure supervision overheads.
DOI: 10.1016/j.cl.2014.03.001