Distributed scheduling and data sharing in late-binding overlays

Bibliographic details
Published in: 2014 International Conference on High Performance Computing & Simulation (HPCS), pp. 129-136
Main authors: Delgado Peris, Antonio, Hernandez, Jose M., Huedo, Eduardo
Format: Conference paper
Language: English
Published: IEEE, 1 July 2014
ISBN:9781479953127, 1479953121
Description
Summary: Pull-based late-binding overlays are used in some of today's largest computational grids. Job agents are submitted to resources with the duty of retrieving real workload from a central queue at runtime. This helps overcome the problems of these complex environments: heterogeneity, imprecise status information and relatively high failure rates. In addition, the late job assignment allows dynamic adaptation to changes in grid conditions or user priorities. However, as the scale grows, the central assignment queue may become a bottleneck for the whole system. This article presents a distributed scheduling architecture for late-binding overlays, which addresses this issue by letting execution nodes build a distributed hash table and delegating job matching and assignment to them. This reduces the load on the central server and makes the system much more scalable and robust. Scalability makes fine-grained scheduling possible and enables new functionalities, like the implementation of a distributed data cache on the execution nodes, which helps alleviate the commonly congested grid storage services.
DOI:10.1109/HPCSim.2014.6903678