Multicore in production: advantages and limits of the multi-process approach in the ATLAS experiment

Published in: Journal of Physics. Conference Series, Volume 368, pp. 1-7
Main Authors: Binet, S, Calafiura, P, Jha, M K, Lavrijsen, W, Leggett, C, Lesny, D, Severini, H, Smith, D, Snyder, S, Tatarkhanov, M, Tsulaia, V, VanGemmeren, P, Washbrook, A
Format: Journal Article
Language: English
Published: 01.01.2012
ISSN: 1742-6588, 1742-6596
Description
Summary: The shared memory architecture of multicore CPUs provides HEP developers with the opportunity to reduce the memory footprint of their applications by sharing memory pages between the cores in a processor. ATLAS pioneered the multi-process approach to parallelizing HEP applications. Using Linux fork() and the Copy-On-Write mechanism, we implemented a simple event task farm, which allowed us to share almost 80% of memory pages among event worker processes for certain types of reconstruction jobs, with negligible CPU overhead. By leaving the task of managing shared memory pages to the operating system, we have been able to parallelize large reconstruction and simulation applications originally written to run in a single thread of execution, with little to no change to the application code. The process of validating AthenaMP for production took ten months of concentrated effort and is expected to continue for several more months. Besides validating the software itself, an important and time-consuming aspect of running multicore applications in production was configuring the ATLAS distributed production system to handle multicore jobs. This entailed defining multicore batch queues, where the unit resource is not a core but a whole computing node; monitoring the output of many event workers; and adapting the job definition layer to handle computing resources with different event throughputs. We present scalability and memory usage studies based on data gathered both on dedicated hardware and at the CERN Computer Center.
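The fork()-and-Copy-On-Write pattern described in the summary can be illustrated with a minimal sketch. This is not AthenaMP itself, just a toy event task farm under simplifying assumptions: the parent initializes a large read-only structure (standing in for geometry and conditions data), forks worker processes that inherit its pages copy-on-write, statically partitions the event range round-robin, and waits for all workers to finish. The names `geometry`, `process_events`, and `run_task_farm` are hypothetical.

```python
import os

# Hypothetical stand-in for large read-only data (detector geometry,
# conditions, ...) that the parent initializes once. After fork() the
# kernel shares these pages copy-on-write: they are duplicated only
# if a worker actually writes to them.
geometry = list(range(100_000))

def process_events(worker_id, events):
    """Toy event loop: read the shared data, write results to a per-worker file."""
    total = sum(geometry[e % len(geometry)] for e in events)
    with open(f"worker_{worker_id}.out", "w") as f:
        f.write(str(total))

def run_task_farm(n_workers, n_events):
    """Fork n_workers event workers; the parent only forks and waits."""
    pids = []
    for w in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: process a static round-robin slice of the events,
            # then exit without running any parent cleanup code.
            process_events(w, range(w, n_events, n_workers))
            os._exit(0)
        pids.append(pid)
    for pid in pids:          # parent: reap all workers
        os.waitpid(pid, 0)

if __name__ == "__main__":
    run_task_farm(2, 10)
```

Per-worker output files mirror the paper's point that the outputs of many event workers must be monitored and merged downstream; the real system's scheduling and output handling are more elaborate than this static split.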
DOI:10.1088/1742-6596/368/1/012018