A Serverless Engine for High Energy Physics Distributed Analysis

Bibliographic Details
Published in: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 575-584
Main Authors: Kusnierz, Jacek, Padulano, Vincenzo E., Malawski, Maciej, Burkiewicz, Kamil, Saavedra, Enric Tejedor, Alonso-Jorda, Pedro, Pitt, Michael, Avati, Valentina
Format: Conference Proceeding
Language: English
Published: IEEE 01.05.2022
Abstract The Large Hadron Collider (LHC) at CERN has generated an unprecedented volume of data for the High-Energy Physics (HEP) field over the last decade. Scientific collaborations interested in analysing such data very often require computing power beyond a single machine. This issue has traditionally been tackled by running analyses in distributed environments using stateful, managed batch computing systems. While this approach has been effective so far, current estimates of the field's future computing needs present large scaling challenges. Such a managed approach may not be the only viable way to tackle them, and serverless architectures could provide an interesting alternative with an even larger scaling potential. This work describes a novel approach to running real HEP scientific applications through a distributed serverless computing engine. The engine is built upon ROOT, a well-established HEP data analysis software, and distributes its computations to a large pool of concurrent executions on the Amazon Web Services (AWS) Lambda serverless platform. With the developed tool, physicists are able to access datasets stored at CERN (including those under restricted access policies) and process them on remote infrastructures outside their typical environment. The analysis running in the serverless functions is monitored at runtime to gather performance metrics for both data- and computation-intensive workloads.
Author_xml – sequence: 1
  givenname: Jacek
  surname: Kusnierz
  fullname: Kusnierz, Jacek
  email: kusnierz@protonmail.com
  organization: Institute of Computer Science, AGH, Kraków, Poland
– sequence: 2
  givenname: Vincenzo E.
  surname: Padulano
  fullname: Padulano, Vincenzo E.
  email: vincenzo.eduardo.padulano@cern.ch
  organization: EP-SFT, CERN, Geneva, Switzerland
– sequence: 3
  givenname: Maciej
  surname: Malawski
  fullname: Malawski, Maciej
  email: malawski@agh.edu.pl
  organization: Institute of Computer Science, AGH, Kraków, Poland
– sequence: 4
  givenname: Kamil
  surname: Burkiewicz
  fullname: Burkiewicz, Kamil
  organization: Institute of Computer Science, AGH, Kraków, Poland
– sequence: 5
  givenname: Enric Tejedor
  surname: Saavedra
  fullname: Saavedra, Enric Tejedor
  organization: Institute of Computer Science, AGH, Kraków, Poland
– sequence: 6
  givenname: Pedro
  surname: Alonso-Jorda
  fullname: Alonso-Jorda, Pedro
  email: palonso@upv.es
  organization: DSIC, UPV, Valencia, Spain
– sequence: 7
  givenname: Michael
  surname: Pitt
  fullname: Pitt, Michael
  organization: EP-CMG-OS, CERN, Geneva, Switzerland
– sequence: 8
  givenname: Valentina
  surname: Avati
  fullname: Avati, Valentina
  organization: EP-UHC, CERN, Geneva, Switzerland
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CCGrid54584.2022.00067
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library (IEL) (UW System Shared)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9781665499569
1665499567
EndPage 584
ExternalDocumentID 9826036
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-c287t-9299845842f2804f720bb53fadf5609d18a457f1160e35f92637d8fd6f11edd43
IEDL.DBID RIE
ISICitedReferencesCount 2
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000855065800058&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Thu Jun 29 18:36:46 EDT 2023
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
OpenAccessLink http://cds.cern.ch/record/2815205
PageCount 10
ParticipantIDs ieee_primary_9826036
PublicationCentury 2000
PublicationDate 2022-May
PublicationDateYYYYMMDD 2022-05-01
PublicationDate_xml – month: 05
  year: 2022
  text: 2022-May
PublicationDecade 2020
PublicationTitle 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
PublicationTitleAbbrev CCGRID
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.8582599
SourceID ieee
SourceType Publisher
StartPage 575
SubjectTerms AWS
C++ languages
CERN
Codes
Computer architecture
Distributed Computing
HEP
Lambda
Large Hadron Collider
MapReduce
ROOT
Runtime
Serverless
Serverless computing
Web services
Title A Serverless Engine for High Energy Physics Distributed Analysis
URI https://ieeexplore.ieee.org/document/9826036
WOSCitedRecordID wos000855065800058
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2022+22nd+IEEE+International+Symposium+on+Cluster%2C+Cloud+and+Internet+Computing+%28CCGrid%29&rft.atitle=A+Serverless+Engine+for+High+Energy+Physics+Distributed+Analysis&rft.au=Kusnierz%2C+Jacek&rft.au=Padulano%2C+Vincenzo+E.&rft.au=Malawski%2C+Maciej&rft.au=Burkiewicz%2C+Kamil&rft.date=2022-05-01&rft.pub=IEEE&rft.spage=575&rft.epage=584&rft_id=info:doi/10.1109%2FCCGrid54584.2022.00067&rft.externalDocID=9826036