Large-Scale Merging of Histograms using Distributed In-Memory Computing


Bibliographic details
Published in: Journal of physics. Conference series, Volume 664; Issue 9; pp. 92003 - 92008
Main authors: Blomer, Jakob; Ganis, Gerardo
Format: Journal Article
Language: English
Publication details: Bristol, IOP Publishing, 23.12.2015
ISSN:1742-6588, 1742-6596
Abstract Most high-energy physics analysis jobs are embarrassingly parallel except for the final merging of the output objects, which are typically histograms. Currently, the merging of output histograms scales badly. The running time for distributed merging depends not only on the overall number of bins but also on the number of partial histogram output files. As a result, while the time to analyze data decreases linearly with the number of worker nodes, the time to merge the histograms in fact increases with the number of worker nodes. On the grid, merging jobs that take a few hours are not unusual. To improve the situation, we present a distributed and decentralized merging algorithm whose running time is independent of the number of worker nodes. We exploit the full bisection bandwidth of local networks and keep all intermediate results in memory. We present benchmarks from an implementation using the parallel ROOT facility (PROOF) and RAMCloud, a distributed key-value store that keeps all data in DRAM.
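The abstract does not spell out how the merge is organized, so the following is only an illustrative sketch under an assumption: that a merge can be made independent of the worker count by giving each worker ownership of one contiguous slice of bins and letting it sum that slice across all partial histograms (a reduce-scatter pattern). The function names `slice_bounds`, `merge_partitioned`, and `merge_central` are hypothetical, histograms are modeled as plain lists of bin contents, and nothing here uses the PROOF or RAMCloud APIs.

```python
from typing import List, Tuple


def slice_bounds(n_bins: int, n_workers: int, k: int) -> Tuple[int, int]:
    """Half-open bin range [lo, hi) owned by worker k (hypothetical helper)."""
    lo = k * n_bins // n_workers
    hi = (k + 1) * n_bins // n_workers
    return lo, hi


def merge_central(partials: List[List[float]]) -> List[float]:
    """Baseline: a single node adds up every bin of every partial histogram."""
    return [float(sum(column)) for column in zip(*partials)]


def merge_partitioned(partials: List[List[float]]) -> List[float]:
    """Assumed decentralized scheme: worker k sums only the bins it owns.

    Each outer iteration stands in for work that worker k could do
    independently of, and concurrently with, all other workers.
    """
    n_workers = len(partials)
    n_bins = len(partials[0])
    merged = [0.0] * n_bins
    for k in range(n_workers):
        lo, hi = slice_bounds(n_bins, n_workers, k)
        for hist in partials:          # one slice from every partial histogram
            for b in range(lo, hi):
                merged[b] += hist[b]
    return merged


# Three workers, each holding a 5-bin partial histogram.
partials = [
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
]
result = merge_partitioned(partials)
```

In this sketch each worker performs roughly `n_workers * (n_bins / n_workers)`, that is about `n_bins`, additions, so the per-worker cost does not grow with the number of workers, whereas `merge_central` performs `n_workers * n_bins` additions on one node. Whether the paper's algorithm partitions bins in exactly this way is not stated in the abstract.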
Authors:
Blomer, Jakob (CERN, Geneva, Switzerland; jblomer@cern.ch)
Ganis, Gerardo (CERN, Geneva, Switzerland)
ContentType Journal Article
Copyright Published under licence by IOP Publishing Ltd
2015. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1088/1742-6596/664/9/092003
Discipline Physics
License Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
http://iopscience.iop.org/info/page/text-and-data-mining
http://creativecommons.org/licenses/by/3.0
OpenAccessLink https://iopscience.iop.org/article/10.1088/1742-6596/664/9/092003
SubjectTerms Algorithms; Data storage; Distributed memory; Histograms; Nodes; Physics; Run time (computers)