Modeling and Simulating Multiple Failure Masking Enabled by Local Recovery for Stencil-Based Applications at Extreme Scales

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 28, No. 10, pp. 2881-2895
Main Authors: Gamell, Marc; Teranishi, Keita; Mayo, Jackson; Kolla, Hemanth; Heroux, Michael A.; Chen, Jacqueline; Parashar, Manish
Format: Journal Article
Language: English
Published: New York: IEEE, 01.10.2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1045-9219, 1558-2183
Abstract Obtaining multi-process hard failure resilience at the application level is a key challenge that must be overcome before the promise of exascale can be fully realized. Previous work has shown that online global recovery can dramatically reduce the overhead of failures when compared to the more traditional approach of terminating the job and restarting it from the last stored checkpoint. If online recovery is performed locally, further scalability is enabled, not only because of the intrinsically lower cost of recovering locally, but also because of secondary effects that arise for certain application types. In this paper we model one such effect, namely multiple failure masking, which manifests when stencil parallel computations run in an environment where failures are recovered locally. First, the delay propagation shape of one or multiple locally recovered failures is modeled, enabling several analyses of the probability of different levels of failure masking under certain stencil application behaviors. Our results indicate that failure masking is an extremely desirable effect at scale, and that it becomes more evident and more beneficial as the machine size or the failure rate increases.
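The delay propagation and masking effects described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' model: it assumes a 1D nearest-neighbor stencil with one process per subdomain, unit compute time per iteration, zero communication cost, non-periodic boundaries, and a fixed local recovery cost. The function names `simulate_makespan`, `masks`, and `masking_probability` are hypothetical. Because each process only waits for its neighbors' halos, a local recovery delay spreads at most one process per iteration (a "delay cone"); two failures mask each other when neither falls inside the other's cone.

```python
import random
from itertools import combinations


def simulate_makespan(num_procs, num_iters, failures, recovery_cost, compute=1):
    """Completion time of a toy 1D nearest-neighbor stencil with local recovery.

    Assumptions (illustrative only): unit compute per iteration, no
    communication cost, and `failures` is a set of (process, iteration)
    pairs, each stalling only the failed process by `recovery_cost`.
    """
    finish = [0] * num_procs  # time each process finished its previous iteration
    for it in range(num_iters):
        new_finish = [0] * num_procs
        for p in range(num_procs):
            # Iteration `it` needs halo data from this process and its
            # neighbors' previous iteration, so delays spread one hop per step.
            ready = finish[p]
            if p > 0:
                ready = max(ready, finish[p - 1])
            if p < num_procs - 1:
                ready = max(ready, finish[p + 1])
            done = ready + compute
            if (p, it) in failures:
                done += recovery_cost  # local recovery stalls only this process
            new_finish[p] = done
        finish = new_finish
    return max(finish)


def masks(f1, f2):
    """True if neither failure lies in the other's delay cone (toy model).

    A failure at (p1, t1) delays (p2, t2) when t2 - t1 >= |p2 - p1|, so the
    two failures overlap rather than accumulate iff |t1 - t2| < |p1 - p2|.
    """
    (p1, t1), (p2, t2) = f1, f2
    return abs(t1 - t2) < abs(p1 - p2)


def masking_probability(num_procs, num_iters, num_failures, trials=2000, seed=0):
    """Monte Carlo estimate that uniformly placed failures all mask pairwise.

    Duplicate (process, iteration) draws count as non-masking.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        fails = [(rng.randrange(num_procs), rng.randrange(num_iters))
                 for _ in range(num_failures)]
        if all(masks(a, b) for a, b in combinations(fails, 2)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # 16 processes, 20 iterations, recovery cost of 10 iterations.
    print(simulate_makespan(16, 20, set(), 10))               # no failures
    print(simulate_makespan(16, 20, {(0, 0)}, 10))            # one failure
    print(simulate_makespan(16, 20, {(0, 0), (15, 10)}, 10))  # masked pair
    print(simulate_makespan(16, 20, {(0, 0), (0, 10)}, 10))   # same process
    print(masking_probability(64, 20, 2))
```

In this toy setting the four makespans are 20, 30, 30, and 40: two failures on the same process pay the recovery cost twice, while two failures far apart in space but close in time overlap and cost only one recovery. Under these assumptions, spreading the same number of failures over more processes makes masking more likely, which is consistent in spirit with the abstract's claim about machine size.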
Author Heroux, Michael A.
Parashar, Manish
Teranishi, Keita
Gamell, Marc
Chen, Jacqueline
Kolla, Hemanth
Mayo, Jackson
Author_xml – sequence: 1
  givenname: Marc
  surname: Gamell
  fullname: Gamell, Marc
  email: mgamell@cac.rutgers.edu
  organization: Rutgers Discovery Inf. Inst., Rutgers Univ., Piscataway, NJ, USA
– sequence: 2
  givenname: Keita
  surname: Teranishi
  fullname: Teranishi, Keita
  email: knteran@sandia.gov
  organization: Sandia Nat. Labs., Livermore, CA, USA
– sequence: 3
  givenname: Jackson
  surname: Mayo
  fullname: Mayo, Jackson
  email: jmayo@sandia.gov
  organization: Sandia Nat. Labs., Livermore, CA, USA
– sequence: 4
  givenname: Hemanth
  surname: Kolla
  fullname: Kolla, Hemanth
  email: hnkolla@sandia.gov
  organization: Sandia Nat. Labs., Livermore, CA, USA
– sequence: 5
  givenname: Michael A.
  surname: Heroux
  fullname: Heroux, Michael A.
  email: maherou@sandia.gov
  organization: Sandia Nat. Labs., Albuquerque, NM, USA
– sequence: 6
  givenname: Jacqueline
  surname: Chen
  fullname: Chen, Jacqueline
  email: jhchen@sandia.gov
  organization: Sandia Nat. Labs., Livermore, CA, USA
– sequence: 7
  givenname: Manish
  surname: Parashar
  fullname: Parashar, Manish
  email: parashar@cac.rutgers.edu
  organization: Rutgers Discovery Inf. Inst., Rutgers Univ., Piscataway, NJ, USA
BackLink https://www.osti.gov/servlets/purl/1356841 (View this record in OSTI.GOV)
CODEN ITDSEO
CitedBy_id crossref_primary_10_1109_TPDS_2021_3082802
crossref_primary_10_1177_10943420241265936
crossref_primary_10_1177_1094342021990433
crossref_primary_10_1016_j_future_2020_01_026
crossref_primary_10_1016_j_future_2018_09_041
crossref_primary_10_1016_j_future_2022_12_001
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2017
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2017
CorporateAuthor Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
CorporateAuthor_xml – name: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
DOI 10.1109/TPDS.2017.2696538
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Xplore
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
OSTI.GOV - Hybrid
OSTI.GOV
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList Technology Research Database


Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 1558-2183
EndPage 2895
ExternalDocumentID 1356841
10_1109_TPDS_2017_2696538
7908967
Genre orig-research
GrantInformation_xml – fundername: Office of Advanced Scientific Computing Research
– fundername: EPSI
  grantid: DE-FG02-06ER54857
– fundername: Rutgers Discovery Informatics Institute
– fundername: Honeywell International
– fundername: ExaCT Combustion Co-Design Center
  grantid: 4000110839
– fundername: Analysis and Visualization (SDAV)
  grantid: DE-SC0007455
– fundername: National Nuclear Security Administration
  grantid: DE-NA0003525
  funderid: 10.13039/100006168
– fundername: Office of Science
  funderid: 10.13039/100006132
– fundername: National Technology and Engineering Solutions of Sandia
– fundername: U.S. Department of Energy’s
  funderid: 10.13039/100000015
ISICitedReferencesCount 8
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000410653500013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1045-9219
IngestDate Fri May 19 00:47:11 EDT 2023
Sun Nov 30 04:22:48 EST 2025
Sat Nov 29 03:36:10 EST 2025
Tue Nov 18 22:31:03 EST 2025
Wed Aug 27 02:52:20 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 10
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
National Science Foundation (NSF)
SAND-2017-4099J
USDOE National Nuclear Security Administration (NNSA)
AC04-94AL85000; FG02-06ER54857; SC0007455
ORCID 0000000288515660
OpenAccessLink https://www.osti.gov/servlets/purl/1356841
PQID 2174469378
PQPubID 85437
PageCount 15
ParticipantIDs proquest_journals_2174469378
crossref_primary_10_1109_TPDS_2017_2696538
ieee_primary_7908967
osti_scitechconnect_1356841
crossref_citationtrail_10_1109_TPDS_2017_2696538
PublicationCentury 2000
PublicationDate 2017-10-01
PublicationDateYYYYMMDD 2017-10-01
PublicationDate_xml – month: 10
  year: 2017
  text: 2017-10-01
  day: 01
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
– name: United States
PublicationTitle IEEE transactions on parallel and distributed systems
PublicationTitleAbbrev TPDS
PublicationYear 2017
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References_xml – ident: ref34
  doi: 10.1109/SC.2012.77
– ident: ref32
  doi: 10.1109/ICPP.2012.45
– ident: ref20
  doi: 10.1145/214451.214456
– ident: ref44
  doi: 10.1088/1749-4699/2/1/015001
– ident: ref14
  doi: 10.1109/IPDPS.2007.370605
– ident: ref23
  doi: 10.1145/2503210.2503271
– ident: ref10
  doi: 10.1145/2465813.2465814
– ident: ref11
  doi: 10.1145/2807591.2807672
– year: 2009
  ident: ref2
  article-title: ExaScale software study: Software challenges in extreme scale systems
– ident: ref35
  doi: 10.1145/62546.62575
– ident: ref38
  doi: 10.1137/070693199
– ident: ref30
  doi: 10.1145/2063384.2063427
– ident: ref42
  doi: 10.1016/0743-7315(88)90027-5
– ident: ref39
  doi: 10.1109/ExaMPI.2014.6
– ident: ref19
  doi: 10.1109/CLUSTR.2003.1253321
– ident: ref47
  doi: 10.1109/DSN.2015.52
– ident: ref36
  doi: 10.1109/12.142678
– ident: ref13
  doi: 10.1109/TC.2003.1197125
– year: 2010
  ident: ref16
  article-title: Coordinated checkpoint/restart process fault tolerance for MPI applications on HPC systems
– year: 2012
  ident: ref6
  article-title: Exascale operating systems and runtime software report
  publication-title: Tech Rep
– start-page: 93
  year: 2004
  ident: ref26
  article-title: FTC-Charm++: An in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
  publication-title: Proc IEEE Int Conf Cluster Comput
– ident: ref25
  doi: 10.1145/2493123.2462908
– ident: ref17
  doi: 10.1088/1742-6596/46/1/067
– ident: ref43
  doi: 10.1145/2145816.2145845
– ident: ref37
  doi: 10.1109/CLUSTER.2014.6968739
– ident: ref8
  doi: 10.1177/1094342006067469
– ident: ref24
  doi: 10.1109/IPDPS.2011.95
– ident: ref1
  doi: 10.1177/1094342010391989
– ident: ref41
  doi: 10.1109/TC.1984.1676475
– ident: ref22
  doi: 10.1109/CLUSTR.2009.5289157
– year: 2009
  ident: ref5
  article-title: Fault tolerance for extreme-scale computing workshop, Albuquerque, NM, March 19-20, 2009
  publication-title: Tech Rep
– ident: ref31
  doi: 10.1109/SC.2010.18
– start-page: 279
  year: 2015
  ident: ref12
  article-title: Exploring failure recovery for stencil-based applications at extreme scales
  publication-title: Proc Int Symp High-Perform Parallel Distrib Comput
– start-page: 51:51
  year: 2014
  ident: ref40
  article-title: Toward local failure local recovery resilience model using MPI-ULFM
  publication-title: Proceedings of the 21st European MPI Users' Group Meeting
– ident: ref3
  doi: 10.1109/DSN.2014.101
– ident: ref15
  doi: 10.1145/1551609.1551619
– start-page: 895
  year: 2014
  ident: ref4
  article-title: Exploring automatic, online failure recovery for scientific applications at extreme scales
  publication-title: Proc Int Conf High Perform Comput Netw Storage Anal
– ident: ref29
  doi: 10.1109/SNAPI.2010.10
– ident: ref18
  doi: 10.1145/568522.568525
– ident: ref27
  doi: 10.1109/DSNW.2012.6264677
– ident: ref9
  doi: 10.2172/1081941
– ident: ref33
  doi: 10.1109/ICPP.2011.85
– ident: ref28
  doi: 10.1109/IPDPS.2013.69
– start-page: 125
  year: 2007
  ident: ref45
  article-title: A Case for Standard Non-Blocking Collective Operations
  publication-title: Recent Advances in Parallel Virtual Machine and Message Passing Interface
  doi: 10.1007/978-3-540-75416-9_22
– ident: ref7
  doi: 10.2172/1078029
– start-page: 18
  year: 2006
  ident: ref21
  article-title: Blocking versus non-blocking coordinated checkpointing for large-scale fault tolerant MPI
  publication-title: Proc Int Conf High Perform Comput Netw Storage Anal
– ident: ref46
  doi: 10.1145/2807591.2807665
SSID ssj0014504
Score 2.298248
SourceID osti
proquest
crossref
ieee
SourceType Open Access Repository
Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 2881
SubjectTerms Computational modeling
Computer simulation
Delays
Failure
Failure analysis
failure masking
Failure rates
Fault tolerance
Fault tolerant systems
Hardware
Masking
MATHEMATICS AND COMPUTING
modeling
Parallel processing
Protocols
Recovery
Resilience
Restarting
stencil computation
Title Modeling and Simulating Multiple Failure Masking Enabled by Local Recovery for Stencil-Based Applications at Extreme Scales
URI https://ieeexplore.ieee.org/document/7908967
https://www.proquest.com/docview/2174469378
https://www.osti.gov/servlets/purl/1356841
Volume 28
WOSCitedRecordID wos000410653500013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Xplore
  customDbUrl:
  eissn: 1558-2183
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014504
  issn: 1045-9219
  databaseCode: RIE
  dateStart: 19900101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE