A multiprocessor scheduling algorithm for low overhead fault-tolerance

Bibliographic Details
Published in: Proceedings - Symposium on Reliable Distributed Systems, pp. 186-194
Main Authors: Hashimoto, K., Tsuchiya, T., Kikuno, T.
Format: Conference Proceeding; Journal Article
Language: English
Published: IEEE 01.01.1998
ISBN: 0818692189, 9780818692185
ISSN: 1060-9857
Abstract: We propose a new scheduling algorithm for achieving fault tolerance in multiprocessor systems. The algorithm partitions a parallel program into subsets of tasks based on certain characteristics of its task graph, then duplicates and schedules the tasks of each subset in turn. We evaluate the algorithm in simulations on three kinds of practical task graphs (Gaussian elimination, a Laplace equation solver, and LU decomposition). The results show that fault tolerance can be achieved at the cost of a small degree of time redundancy, and that performance in the case of a processor failure improves on that of a previous algorithm.
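
The abstract's three steps (partition the task graph, duplicate the tasks of each subset, schedule them subset by subset) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the record does not say which task-graph characteristics drive the partitioning or how copies are placed. The sketch assumes, purely for illustration, that subsets are the depth levels of the task graph, that each task gets exactly two copies on distinct processors, that copies are placed greedily on the processor offering the earliest start, and that communication costs are zero. The names `levels` and `schedule_with_duplication` are hypothetical.

```python
from collections import defaultdict


def levels(tasks, deps):
    """Partition tasks by depth in the task graph (an assumed criterion)."""
    depth = {}

    def depth_of(task):
        # Depth is 0 for tasks without predecessors, else 1 + deepest predecessor.
        if task not in depth:
            depth[task] = 1 + max(
                (depth_of(p) for p in deps.get(task, ())), default=-1
            )
        return depth[task]

    groups = defaultdict(list)
    for task in tasks:
        groups[depth_of(task)].append(task)
    return [groups[k] for k in sorted(groups)]


def schedule_with_duplication(tasks, deps, cost, n_procs):
    """Place two copies of every task on distinct processors.

    Returns {task: [(proc, start, finish), (proc, start, finish)]}.
    Assumption: a copy may start only after all copies of all its
    predecessors have finished, and communication is free.
    """
    if n_procs < 2:
        raise ValueError("duplication needs at least two processors")
    free_at = [0] * n_procs  # time at which each processor becomes idle
    placed = {}
    for subset in levels(tasks, deps):  # handle subsets one after another
        for task in subset:
            ready = max(
                (finish for p in deps.get(task, ()) for (_, _, finish) in placed[p]),
                default=0,
            )
            copies, used = [], set()
            for _ in range(2):
                # Greedy choice: unused processor with the earliest possible start.
                proc = min(
                    (p for p in range(n_procs) if p not in used),
                    key=lambda p: max(free_at[p], ready),
                )
                start = max(free_at[proc], ready)
                finish = start + cost[task]
                free_at[proc] = finish
                used.add(proc)
                copies.append((proc, start, finish))
            placed[task] = copies
    return placed


if __name__ == "__main__":
    deps = {"c": ["a", "b"]}  # task c depends on a and b
    cost = {"a": 2, "b": 3, "c": 1}
    for task, copies in schedule_with_duplication(["a", "b", "c"], deps, cost, 2).items():
        print(task, copies)
```

Because every task has a copy on a second processor, the loss of any single processor still leaves one complete copy of each task. The extra copies lengthen the schedule, which corresponds to the time redundancy the abstract refers to.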
Author affiliation: Hashimoto, K. (Dept. of Inf. & Math. Sci., Osaka Univ., Japan)
DOI: 10.1109/RELDIS.1998.740493
Subjects: Fault tolerance; Laplace equations; Processor scheduling; Scheduling algorithm
URI: https://ieeexplore.ieee.org/document/740493
URI: https://www.proquest.com/docview/26639795