A multiprocessor scheduling algorithm for low overhead fault-tolerance

We propose a new scheduling algorithm for achieving fault tolerance in multiprocessor systems. The new algorithm partitions a parallel program into subsets of tasks based on some characteristics of a task graph. Then for each subset, the algorithm duplicates and schedules its tasks successively. App...

Bibliographic details
Published in: Proceedings - Symposium on Reliable Distributed Systems, pp. 186-194
Main authors: Hashimoto, K., Tsuchiya, T., Kikuno, T.
Format: Conference paper, Journal Article
Language: English
Published: IEEE, 01.01.1998
ISBN: 0818692189, 9780818692185
ISSN: 1060-9857
Abstract We propose a new scheduling algorithm for achieving fault tolerance in multiprocessor systems. The new algorithm partitions a parallel program into subsets of tasks based on some characteristics of a task graph. Then for each subset, the algorithm duplicates and schedules its tasks successively. Applying the proposed algorithm to three kinds of practical task graphs (Gaussian elimination, Laplace equation solver and LU decomposition), we conduct simulations. Experimental results show that fault tolerance can be achieved at the cost of a small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.
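Illustrative sketch (not the authors' algorithm): the abstract does not say which task-graph characteristics drive the partitioning or how the duplicates are placed, so the code below makes its own assumptions. It partitions tasks by depth in the task graph, schedules two copies of every task on distinct processors one subset at a time, ignores communication costs, and conservatively starts a task only after both copies of each predecessor have finished. The names levels, schedule, deps, cost and n_procs are hypothetical.

```python
from collections import defaultdict


def levels(tasks, deps):
    """Partition tasks into subsets by depth in the task graph.

    ASSUMPTION: depth-based partitioning is a stand-in; the record does not
    state which task-graph characteristics the paper actually uses.
    """
    depth = {}

    def d(t):
        # Depth = length of the longest path from any source task (sources are 0).
        if t not in depth:
            depth[t] = 1 + max((d(p) for p in deps.get(t, [])), default=-1)
        return depth[t]

    groups = defaultdict(list)
    for t in tasks:
        groups[d(t)].append(t)
    return [groups[k] for k in sorted(groups)]


def schedule(tasks, deps, cost, n_procs):
    """Return {(task, copy): (processor, start, finish)} with two copies per task.

    ASSUMPTION: greedy earliest-start placement, no communication cost.
    """
    if n_procs < 2:
        raise ValueError("duplication needs at least two processors")
    free = [0] * n_procs  # time at which each processor becomes idle
    placed = {}
    for subset in levels(tasks, deps):
        for t in subset:
            # Conservative rule: wait for both copies of every predecessor.
            ready = max(
                (placed[(p, c)][2] for p in deps.get(t, []) for c in (0, 1)),
                default=0,
            )
            used = set()
            for copy in (0, 1):
                # Pick the processor giving the earliest start, excluding the
                # one that already holds the other copy of this task.
                proc = min(
                    (p for p in range(n_procs) if p not in used),
                    key=lambda p: max(free[p], ready),
                )
                start = max(free[proc], ready)
                placed[(t, copy)] = (proc, start, start + cost[t])
                free[proc] = start + cost[t]
                used.add(proc)
    return placed
```

Because the two copies of a task never share a processor, a single processor failure leaves at least one copy of every task intact in this sketch; the extra finish time relative to an unduplicated schedule corresponds to the time redundancy the abstract mentions.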
Author Tsuchiya, T.
Hashimoto, K.
Kikuno, T.
Author_xml – sequence: 1
  givenname: K.
  surname: Hashimoto
  fullname: Hashimoto, K.
  organization: Dept. of Inf. & Math. Sci., Osaka Univ., Japan
– sequence: 2
  givenname: T.
  surname: Tsuchiya
  fullname: Tsuchiya, T.
– sequence: 3
  givenname: T.
  surname: Kikuno
  fullname: Kikuno, T.
ContentType Conference Proceeding
Journal Article
DOI 10.1109/RELDIS.1998.740493
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle Computer and Information Systems Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
Advanced Technologies Database with Aerospace
ProQuest Computer Science Collection
Computer and Information Systems Abstracts Professional
DatabaseTitleList
Computer and Information Systems Abstracts
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore Digital Library
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EndPage 194
ExternalDocumentID 740493
ISBN 0818692189
9780818692185
ISICitedReferencesCount 2
ISSN 1060-9857
IngestDate Fri Sep 05 14:32:37 EDT 2025
Tue Aug 26 17:50:38 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
Notes SourceType-Scholarly Journals-2
ObjectType-Feature-2
ObjectType-Conference Paper-1
content type line 23
SourceType-Conference Papers & Proceedings-1
ObjectType-Article-3
PQID 26639795
PQPubID 23500
PageCount 9
ParticipantIDs proquest_miscellaneous_26639795
ieee_primary_740493
PublicationCentury 1900
PublicationDate 1998-01-01
PublicationDateYYYYMMDD 1998-01-01
PublicationDate_xml – month: 01
  year: 1998
  text: 1998-01-01
  day: 01
PublicationDecade 1990
PublicationTitle Proceedings - Symposium on Reliable Distributed Systems
PublicationTitleAbbrev RELDIS
PublicationYear 1998
Publisher IEEE
Publisher_xml – name: IEEE
SourceID proquest
ieee
SourceType Aggregation Database
Publisher
StartPage 186
SubjectTerms Fault tolerance
Laplace equations
Processor scheduling
Scheduling algorithm
Title A multiprocessor scheduling algorithm for low overhead fault-tolerance
URI https://ieeexplore.ieee.org/document/740493
https://www.proquest.com/docview/26639795