Recovering from Distributable Thread Failures with Assured Timeliness in Real-Time Distributed Systems
We consider the problem of recovering from failures of distributable threads with assured timeliness. When a node hosting a portion of a distributable thread fails, it causes orphans - i.e., thread segments that are disconnected from the thread's root. We consider a termination model for recove...
Uloženo v:
| Vydáno v: | 2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06) s. 267 - 276 |
|---|---|
| Hlavní autoři: | , , , |
| Médium: | Konferenční příspěvek |
| Jazyk: | angličtina |
| Vydáno: |
IEEE
01.10.2006
|
| Témata: | |
| ISBN: | 9780769526775, 0769526772 |
| ISSN: | 1060-9857 |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | We consider the problem of recovering from failures of distributable threads with assured timeliness. When a node hosting a portion of a distributable thread fails, it causes orphans - i.e., thread segments that are disconnected from the thread's root. We consider a termination model for recovering from such failures, where the orphans must be detected and aborted, and failure-exception notification must be delivered to the farthest, contiguous surviving thread segment for resuming thread execution. We present a realtime scheduling algorithm called AUA, and a distributable thread integrity protocol called TP-TR. We show that AUA and TP-TR bound the orphan cleanup and recovery time, thereby bounding thread starvation durations, and maximize the total thread accrued timeliness utility. We implement AUA and TP-TR in a real-time middleware that supports distributable threads. Our experimental studies with the implementation validate the algorithm/protocol's time-bounded recovery property and confirm their effectiveness |
|---|---|
| ISBN: | 9780769526775 0769526772 |
| ISSN: | 1060-9857 |
| DOI: | 10.1109/SRDS.2006.38 |

