
A Fault Tolerance Mechanism for Hybrid Scientific Workflows

EasyChair Preprint 13948

12 pages. Date: July 12, 2024

Abstract

In large distributed systems, failures occur daily, and they become more frequent as the number of computation tasks and of locations on which those tasks are deployed grows.

Representing an application as a workflow makes it possible to exploit Workflow Management System (WMS) features such as portability. Reliability is another relevant feature that some WMSs provide.

In recent years, the emergence of hybrid workflows has posed new challenges by making it possible to distribute computations across heterogeneous and independent environments. Consequently, the number of possible points of failure in an execution has increased, which makes fault tolerance harder to provide.

This paper presents the implementation of a fault tolerance mechanism for hybrid workflows based on the recovery and rollback approach. It provides a representation of hybrid workflows within a formal framework, together with experiments demonstrating that the implemented approach works.
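As a rough illustration of the general recovery and rollback idea, the sketch below runs a linear sequence of steps, saves each step's output as a checkpoint, and on failure rolls back to the last checkpoint and re-executes the failed step. It is not the mechanism described in the paper: the names `StepFailure`, `run_with_rollback`, and `make_flaky`, the linear step list, and the retry limit are all assumptions made for this example.

```python
from typing import Any, Callable, List, Tuple


class StepFailure(Exception):
    """Hypothetical exception signalling that a step (or its location) failed."""


Step = Tuple[str, Callable[[Any], Any]]


def run_with_rollback(
    steps: List[Step], initial: Any, max_retries: int = 3
) -> Tuple[Any, List[str]]:
    """Run steps in order; on failure, restart the step from its last checkpoint."""
    checkpoints = [initial]  # checkpoints[i] is the saved input of step i
    log: List[str] = []
    i = 0
    retries = 0
    while i < len(steps):
        name, fn = steps[i]
        try:
            result = fn(checkpoints[i])
        except StepFailure:
            retries += 1
            if retries > max_retries:
                raise  # give up: the failure is treated as unrecoverable
            log.append(f"rollback:{name}")
            continue  # re-execute the same step from the saved checkpoint
        checkpoints.append(result)  # save the output for the next step
        log.append(f"done:{name}")
        i += 1
        retries = 0
    return checkpoints[-1], log


def make_flaky(fn: Callable[[Any], Any], failures: int) -> Callable[[Any], Any]:
    """Wrap fn so that it raises StepFailure the first `failures` times (assumed helper)."""
    remaining = {"n": failures}

    def wrapped(x: Any) -> Any:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise StepFailure()
        return fn(x)

    return wrapped


if __name__ == "__main__":
    steps = [
        ("load", lambda x: x + 1),
        ("transform", make_flaky(lambda x: x * 2, failures=2)),
        ("store", lambda x: x - 3),
    ]
    result, log = run_with_rollback(steps, 1)
    print(result, log)
```

In a hybrid setting, a real mechanism would also have to decide which checkpoints survive the failure of a given location; the sketch assumes that every checkpoint remains available.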

Keyphrases: Workflows, fault tolerance, formal semantics, hybrid workflows

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:13948,
  author    = {Alberto Mulone and Doriana Medic and Marco Aldinucci},
  title     = {A Fault Tolerance Mechanism for Hybrid Scientific Workflows},
  howpublished = {EasyChair Preprint 13948},
  year      = {EasyChair, 2024}}