Instability in Geo-Distributed Kubernetes Federation: Causes and Mitigation

Bibliographic Details
Title: Instability in Geo-Distributed Kubernetes Federation: Causes and Mitigation
Authors: Tamiru, Mulugeta Ayalew; Pierre, Guillaume; Tordsson, Johan; Elmroth, Erik
Contributors: Elastisys AB, Design and Implementation of Autonomous Distributed Systems (MYRIADS), Centre Inria de l'Université de Rennes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-SYSTÈMES LARGE ÉCHELLE (IRISA-D1), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom Paris (IMT)-Institut Mines-Télécom Paris (IMT)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut Mines-Télécom Paris (IMT)-Institut Mines-Télécom Paris (IMT)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom Paris (IMT)-Institut Mines-Télécom Paris (IMT), This work is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 765452. The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the European Union.
Neither the European Union institutions and bodies nor any person acting on their behalf may be held responsible for the use which may be made of the information contained therein. Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr), European Project: 765452,h2020,H2020-MSCA-ITN-2017,FogGuru(2017)
Source: MASCOTS 2020 - 28th IEEE Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems ; https://inria.hal.science/hal-02934475 ; MASCOTS 2020 - 28th IEEE Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems, Nov 2020, Nice, France
Publisher Information: CCSD
Publication Year: 2020
Subject Terms: Self-configuration, Self-adaptation, Kubernetes Federation, Fog Computing, Automatic configuration tuning, [INFO.INFO-OS]Computer Science [cs]/Operating Systems [cs.OS], [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
Subject Geographic: Nice, France
Description: International audience. As resources in geo-distributed environments are typically located at remote sites characterized by high latency and intermittent network connectivity, delays and transient network failures are common between the management layer and the remote resources. In this paper, we show that delays and transient network failures, coupled with static configuration (including the default configuration parameter values), can lead to instability of application deployments in Kubernetes Federation, making applications unavailable for long periods of time. Leveraging the benefits of configuration tuning, we propose a feedback controller that dynamically adjusts the relevant configuration parameter to improve the stability of application deployments without slowing down the detection of hard failures. We show the effectiveness of our approach in a geo-distributed setup across five sites of Grid'5000, improving system stability from 83-92% without the controller to 99.5-100% with the controller.
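To illustrate the general idea of a feedback controller that tunes a failure-detection parameter, here is a minimal sketch. It is not the authors' controller: the abstract does not name the parameter, so the sketch assumes it is a health-check timeout, and it swaps in a well-known technique, the RFC 6298-style smoothed round-trip estimator used for TCP retransmission timeouts. The class name `AdaptiveTimeout` and the bounds `min_timeout`/`max_timeout` are hypothetical. The timeout widens when probe delays grow or fluctuate, which avoids declaring transiently delayed clusters failed. The upper bound caps how long a hard failure can go undetected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveTimeout:
    """Hypothetical feedback controller for a health-check timeout (seconds).

    Uses an RFC 6298-style smoothed-delay estimator; this is an illustrative
    stand-in, not the controller described in the paper.
    """

    min_timeout: float = 1.0   # assumed floor: keeps detection fast on a healthy network
    max_timeout: float = 30.0  # assumed cap: bounds worst-case hard-failure detection delay
    alpha: float = 0.125       # gain for the smoothed delay (RFC 6298 value)
    beta: float = 0.25         # gain for the delay variation (RFC 6298 value)
    k: float = 4.0             # safety margin in units of delay variation
    srtt: Optional[float] = None
    rttvar: float = 0.0

    def observe(self, delay: float) -> float:
        """Feed one measured probe delay and return the adjusted timeout."""
        if self.srtt is None:
            # The first sample initialises the estimator.
            self.srtt = delay
            self.rttvar = delay / 2
        else:
            # Update the variation first, using the previous smoothed delay.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - delay)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * delay
        return self.timeout()

    def timeout(self) -> float:
        """Current timeout, clamped to [min_timeout, max_timeout]."""
        if self.srtt is None:
            return self.min_timeout  # no measurements yet
        raw = self.srtt + self.k * self.rttvar
        return min(self.max_timeout, max(self.min_timeout, raw))


if __name__ == "__main__":
    ctl = AdaptiveTimeout()
    for d in [0.2, 0.2, 5.0]:  # steady low delay, then a latency spike
        print(f"delay={d:.1f}s -> timeout={ctl.observe(d):.3f}s")
```

With steady 0.2 s delays, the timeout stays at the 1 s floor. A single 5 s spike raises it to about 5.8 s, so a slow but live site is not declared failed. Clamping to a fixed maximum is one simple way to keep hard-failure detection bounded. The paper's own controller and the parameter it adjusts may differ.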
Document Type: conference object
Language: English
Relation: info:eu-repo/grantAgreement//765452/EU/FogGuru: Training the Next Generation of European Fog Computing Experts/FogGuru
Availability: https://inria.hal.science/hal-02934475
https://inria.hal.science/hal-02934475v1/document
https://inria.hal.science/hal-02934475v1/file/main.pdf
Rights: http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.C559B9A3
Database: BASE