High-availability (HA) systems, essential in many contemporary contexts, are designed to keep processes and data available for more than 99% of their operational time. These systems are typically implemented as Cloud/Edge infrastructures maintained by human operators and intelligent agents to guarantee the required level of availability. At the same time, AI-based automation is being widely adopted across many industries, and AI-based software agents are increasingly used to automate highly available systems, particularly for monitoring, fault detection, fault prediction, recovery, and optimization. In this review paper, we discuss the state of the art of AI-based solutions for HA systems, focusing on the core operational mechanisms of monitoring, failure detection, and recovery. We first review a few key background concepts of HA architectures, and then survey recent work on AI-based solutions for monitoring, fault detection, and recovery in HA systems.
Lidia Fotia
Rosario Gaeta
Fabrizio Messina
Computers
University of Milano-Bicocca
University of Catania
University of Salerno
www.synapsesocial.com/papers/69d896406c1944d70ce0785d — DOI: https://doi.org/10.3390/computers15040231