Abstract: A shortage of computational, communication, or storage resources can compromise the execution of an application or even interrupt it. This motivates the constant search for computational solutions (algorithms, architectures, protocols, and infrastructures) that offer high availability.

 

Data centers today exploit the advantages of virtualizing computing resources, such as server consolidation, energy savings, and easier management. In this context, users run their applications on a set of virtual machines (VMs); that is, an abstraction layer is introduced between the physical hardware that hosts the VMs and the final application. With the resurgence of virtualization came Remus, a VM replication mechanism based on the Xen hypervisor. Remus encapsulates the user's application in a VM and performs asynchronous checkpoints from the primary host to the backup host at a frequency on the order of dozens of checkpoints (cps) per second. When the primary host fails, the VM on the backup host is automatically activated, which keeps downtime low. This project aims to characterize the overhead imposed by the Remus replication mechanism during the execution of different types of applications, both distributed applications and applications without external communication.