The combination of KVM, QEMU, and Libvirt is a widely used open-source virtualization stack in Linux environments:

  • Kernel-Based Virtual Machine (KVM) is a Linux kernel module that turns the kernel into a hypervisor, letting guests use the CPU's hardware virtualization extensions (Intel VT-x or AMD-V).
  • Quick Emulator (QEMU) is a user-space emulator that handles hardware emulation (disk, network) and utilizes KVM to accelerate performance. 
  • Libvirt is a management toolkit library that provides command line tools (virsh) and integrates with GUI tools like virt-manager.

In live production environments, protecting virtual machines is essential for data integrity, disaster recovery, and business continuity. KVM’s backup capabilities have been evolving to meet modern demands for minimal downtime, improved data consistency, and more efficient protection.

In this article, we discuss different methods and best practices for backing up KVM machines. These include simple offline backups involving downtime and the more modern method of live backups to meet the requirements of critical workloads where downtime is unacceptable.

Summary of best practices for KVM backup

Best practice | Description
Perform live backup with native APIs | Work with the latest QEMU/Libvirt versions that support live VM backups utilizing built-in APIs.
Back up the VM configuration XML | Backing up the VM XML configuration and the VM disk image is essential for recovery.
Verify backup integrity | Keep a checksum of the backup files and verify VM image consistency using the qemu-img tool.
Maintain multiple copies on different media | Utilize the 3-2-1 strategy: 3 backups on 2 different media with 1 offsite.
Secure backups with encryption | Protect backups from unauthorized access by encrypting the image files.

KVM backup methods

A key challenge in VM backups is data consistency. Modern operating systems use memory buffers to hold modified data temporarily before writing it to disk in batches for performance reasons. If a backup or snapshot of the disk image is captured while some buffered writes remain in memory, the backup may miss recent changes. Such a backup can result in filesystem corruption or application-level inconsistencies (e.g., half-written database transactions).

There are multiple approaches to achieving filesystem-consistent or application-consistent backups for KVM-based VMs. 

Offline backups

The simplest and most reliable method is performing an offline backup by shutting down the VM. A graceful machine shutdown flushes all dirty buffers to disk. 

The downside is downtime while the machine is in shutdown mode and the disk image is being backed up to another medium. This is obviously not desirable for critical production workloads.

Legacy snapshot backup

The second method is to take a snapshot of the running VM. With older versions of libvirt (before 7.2.0) and QEMU (before 4.2), you have the option to make live backups using snapshots. 

This method creates temporary overlay files to capture changes while the VM runs. The process involves creating a snapshot of the VM, copying the image data, merging changes from the overlay files, and cleaning up temporary data. It requires a QEMU guest agent installed in the VM, which manages the data consistency during the backup/snapshot operations.

Live backups with native APIs

Newer versions of libvirt (7.2.0+) and QEMU (4.2+) provide more efficient live, full-disk backup via the built-in backup API. These backups use changed block tracking (bitmaps) and network block device (NBD) mechanisms to maintain consistency without traditional snapshots.

Backups can be done in two modes: push mode, where libvirt/QEMU writes data directly to the target; and pull mode, which exposes data via NBD for third-party tools to fetch. Both modes support live operation with minimal impact and are recommended for current deployments.
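
As a rough illustration of push mode, the sketch below generates a backup description and shows how it might be passed to virsh backup-begin. The helper name make_push_backup_xml, the disk name vda, and the target path under /backup are assumptions made for this example; the element names follow libvirt's domainbackup XML format and should be checked against the documentation for your installed version.

```shell
#!/bin/sh
# Sketch: build a push-mode backup description for a single disk.
# make_push_backup_xml is a hypothetical helper, not part of libvirt.
make_push_backup_xml() {
    disk_name=$1      # target device as shown by `virsh domblklist`, e.g. vda
    target_file=$2    # file that QEMU should write the backup image to
    cat <<EOF
<domainbackup mode='push'>
  <disks>
    <disk name='${disk_name}' type='file'>
      <target file='${target_file}'/>
      <driver type='qcow2'/>
    </disk>
  </disks>
</domainbackup>
EOF
}

# Example usage on a KVM host (commented out so the snippet is safe to source):
#   make_push_backup_xml vda /backup/webserver1-vda.qcow2 > /tmp/webserver1-backup.xml
#   virsh backup-begin webserver1 --backupxml /tmp/webserver1-backup.xml
```

Supplying an explicit target in this way lets the backup land directly on the backup medium instead of next to the source image.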

Best practices for KVM backups

For modern virtualized environments, the following best practices help ensure consistent, reliable, and efficient backups:

  • Perform live backup with native APIs: Use the latest QEMU/Libvirt versions that support live VM backups utilizing built-in APIs. This allows you to create consistent, point-in-time backups of running virtual machines without downtime and with minimal performance impact.
  • Back up the VM configuration XML: Always export and back up the complete VM configuration in XML format in addition to the VM disk images. The XML file contains critical details, including CPU configuration, memory allocation, network interfaces, storage paths, and device settings, all of which are essential for successful restoration.
  • Verify backup integrity: Generate checksums of all the backup files and store them separately from the backups. In addition, run the qemu-img tool after each backup to detect any errors or inconsistencies in the backup image files. These steps help ensure that your backups are reliable and can be used for recovery when needed.
  • Maintain multiple copies on different media: Follow the industry best practice of the 3-2-1 backup rule. You should keep at least 3 copies of your data, stored on 2 different media types, with 1 copy stored offsite or in the cloud. This strategy protects against hardware failure, ransomware, accidental deletion, and site disasters.
  • Secure backups with encryption: If the VM image contains sensitive data or configurations (clear-text credentials, API keys), you should protect backups from unauthorized access by encrypting the image files.
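
The integrity check described above can be sketched as a pair of small helpers: one records SHA-256 checksums for every file in a backup directory, and the other re-checks them later. The function names write_manifest and verify_manifest and the directory layout are assumptions for illustration; only the standard sha256sum and find tools are used.

```shell
#!/bin/sh
# Sketch: record and later verify SHA-256 checksums for backup files.
# write_manifest and verify_manifest are hypothetical helper names.

# Write "<hash>  <file>" lines for every regular file in the backup directory.
# Keep the manifest outside the backup directory so it is stored separately.
write_manifest() {
    backup_dir=$1
    manifest=$2
    # Run inside the directory so the manifest stores relative file names.
    ( cd "$backup_dir" && find . -maxdepth 1 -type f -exec sha256sum {} + ) > "$manifest"
}

# Re-check every file listed in the manifest.
# A non-zero exit status means at least one file is missing or changed.
verify_manifest() {
    backup_dir=$1
    manifest=$2
    # The manifest is fed on stdin so its path is resolved before the cd.
    ( cd "$backup_dir" && sha256sum -c ) < "$manifest"
}

# Example usage (commented out so the snippet is safe to source):
#   write_manifest /backup /root/backup-manifest.sha256
#   verify_manifest /backup /root/backup-manifest.sha256
```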

Offline backup steps

For maximum reliability and when downtime is acceptable, use the offline backup method:

1. List the current virtual machines on the KVM host.

$ virsh list 
 Id   Name         State
----------------------------
 1    webserver1   running
 2    webserver2   running

2. Locate the disk image for the VM.

$ virsh domblklist webserver1
 Target   Source
---------------------------------------------------------
 vda      /var/lib/libvirt/images/webserver1.img

3. Gracefully shut down the VM.

$ virsh shutdown webserver1
Domain 'webserver1' is being shutdown

4. Back up the VM XML configuration, which defines the CPU, memory, disk, etc, for the VM. This configuration is required during the VM restore process.

$ virsh dumpxml webserver1 > /backup/webserver1.xml

5. Back up / copy the disk image to another location.

$ cp /var/lib/libvirt/images/webserver1.img /backup

6. Restart the VM.

$ virsh start webserver1
Domain 'webserver1' started
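
One detail the steps above leave implicit is that virsh shutdown returns before the guest has actually powered off, so the disk image should not be copied until the domain reports "shut off". A minimal sketch of a wait loop follows; the helper name wait_for_shutoff and the default timeout are assumptions for this example, and the loop relies only on virsh domstate.

```shell
#!/bin/sh
# Sketch: block until a domain reports "shut off", or give up after a timeout.
# wait_for_shutoff is a hypothetical helper name.
wait_for_shutoff() {
    domain=$1
    timeout=${2:-300}   # seconds to wait before giving up (assumed default)
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # `virsh domstate` prints the current state, e.g. "running" or "shut off".
        state=$(virsh domstate "$domain") || return 2
        [ "$state" = "shut off" ] && return 0
        sleep 1
        waited=$((waited + 1))
    done
    echo "Timed out waiting for $domain to shut down" >&2
    return 1
}

# Example usage on a KVM host (commented out so the snippet is safe to source):
#   virsh shutdown webserver1
#   wait_for_shutoff webserver1 600 || exit 1
#   virsh dumpxml webserver1 > /backup/webserver1.xml
#   cp /var/lib/libvirt/images/webserver1.img /backup
#   virsh start webserver1
```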

Legacy snapshot backup steps

If you have legacy versions of libvirt (below 7.2.0) and QEMU (below 4.2), and downtime is unacceptable, the snapshot method can be used to perform KVM backups. Follow these steps:

1. Install the QEMU guest agent inside the VM, which allows the KVM host to request the guest OS to flush dirty buffers and temporarily freeze the filesystem. The agent is available for both Linux and Windows OS.

$ sudo apt install qemu-guest-agent

2. Create a live external snapshot on the KVM host.

$ virsh snapshot-create-as --domain webserver1 backup-snap \
--diskspec vda,file=/var/lib/libvirt/images/webserver1snap1.qcow2 \
--disk-only --atomic --quiesce

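3. Back up the VM XML configuration. While the snapshot is active, the domain's disk source points to the overlay file, so confirm that the saved XML references the original image path before relying on it for a restore.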

$ virsh dumpxml webserver1 > /backup/webserver1.xml

4. Copy the original disk image to another location.

$ cp /var/lib/libvirt/images/webserver1.img /backup

5. Merge changes back to the original disk.

$ virsh blockcommit webserver1 vda --active --verbose --pivot
Block commit: [100.00 %]
Successfully pivoted

6. Delete the snapshot metadata. After the pivot, the temporary overlay file is no longer in use and can be removed from disk.

$ virsh snapshot-delete webserver1 backup-snap --metadata
Domain snapshot backup-snap deleted
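
The legacy procedure leaves the VM running on an overlay file until the merge step, so a failed copy should not skip the blockcommit. The sketch below wraps the same virsh commands in a function that always merges the overlay back. The helper name snapshot_backup, its argument order, and the choice to export the XML before the snapshot (so that it still references the base image) are assumptions for this example.

```shell
#!/bin/sh
# Sketch: legacy snapshot backup that always merges the overlay back,
# even if the copy fails. snapshot_backup is a hypothetical helper name.
snapshot_backup() {
    domain=$1      # e.g. webserver1
    disk=$2        # target device, e.g. vda
    base_image=$3  # current disk image, e.g. /var/lib/libvirt/images/webserver1.img
    overlay=$4     # temporary overlay file to create
    dest_dir=$5    # backup destination directory
    snap_name="backup-snap"

    # Save the domain XML first, while it still references the base image.
    virsh dumpxml "$domain" > "$dest_dir/$domain.xml" || return 1

    # --quiesce asks the guest agent to flush and freeze guest filesystems;
    # new writes then go to the overlay file.
    virsh snapshot-create-as --domain "$domain" "$snap_name" \
        --diskspec "$disk,file=$overlay" \
        --disk-only --atomic --quiesce || return 1

    # The base image no longer receives writes, so it can be copied safely.
    copy_status=0
    cp "$base_image" "$dest_dir/" || copy_status=1

    # Merge the overlay back regardless of whether the copy succeeded.
    virsh blockcommit "$domain" "$disk" --active --verbose --pivot || return 2
    virsh snapshot-delete "$domain" "$snap_name" --metadata || return 2
    rm -f "$overlay"

    return "$copy_status"
}

# Example usage on a KVM host (commented out so the snippet is safe to source):
#   snapshot_backup webserver1 vda /var/lib/libvirt/images/webserver1.img \
#       /var/lib/libvirt/images/webserver1snap1.qcow2 /backup
```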

Live backup with native API

The following are recommended steps for robust, secure, and consistent KVM backups in modern virtualized environments.

1. Start the VM’s backup using the built-in backup API.

$ virsh backup-begin webserver1
Backup started

2. The backup process can take time for large VMs. You can monitor it with the following command. A job type of “None” means that no job is active, which indicates that the backup has finished.

$ virsh domjobinfo webserver1
Job type:         None

3. Check the backup statistics after the job completes.

$ virsh domjobinfo webserver1 --completed
Job type:         Completed   
Operation:        Backup      
Time elapsed:     11039        ms
File processed:   3.500 GiB
File remaining:   0.000 B
File total:       3.500 GiB

4. The backup creates a new disk image file in the folder /var/lib/libvirt/images/ named <vm-name>.img.<timestamp>. Check the image consistency of the backup file with qemu-img.

$ sudo qemu-img check /var/lib/libvirt/images/webserver1.img.1771064305
No errors were found on the image.
36559/57344 = 63.75% allocated, 50.14% fragmented, 0.00% compressed clusters
Image end offset: 2396717056

5. Back up the VM XML configuration.

$ virsh dumpxml webserver1 > /backup/webserver1.xml

6. Generate a checksum for the image file. This checksum can be verified when the file is copied to the backup location.

$ sudo sha256sum webserver1.img.1771064305
ffee6b12895f3b…………40a068f949f926df9da2  webserver1.img.1771064305

7. The disk image may contain sensitive data and should be protected against unauthorized access, especially for off-site storage. We can use GPG, which is an open-source command-line tool for secure data encryption. The command will prompt for an encryption passphrase. We are using symmetric encryption here, and the same passphrase will be required to decrypt the image file.

$ sudo gpg --symmetric --cipher-algo AES256 \
--output webserver1.img.gpg webserver1.img.1771064305

8. Copy the encrypted image to a remote location. Following the 3-2-1 strategy protects against data corruption or damage to the storage medium.

$ rsync -avh webserver1.img.gpg remote-server:/backup/location/
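
When these steps are scripted, the manual check of virsh domjobinfo can be replaced by a polling loop. The following is a minimal sketch; the helper name wait_for_backup_job and the default polling interval are assumptions for this example. It waits until the job type is "None" and then inspects the completed-job statistics, which libvirt may discard once they have been read.

```shell
#!/bin/sh
# Sketch: poll `virsh domjobinfo` until no job is active, then confirm
# that the most recent job completed. wait_for_backup_job is a hypothetical helper.
wait_for_backup_job() {
    domain=$1
    interval=${2:-5}   # seconds between polls (assumed default)
    while :; do
        info=$(virsh domjobinfo "$domain") || return 2
        # Extract the first word after "Job type:", ignoring padding.
        job_type=$(printf '%s\n' "$info" |
            sed -n 's/^Job type:[[:space:]]*\([^[:space:]]*\).*/\1/p')
        case "$job_type" in
            None) break ;;
            "")   echo "Could not parse job type for $domain" >&2; return 2 ;;
        esac
        sleep "$interval"
    done
    # Succeeds only if the finished job is reported as "Completed".
    completed=$(virsh domjobinfo "$domain" --completed) || return 2
    printf '%s\n' "$completed" | grep -q '^Job type:[[:space:]]*Completed'
}

# Example usage on a KVM host (commented out so the snippet is safe to source):
#   virsh backup-begin webserver1
#   wait_for_backup_job webserver1 10 && echo "backup finished"
```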

VM restoration process

When it is necessary to recover the VM from backup, follow these steps:

1. Restore and decrypt backup files. The gpg command will prompt for the passphrase used to encrypt the file.

$ rsync -avh remote-server:/backup/location/webserver1.img.gpg .

$ sudo gpg --output webserver1.img --decrypt webserver1.img.gpg
gpg: AES256.CFB encrypted data
gpg: encrypted with 1 passphrase

2. We will use the VM XML file to recreate the VM. We need to ensure that the VM image location matches the source file specified in the VM XML configuration. The following is an example snippet of the device configuration from the XML file.

  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/webserver1.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>

3. Define the VM using the XML file.

$ virsh define webserver1.xml 
Domain 'webserver1' defined from webserver1.xml

4. Start the machine.

$ virsh start webserver1
Domain 'webserver1' started

5. The last step is to log into the machine and verify that the applications and services are running as expected without any errors.
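
The path check in step 2 can be automated before the VM is defined. The sketch below extracts the file-backed disk sources from a saved domain XML and reports any that are missing. The helper names list_disk_sources and check_disk_sources are assumptions for this example, and the sed pattern assumes one single-quoted source element per line, as in the snippet above; a proper XML parser would be more robust.

```shell
#!/bin/sh
# Sketch: list the disk image paths a saved domain XML expects and check
# that each one exists. Both helper names are hypothetical.
list_disk_sources() {
    xml_file=$1
    # Print the value of every <source file='...'/> attribute, one per line.
    sed -n "s/.*<source file='\([^']*\)'.*/\1/p" "$xml_file"
}

check_disk_sources() {
    xml_file=$1
    # The brace group runs as the last stage of the pipeline, so its
    # exit status becomes the function's return value.
    list_disk_sources "$xml_file" | {
        missing=0
        while IFS= read -r path; do
            if [ ! -f "$path" ]; then
                echo "Missing disk image: $path" >&2
                missing=1
            fi
        done
        [ "$missing" -eq 0 ]
    }
}

# Example usage on a KVM host (commented out so the snippet is safe to source):
#   check_disk_sources webserver1.xml && virsh define webserver1.xml
```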

Last thoughts

When running mission-critical applications or services, it is imperative to implement a robust backup strategy. In production environments, you have to safeguard against data corruption, hardware failures, and ransomware attacks. In most regulated industries, such as healthcare and fintech, there are compliance requirements (e.g., HIPAA, GDPR) for auditable, off-site, encrypted backups. 

By following best practices for modern technologies, including live protection, backup verification, encryption, and the 3-2-1 strategy, you can meet the service SLAs and data protection requirements of today’s demanding environments.
