Managing virtual machines on OpenShift Virtualization puts the administrator between two paradigms: traditional virtual machines and modern container orchestration platforms. When adopting this platform, you should carefully consider your backup and data protection strategy. Backing up a virtual machine on OpenShift Virtualization is very different from right-clicking a VM in VMware vSphere to create a snapshot. Instead, treat your VM as a Kubernetes application composed of different objects and back up those objects to ensure a proper disaster recovery strategy.
This article explores the practical aspects of protecting virtual machines running on OpenShift Virtualization. As you read on, you will move from understanding key backup components and patterns to recognizing operational challenges and seeing how dedicated backup platforms can help you administer backups at scale.
Summary of key considerations for OpenShift Virtualization backup
| Aspect | Consideration | Recommendation |
| Application consistency | Crash-consistent snapshots risk corrupting transactional data within the guest OS. | Install the QEMU guest agent to coordinate fsfreeze and fsthaw operations during the CSI snapshot. |
| Disaster recovery and storage class mapping | Secondary clusters often utilize different storage backends, preventing a direct 1:1 restore. | Use backup tools to dynamically rewrite YAML configurations and translate storage classes. |
| Restore validation | The lack of verified restore workflows creates significant risk during a total cluster failure. | Leverage the OpenShift declarative API to test VMI readiness in isolated namespaces regularly. |
What does “backup” mean in OpenShift Virtualization?
On a traditional virtualization platform, a backup usually consists of a snapshot of the virtual machine’s disks and a copy of the hardware configuration stored in a proprietary file. In OpenShift Virtualization, a virtual machine combines multiple elements, including some Kubernetes resource definitions. If you want to back up a VM on this platform, you should consider its running state, the Kubernetes custom resources that define the hardware profile, and the persistent storage used for the operating system and user data. If you only back up the storage volume, you end up with a disk without a hardware definition, which would probably require manual configuration to make it bootable. And if you just back up the Kubernetes virtual machine definition, you end up with a hardware profile with no data to boot the VM.
To successfully back up a virtual machine in this environment, you need a backup tool that treats these resources as a single unit and captures both at the same point in time.
For storage operations, OpenShift Virtualization includes a component, the Containerized Data Importer, that maps virtual machine disks to persistent volume claims (PVCs) via a custom resource, DataVolume. OpenShift Virtualization uses the Container Storage Interface (CSI) to manage persistent storage, which allows you to create a CSI Snapshot of the virtual machine and pair it with a copy of the VirtualMachine YAML definition to serve as the basis for an initial backup strategy.
The table below compares traditional VM backup and OpenShift Virtualization backup.
| Feature | Traditional VM backup | OpenShift Virtualization backup |
| Platform | Traditional virtualization | OpenShift Virtualization |
| Storage mechanism | Snapshot VM from hypervisor | PVC Snapshot via CSI |
| Configuration | Hardware config stored in proprietary file | Hardware defined in VirtualMachine YAML |
| Integration point | Backup tool integrated with the hypervisor | Backup tool integrates with the Kubernetes API |
| Restoration | Restore VM image | Restore Kubernetes resources + PVC |
| Dependency | Storage replication is often sufficient | Requires storage + YAML state |
Key elements involved in OpenShift Virtualization backups
Creating a backup strategy for OpenShift Virtualization requires understanding what to protect. An OpenShift VM is a collection of Kubernetes resources working together, not a single object.
VirtualMachine and VirtualMachineInstance
The VirtualMachine custom resource is your virtual machine template. It configures the machine definition: how much CPU and memory it has and what disks are attached to it. It could also include other virtual hardware presented to the VM for its use. When you power on the virtual machine, OpenShift Virtualization generates a VirtualMachineInstance object that represents the running VM. A proper backup saves the VirtualMachine resource, allowing you to restore the machine configuration to its original state. The VirtualMachineInstance is ephemeral and is automatically regenerated from the VirtualMachine definition when the VM starts. You should not back it up directly, as it represents runtime state rather than declarative configuration.
DataVolumes and PersistentVolumeClaims
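OpenShift Virtualization manages storage operations using the Containerized Data Importer (CDI). CDI maps each DataVolume, an abstraction that handles disk image import, to a PersistentVolumeClaim that holds the VM disk data.
Beyond the VM definition and its disks, a complete backup should also capture the supporting resources the VM depends on: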
- ConfigMaps and Secrets: You can use them to pass cloud-init scripts, SSH keys, and other secrets to a running virtual machine.
- NetworkAttachmentDefinitions: These allow your VM to connect to external VLANs outside of the software-defined network in the cluster.
- Services: These could expose virtual machine ports, such as SSH or RDP, to external users.
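As an illustration of one such supporting resource, the sketch below shows a Service that exposes SSH on a VM. The Service name, the NodePort type, and the app: database-vm label are assumptions made for this example; the label would have to be set under spec.template.metadata.labels of the VirtualMachine so that it propagates to the virt-launcher pod.

```yaml
# Hypothetical Service exposing SSH for a VM; names and label are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: database-vm-ssh
  namespace: production-vms
spec:
  type: NodePort
  selector:
    app: database-vm        # assumed label on the VM template (not in the original example)
  ports:
    - protocol: TCP
      port: 22              # port exposed by the Service
      targetPort: 22        # SSH port inside the guest
```

Because this object lives outside the VirtualMachine resource, a backup that captures only the VM definition and its disks would restore a VM that is no longer reachable on that port.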
The YAML example below shows a VirtualMachine that uses a DataVolume as its root disk. CDI creates the underlying PVC with the same name as the DataVolume, my-database-disk, so your backup strategy should snapshot that PVC in addition to saving the VirtualMachine definition.
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: database-vm
  namespace: production-vms
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
            - disk:
                bus: virtio
              name: rootdisk
      volumes:
        - dataVolume:
            name: my-database-disk
          name: rootdisk
```
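For context, the DataVolume referenced by the VM could look like the following minimal sketch. The blank source, the 50Gi size, and the storage class name are assumptions chosen for illustration; the original example does not show how my-database-disk was created.

```yaml
# Hypothetical DataVolume backing the rootdisk volume of database-vm.
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: my-database-disk    # CDI creates a PVC with this same name
  namespace: production-vms
spec:
  source:
    blank: {}               # assumption: empty disk; real VMs usually import an image
  storage:
    storageClassName: ocs-storagecluster-ceph-rbd   # assumption: Ceph RBD class
    resources:
      requests:
        storage: 50Gi       # assumption: illustrative size
```

The PVC that CDI creates from this object is what a CSI snapshot actually captures, so both the DataVolume and the PVC matter when you plan a restore.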
Common backup patterns and limitations
As administrators migrate workloads to OpenShift Virtualization, they bring a variety of backup strategies with them, including legacy methods that need rethinking. Below, we review common patterns and explain why they often fall short at scale.
In-guest backup agents
Some administrators try to use existing enterprise backup tools that install directly into the guest operating system. This agent software runs within Windows or Linux VMs, reads files, and sends them to a proprietary backup server over the network.
The problem with this approach is that it completely ignores the Kubernetes layer. The agent does not know that the VM is running inside a pod and does not back up the VirtualMachine definition or the network configuration. If you accidentally delete the VM in the console, you cannot easily restore it to a working state without reinstalling.
Storage-level replication
If you use an enterprise storage solution, like a SAN, you might consider using storage-native replication to an alternate site. This type of solution replicates block devices from your local volumes to a disaster recovery site on a schedule, or even continuously.
This architecture provides high availability in case of a disaster, but it does not replace a backup strategy. If an administrator accidentally drops a database, or if you are a victim of a ransomware attack and have all your data encrypted, the corrupted data is immediately replicated to the secondary site. Block-level replication also copies only the disks: the VirtualMachine YAML definitions and related Kubernetes resources must be protected separately.
CSI snapshots
The OpenShift Virtualization platform uses the Container Storage Interface (CSI) to manage VM disks as PersistentVolumeClaims. You could run automated jobs to take standard volume snapshots using a VolumeSnapshot custom resource, which tells the underlying storage solution to take a snapshot of the volume.
A CSI snapshot without guest coordination results in a crash-consistent image, similar to an unexpected power loss. With QEMU guest agent integration, it can become application-consistent.
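A standard volume snapshot of this kind can be sketched as follows. The snapshot name and the VolumeSnapshotClass name are assumptions (the class shown is only an example of what a Ceph-backed cluster might provide); the PVC name reuses my-database-disk from the earlier VM example.

```yaml
# Minimal VolumeSnapshot sketch; taken on its own it is crash-consistent only.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-database-disk-snap                 # hypothetical name
  namespace: production-vms
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass   # assumption
  source:
    persistentVolumeClaimName: my-database-disk   # PVC backing the VM disk
```

The snapshot is usable once its status.readyToUse field reports true. Note that nothing in this object references the VirtualMachine definition, which still has to be saved separately.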
OpenShift API for Data Protection
Red Hat provides the OpenShift API for Data Protection, or OADP. This Operator is built on top of the open-source Velero project and uses specific plugins designed for OpenShift Virtualization. When you use it as a backup solution, it takes a snapshot of the VM and stores its virtual machine YAML definition.
While OADP is powerful and flexible, large-scale environments may require significant operational effort to manage backup schedules, hooks, and restore workflows consistently. If you want to quiesce a database before the snapshot, you must write and maintain custom pre-backup and post-backup hook scripts. Also, managing Velero backup schedules solely through the command-line interface places a heavy operational burden on the platform team. The following example shows a basic Velero Backup definition for the production-vms namespace:
```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: daily-vm-backup
  namespace: openshift-adp
spec:
  includedNamespaces:
    - production-vms
  csiSnapshotTimeout: 10m
  includedResources:
    - virtualmachines
    - persistentvolumeclaims
    - secrets
```
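Rather than creating Backup objects by hand, Velero also offers a Schedule resource that wraps the same spec in a cron expression. The sketch below reuses the namespace and resource filters from the example above; the Schedule name, the 02:00 daily schedule, and the seven-day retention (ttl) are assumptions.

```yaml
# Hypothetical daily schedule wrapping the Backup spec shown above.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-vm-backup-schedule
  namespace: openshift-adp
spec:
  schedule: "0 2 * * *"        # assumption: every day at 02:00
  template:                    # same fields as a Backup spec
    includedNamespaces:
      - production-vms
    csiSnapshotTimeout: 10m
    includedResources:
      - virtualmachines
      - persistentvolumeclaims
      - secrets
    ttl: 168h0m0s              # assumption: keep each backup for seven days
```

Keeping such objects in Git reduces the reliance on ad hoc CLI commands, although every namespace or policy variation still needs its own manifest.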
Operational considerations in production environments
Keeping reliable backups at production scale introduces additional complexity. Here are some concerns you need to plan for.
Application consistency
Having crash-consistent backups is not enough if you run a transactional database of any kind inside a virtual machine. You need application-consistent backups, so the database files are not corrupted during the snapshot process.
Achieving this requires the QEMU guest agent running inside the guest operating system. The backup tool asks the agent to freeze the filesystem (fsfreeze), takes the CSI snapshot, and then thaws the filesystem (fsthaw) so the virtual machine can resume read and write operations.
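KubeVirt, the upstream project behind OpenShift Virtualization, exposes this freeze, snapshot, and thaw sequence through its own VirtualMachineSnapshot resource, which backup tools can build on. A minimal sketch, assuming the database-vm example from earlier and a cluster that serves the snapshot.kubevirt.io/v1beta1 API (older releases serve v1alpha1):

```yaml
# Hypothetical VM-level snapshot; uses the guest agent to freeze the
# filesystem when the agent is installed and the VM is running.
apiVersion: snapshot.kubevirt.io/v1beta1   # assumption: may be v1alpha1 on older clusters
kind: VirtualMachineSnapshot
metadata:
  name: database-vm-snap                   # hypothetical name
  namespace: production-vms
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: database-vm
```

When the guest agent is present on a running VM, the snapshot status is expected to list a GuestAgent indication; without the agent, the result is only crash-consistent.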
Disaster recovery and storage class mapping
If your primary OpenShift cluster experiences a total failure, you have to restore your backups on a different OpenShift Virtualization cluster. A direct one-to-one restore often fails there, because the secondary cluster may run a different storage backend and therefore expose different StorageClass names.
Your backup tool must therefore support storage class mapping: during the restore, it rewrites the YAML configuration to reference a StorageClass that exists on the target cluster. For example, a PVC for an on-premises virtual machine backed by Ceph must be translated to request an AWS Elastic Block Store storage class when it is restored on an OpenShift cluster in a public cloud.
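With Velero-based tooling such as OADP, one way to express this mapping is the change-storage-class ConfigMap that a Velero restore item action reads. In this sketch, both storage class names are assumptions: ocs-storagecluster-ceph-rbd stands in for the Ceph-backed class on the source cluster and gp3-csi for an EBS-backed class on the target.

```yaml
# Storage class mapping read by Velero during restore.
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: openshift-adp               # namespace where Velero runs
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  # <storage class in the backup>: <storage class on the target cluster>
  ocs-storagecluster-ceph-rbd: gp3-csi   # assumption: illustrative class names
```

The ConfigMap has to exist on the target cluster before the restore starts; PVCs whose storage class is not listed are restored unchanged.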
Restore validation
A backup is useless if you cannot restore it successfully. Since you can use the OpenShift declarative API, it is highly recommended that you automate your restore tests. You can restore your backups to a temporarily isolated namespace, verify that the VirtualMachineInstance reaches a running state, and then automatically delete the test namespace.
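With OADP, such a test could start from a Velero Restore that maps the production namespace to a scratch namespace. This is a minimal sketch: the backup name comes from the earlier Backup example, while the Restore name and the restore-test namespace are hypothetical.

```yaml
# Hypothetical restore test into an isolated namespace.
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-test-daily-vm-backup
  namespace: openshift-adp
spec:
  backupName: daily-vm-backup
  namespaceMapping:
    production-vms: restore-test   # restore into a scratch namespace
  restorePVs: true                 # recreate volumes from the snapshots
```

A pipeline could then wait for the restored VirtualMachineInstance to report a Ready condition, for example with oc wait, and delete the restore-test namespace afterwards.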
OpenShift Virtualization backup beyond basic tooling with Trilio
As you deploy more mission-critical workloads on OpenShift Virtualization, you will notice that open source tools and custom scripts are difficult to maintain and operate at scale. This means that it makes sense to consider enterprise Kubernetes-native backup platforms.
Trilio Vault for Kubernetes is an enterprise solution designed specifically to handle the complexity of OpenShift Virtualization. It solves the limitations of standard volume snapshots, legacy agents, and complex solutions managed by scripts. If you need to manage a large-scale, mission-critical virtualization cluster, there are multiple ways a platform like Trilio can fit into your infrastructure to improve your daily operations.
Native OpenShift console integration
Relying solely on the command-line interface for all your backup operations and automation creates a bottleneck for the administrator, especially when you have more than a handful of virtual machines to manage. Trilio is packaged as a Red Hat Certified Operator and integrates directly into the OpenShift dashboard.
Application-centric backups
Trilio protects an entire logical application, not just a single virtual machine. For example, if you have a legacy database running as a virtual machine that provides data to a modern, containerized microservice application, Trilio can protect the virtual machine and other Kubernetes YAML definitions (such as deployments, config maps, and secrets) that are part of the application. You can use label selectors to include objects related to the same application and also include backups of specific OpenShift Operators to ensure that the entire application stack is included in your platform’s backup.
Ransomware protection and immutability
To meet security and compliance requirements, Trilio supports immutable backups to an S3-compatible object store. This architecture ensures that ransomware cannot encrypt, modify, or delete your backups; Trilio can also encrypt backup data at rest. Since Trilio uses the standard QCOW2 format for virtual machine disks and regular JSON to save Kubernetes resource definitions, you can restore your backups in the future even without access to the backup platform, with no lock-in to a proprietary vendor format.
Zero RPO disaster recovery
For mission-critical VMs where data loss is not an option, a manual restore from backup is not enough. Trilio Site Recovery is a disaster recovery solution for OpenShift Virtualization designed to eliminate data loss during a failover. It achieves zero RPO and recovery time in under five minutes, and works with any block storage backend, so organizations do not need to lock into a specific storage vendor to get DR capabilities.
Multi-tenancy and self-service
You can have your development team execute backup and restore operations on their virtual machines using role-based access control (RBAC) policies. This way, developers can have a restricted view in the OpenShift console, allowing them to restore their own applications without requiring cluster-admin privileges.
Using Trilio custom resource definitions, the backup configuration becomes declarative, allowing you to manage your backup policies with a standard GitOps pipeline.
```yaml
apiVersion: triliovault.trilio.io/v1
kind: BackupPlan
metadata:
  name: production-db-vm-plan
  namespace: database-workloads
spec:
  backupConfig:
    target:
      name: aws-s3-immutable-target
      namespace: trilio-system
  backupComponents:
    - labelSelector:
        matchLabels:
          app: postgres-vm
```
In this example, you define a BackupPlan, point it to an immutable storage target, and use labels to select the objects to include. The platform handles the orchestration, including freezing the filesystem using the QEMU guest agent, taking the CSI snapshot, and saving the metadata.
Conclusion
Migrating your workloads to OpenShift Virtualization requires you to change how you handle data protection. You no longer copy files from a legacy hypervisor but need to orchestrate data protection across a dynamic Kubernetes environment.
In this article, we reviewed how OpenShift Virtualization uses multiple resources to represent a virtual machine and how you should save not only the underlying storage snapshots but also the VM definition represented as a Kubernetes object. Traditional backup patterns, such as in-guest backup agents, are insufficient on their own to protect your virtual machines, and application-consistent backups remain essential for transactional workloads. We showed you how specialized platforms like Trilio simplify management through user interface integration, ransomware protection, and automated disaster recovery.
Applying these practices ensures that your legacy workloads running on virtual machines achieve the same operational resilience as your modern container-native applications.