Reference Guide: Optimizing Backup Strategies for Red Hat OpenShift Virtualization

Containers have become the de facto standard for building and scaling modern applications. However, real-life infrastructures often include legacy applications that necessitate the use of virtual machines (VMs), keeping environments from being 100% containerized.

OpenShift virtualization seamlessly combines OpenShift’s container orchestration capabilities with traditional virtualization, offering a unified platform for managing containers and virtual machines. This article describes how OpenShift virtualization works, complete with a step-by-step example and a discussion of best practices for success.

Summary of key OpenShift virtualization concepts

The following table provides an overview of the key concepts covered in this article.

Concept | Description
--- | ---
Building blocks of OpenShift virtualization | KubeVirt is the heart of OpenShift virtualization. The KubeVirt add-on provides an API for enabling virtual machine management within OpenShift.
Understanding containerized virtual machines | OpenShift runs virtual machines using the KVM hypervisor. In contrast to traditional virtualization, the virtual machine processes exist inside a pod in OpenShift, allowing virtual machines to be managed as native OpenShift objects.
Running and managing virtual machines in OpenShift | Virtual machines can be created using existing or custom templates through the OpenShift GUI or by writing YAML manifests. The YAML templates for virtual machines can be generated using the virtctl utility.
Best practices for running virtualization workloads in OpenShift | The recommended practices for running virtual machines in OpenShift include templating, using security-hardened images, monitoring resource usage, ensuring security, and regularly backing up virtual machines.

Building blocks of OpenShift virtualization

OpenShift virtualization brings virtual machines as native objects into the OpenShift container platform. It is based on the container-native virtualization technology developed upstream by the KubeVirt project. Managing virtual machines using OpenShift native objects offers benefits such as allowing you to use features like pipelines, GitOps, and service meshes with virtual machines.

Under the hood, OpenShift virtualization leverages the KVM hypervisor. KVM is a virtualization module in the Linux kernel that allows the kernel to function as a hypervisor. It is a mature technology that major cloud providers use as the virtualization backend for their infrastructure-as-a-service (IaaS) offerings. OpenShift Virtualization uses the KVM hypervisor to allow Kubernetes and KubeVirt to manage the virtual machines. As a result, the virtual machines use OpenShift’s scheduling, network, and storage infrastructure.

The following figure illustrates the major components of this workflow.

 Figure 1 - Key components of OpenShift virtualization

The interaction of these components is explained further below, with numbers in parentheses indicating where each component is on the diagram above.

User interaction (1)

The user interacts with the OpenShift cluster and defines a Virtual Machine Instance (VMI) resource. The VMI definition describes a virtual machine within KubeVirt and specifies details like the VM image, memory, CPU, storage, and networking. This VMI definition acts as a blueprint for the virtual machine.
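To make the idea of a VMI "blueprint" concrete, here is a minimal sketch of a shell function that prints a VirtualMachineInstance manifest covering the details listed above (image, memory, CPU, storage, and networking). The function name, the VMI name, and the sizing defaults are illustrative assumptions, and the container disk image is the public KubeVirt demo image rather than anything used elsewhere in this article.

```shell
# Print a minimal VirtualMachineInstance manifest to stdout.
# $1 = VMI name, $2 = guest memory, $3 = vCPU cores (illustrative defaults).
render_vmi() {
  name="${1:-demo-vmi}"
  memory="${2:-1Gi}"
  cores="${3:-1}"
  cat <<EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: ${name}
spec:
  domain:
    cpu:
      cores: ${cores}
    memory:
      guest: ${memory}
    devices:
      disks:
      - name: rootdisk
        disk:
          bus: virtio
      interfaces:
      - name: default
        masquerade: {}
  networks:
  - name: default
    pod: {}
  volumes:
  - name: rootdisk
    containerDisk:
      # Assumption: public KubeVirt demo image, used here only as a placeholder.
      image: quay.io/kubevirt/cirros-container-disk-demo
EOF
}

render_vmi demo-vmi 1Gi 1
```

On a cluster with OpenShift Virtualization installed, piping the output into oc apply -f - would submit the definition to the API. In practice you will more often define a VirtualMachine object, as the later sections do, and let KubeVirt create the VMI from it.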

OpenShift API (2)

The OpenShift API server receives the VMI definition from the client, validates the input, and stores it as a VMI custom resource, an object of a type declared by one of KubeVirt’s custom resource definitions (CRDs).

KubeVirt

KubeVirt is the underlying framework that enables virtual machine management within OpenShift. It uses the following major components to handle the VM lifecycle.

Virt Controller (3)

The virt-controller watches the virtual machine instance (VMI) custom resources and performs the following tasks:

  • It monitors for new VMI definitions submitted to the OpenShift API.
  • Based on the VMI definition, it creates a regular OpenShift pod that serves as the container for the virtual machine.
  • The pod then goes through the standard OpenShift scheduling process to find a suitable node in the cluster to run on.
  • Once the pod is scheduled, the virt-controller updates the VMI definition with the assigned node name and passes control to the virt-handler daemon running on that node.

Virt Handler (4)

The virt-handler runs as a DaemonSet, so each node has its own instance that manages the virtual machines (VMs) on that node. Its main functions include the following:

  • Continuously monitoring for VMI objects assigned to a node.
  • When a new VMI is assigned, creating a virtual machine instance (domain) on the node using libvirt based on the VMI definition.
  • Tracking the VM’s state (running, stopped) and ensuring that it matches the desired state specified in the VMI.
  • Gracefully shutting down the corresponding virtual machine domain and cleaning up when the VMI object is deleted.

Virt Launcher (5)

Each VMI object in OpenShift is associated with a single pod. This pod serves as the container for the VM process. The virt-launcher is responsible for configuring the pod’s internal resources, such as cgroups and namespaces, to create a secure and isolated space for the VM to operate in.

libvirtd (6)

The virt-launcher uses an instance of libvirtd embedded within the pod to manage the VM’s lifecycle. libvirtd is the daemon of the libvirt virtualization management library; it provides the functions for interacting with the underlying KVM hypervisor to create, configure, and terminate VMs. KVM, in turn, virtualizes the hardware resources based on the instructions received through libvirt.

Understanding containerized virtual machines

Containers and virtual machines can seem like different worlds, so the idea of a “containerized” virtual machine may be confusing at first. Here’s how OpenShift virtualization bridges the gap.

If you have run virtual machines with KVM on Linux, you have likely seen qemu-kvm processes on the host. These processes, identifiable by the VM’s name, come with extensive parameters that outline the virtual machine’s hardware specifications. This method engages directly with the host system to establish the virtual machine.

The concept remains similar when running virtual machines on OpenShift, but the process is containerized. Here’s the difference.

Traditional KVM runs qemu-kvm processes directly on the host system. OpenShift, on the other hand, creates a dedicated pod for each VM. As explained earlier, the virt-launcher manages the virtual machine process inside the pod. Much like traditional KVM virtualization, the virt-launcher utilizes libvirtd to interact with the underlying virtualization technology (like KVM) on the host.

This approach paves the way for managing virtual machines as native OpenShift objects. The same scheduler that places regular pod-based workloads also places virtual machines within the cluster. Like non-virtual workloads, they automatically have access to built-in OpenShift capabilities such as host affinity, resource awareness, load balancing, and high availability.

When virtual machines are integrated into the cluster’s software-defined networking (SDN), they can be accessed using standard OpenShift techniques, including services and routes. Additionally, users can apply network policies and ingress and egress settings to manage traffic flow. This configuration provides practical methods for granting internal and external access to virtual machines within the cluster while implementing essential security measures to control access to applications running on the VMs.
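As a sketch of what applying a network policy to a VM can look like, the function below prints a standard Kubernetes NetworkPolicy that admits only inbound TCP traffic on port 22 to pods carrying a given label. The function name and the default label myvm=fedora are assumptions for illustration; the policy only takes effect if the VM's pod actually carries that label.

```shell
# Print a NetworkPolicy that allows only inbound SSH (TCP 22) to labeled pods.
# $1 = label key, $2 = label value (hypothetical defaults).
render_ssh_policy() {
  label_key="${1:-myvm}"
  label_value="${2:-fedora}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ssh-to-${label_value}
spec:
  podSelector:
    matchLabels:
      ${label_key}: ${label_value}
  policyTypes:
  - Ingress
  ingress:
  - ports:
    - protocol: TCP
      port: 22
EOF
}

render_ssh_policy myvm fedora
```

Because the VM runs inside a pod, the policy selects that pod in the same way it would select any containerized workload. All other inbound traffic to the selected pods is dropped once the policy is in place.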

Running virtual machines in OpenShift

With the basics covered, let’s see how to create virtual machines. This section will help you create and manage virtual machines within your OpenShift environment. Virtual machines can be created from the templates that come with the Virtualization operator, from user-defined custom templates, or from a custom YAML definition. The demo will show you how to create a virtual machine from the OpenShift CLI.

Before proceeding, follow these instructions to install the OpenShift Virtualization operator from OperatorHub. Be sure to pick the operator version that matches your OpenShift cluster version, as listed in the documentation.

Once the Virtualization operator has been installed and configured, the Virtualization tab will become available in the console.

Virtual machines in OpenShift are assigned to a particular project. Like regular pod-based workloads, by default, users who lack the necessary permissions for that namespace cannot access, manage, or oversee the virtual machines within it.

Setting a storage class

Before proceeding, ensure that you’ve set a default storage class on your cluster. OpenShift virtualization leverages the Containerized Data Importer (CDI) to manage persistent storage for virtual machines. 

CDI creates a Persistent Volume Claim (PVC) based on the defined specifications and fetches the disk image to populate the underlying storage volume. Without a default storage class, any PVC that does not name a storage class explicitly will not be provisioned.
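The object that CDI acts on is a DataVolume. The sketch below prints one that clones the fedora data source listed later in this article; the function name, the DataVolume name, and the size are illustrative assumptions. Note that the manifest names no storage class, which is exactly the case where the cluster default is used.

```shell
# Print a DataVolume that asks CDI to populate a PVC from a DataSource.
# $1 = DataVolume name, $2 = requested size (illustrative defaults).
render_datavolume() {
  name="${1:-fedora-root}"
  size="${2:-30Gi}"
  cat <<EOF
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ${name}
spec:
  sourceRef:
    kind: DataSource
    name: fedora
    namespace: openshift-virtualization-os-images
  storage:
    # No storageClassName here, so the default storage class applies.
    resources:
      requests:
        storage: ${size}
EOF
}

render_datavolume fedora-root 30Gi
```

The same sourceRef and storage fields reappear under dataVolumeTemplates in the final VM manifest, where the DataVolume is created together with the VM instead of separately.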

You can check the available storage classes in your cluster as follows; the default one will have the “(default)” label next to it:

$ oc get sc
NAME                                    PROVISIONER                          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ocs-storagecluster-ceph-rbd (default)   openshift-storage.rbd.csi.ceph.com   Delete          Immediate           true                   94d
ocs-storagecluster-cephfs               openshift-storage.rbd.csi.ceph.com   Delete          Immediate           true                   94d
[....]

You can then set the default storage class as follows:

$ oc patch storageclass storage_class_name -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
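If you want to script the check for a default storage class, one possible approach is to filter the oc get sc listing. The helper below is a small sketch (its name is an assumption) that reads the listing on standard input and prints the name of the class marked "(default)". The sample input is abbreviated from the listing above so the snippet can run without a cluster.

```shell
# Read `oc get sc` output on stdin and print the name of the default class.
# Prints nothing when no class is marked "(default)".
default_storage_class() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Sample rows taken from the listing above; on a cluster, pipe `oc get sc` instead.
printf '%s\n' \
  'NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE' \
  'ocs-storagecluster-ceph-rbd (default) openshift-storage.rbd.csi.ceph.com Delete Immediate true 94d' \
  'ocs-storagecluster-cephfs openshift-storage.rbd.csi.ceph.com Delete Immediate true 94d' |
  default_storage_class
# prints: ocs-storagecluster-ceph-rbd
```

An empty result is the signal that a default still needs to be set with the oc patch command shown above.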

Using the virtctl utility

You can create virtualization manifests using the virtctl utility. Once you go to the Overview section in the Virtualization menu, you should see the download link for this utility on the right.

Like the oc utility, virtctl needs to be set up on our bastion host. Download the file and decompress the archive there, then copy the virtctl binary to a directory in your PATH environment variable; here, we’ve copied it to /usr/bin/:

$ tar -xvf virtctl.tar.gz
$ chmod +x virtctl
$ echo $PATH
$ cp virtctl /usr/bin

Using virtctl, creating virtual machine manifest files is straightforward:

$ virtctl create vm --name vm-1
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  creationTimestamp: null
  name: vm-1
spec:
  runStrategy: Always
  template:
    metadata:
      creationTimestamp: null
    spec:
      domain:
        devices: {}
        memory:
          guest: 512Mi
        resources: {}
      terminationGracePeriodSeconds: 180
status: {}

But that’s not enough. When creating virtual machines, we must define details such as the operating system, disk, and compute specifications. These steps are covered in the subsections that follow.

Choosing an instance type and OS

As of version 4.15, OpenShift virtualization includes predefined instance types. Much like the cloud platforms, these instance types comprise varying combinations of CPU and memory:

Instance type | Recommended usage
--- | ---
CX Series (Compute Exclusive) | Compute-intensive applications
U Series (Universal Series) | General-purpose applications
GN Series (GPU NVIDIA) | Virtual machines using NVIDIA GPUs
M Series | Memory-intensive applications

You can also define custom instance types. When defining a custom instance type, the only mandatory parameters are CPU and RAM. You can check out all the available instance-type offerings as follows:

$ oc get vmclusterinstancetypes

To check out the specifications of a particular instance, you can use:

$ oc describe vmclusterinstancetypes cx1.medium
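For the custom instance types mentioned above, the following sketch prints a cluster-wide instance type that sets only the two mandatory parameters, CPU and memory. The function name, the instance type name custom.small, and the sizes are hypothetical; the API version matches the instancetype.kubevirt.io/v1beta1 group shown in this article's own outputs and may differ on other releases.

```shell
# Print a custom cluster-wide instance type with only CPU and memory set.
# $1 = instance type name, $2 = vCPUs, $3 = guest memory (hypothetical defaults).
render_instancetype() {
  name="${1:-custom.small}"
  cpus="${2:-2}"
  memory="${3:-4Gi}"
  cat <<EOF
apiVersion: instancetype.kubevirt.io/v1beta1
kind: VirtualMachineClusterInstancetype
metadata:
  name: ${name}
spec:
  cpu:
    guest: ${cpus}
  memory:
    guest: ${memory}
EOF
}

render_instancetype custom.small 2 4Gi
```

Once applied, a custom type would show up in the oc get vmclusterinstancetypes listing alongside the predefined series and could be referenced from a VM in the same way.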

Let’s move on to the OS selection. Run the following to list the available data sources:

$ oc get datasources -n openshift-virtualization-os-images
NAME             AGE
centos-stream8   7h21m
centos-stream9   7h21m
centos7          7h21m
fedora           7h21m
rhel7            7h21m
rhel8            7h21m
rhel9            7h21m
win10            7h21m
win11            7h21m
win2k12r2        7h21m
win2k16          7h21m
win2k19          7h21m
win2k22          7h21m

Let’s see the details for the fedora data source:

$ oc get datasources/fedora  -n openshift-virtualization-os-images -o yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataSource
metadata:
  annotations:
    operator-sdk/primary-resource: openshift-cnv/ssp-kubevirt-hyperconverged
    operator-sdk/primary-resource-type: SSP.ssp.kubevirt.io
  creationTimestamp: "2024-05-28T11:13:13Z"
  generation: 2
  labels:
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.14.5
    cdi.kubevirt.io/dataImportCron: fedora-image-cron
    instancetype.kubevirt.io/default-instancetype: u1.medium
    instancetype.kubevirt.io/default-preference: fedora
[.......]

The default instance type for the fedora data source is u1.medium, which specifies 1 CPU and 4 GiB of memory. The label below it, default-preference, names a VirtualMachineClusterPreference (VMCP), a custom resource that allows you to define cluster-wide preferences for deploying virtual machines. It acts as a centralized configuration point for the VM attributes that are applied by default to every VM that references the preference, unless they are overridden by the individual VM specification.

You can check the cluster-wide VM preferences as follows:

$ oc get vmcps
NAME                     AGE
alpine                   7h41m
centos.7                 7h41m
centos.7.desktop         7h41m
centos.stream8           7h41m
centos.stream8.desktop   7h41m
centos.stream9           7h41m
centos.stream9.desktop   7h41m
cirros                   7h41m
fedora                   7h41m
[........]
$ oc get vmcps fedora -o yaml
apiVersion: instancetype.kubevirt.io/v1beta1
kind: VirtualMachineClusterPreference
metadata:
  annotations:
    iconClass: icon-fedora
    [.....................]
  generation: 1
  labels:
    app.kubernetes.io/component: templating
    app.kubernetes.io/managed-by: ssp-operator
    [......................]
  name: fedora
  resourceVersion: "42107926"
  uid: b1b262d1-680d-4fbf-9829-33ad30be40fd
spec:
  devices:
    preferredDiskBus: virtio
    preferredInterfaceModel: virtio
    preferredNetworkInterfaceMultiQueue: true
    preferredRng: {}
  features:
    preferredSmm: {}
  firmware:
    preferredUseEfi: true
    preferredUseSecureBoot: true
  requirements:
    cpu:
      guest: 1
    memory:
      guest: 2Gi
This provides us with much more insight into the low-level details of the virtual machine, such as the use of a virtio disk bus and the preferred type of firmware (EFI).

Creating a virtual machine manifest

We’ll use the above information to create a virtual machine manifest using the virtctl utility. We’ll also create an SSH key pair to access our virtual machine, convert it into a secret, and inject it into the VM manifest. 

To create an SSH key pair, run this command:

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/wartortle/.ssh/id_rsa):
Created directory '/home/wartortle/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/wartortle/.ssh/id_rsa.
Your public key has been saved in /home/wartortle/.ssh/id_rsa.pub.

Let’s create a secret in our current project using the public key from this pair:

$ oc create secret generic myfedora-ssh --from-file=.ssh/id_rsa.pub

Create the manifest file for the Fedora virtual machine using virtctl:

$ virtctl create vm --name my-fedora-1 --instancetype u1.medium --infer-preference --volume-datasource name:root,src:openshift-virtualization-os-images/fedora,size:30Gi > myfedora.yaml

This command creates a manifest for a virtual machine named my-fedora-1 using the predefined instance type u1.medium. It leverages the preconfigured Fedora image from the openshift-virtualization-os-images project and allocates 30 GiB of storage for the VM’s root filesystem.

The --infer-preference option tells virtctl to infer the VM’s preference from the boot volume, in this case from the default-preference label on the fedora data source, instead of requiring a VMCP to be named explicitly.

Before creating the VM, we’ll need to add some things to our manifest file. First, we must add a label to the VM template; services use labels to select the VM’s pod, and we’ll need this one to expose the VM via a NodePort service:

$ vi myfedora.yaml
[.....]
  instancetype:
    name: u1.medium
  preference:
    inferFromVolume: root
  runStrategy: Always
  template:
    metadata:
      labels:
        myvm: fedora
[......]

Second, we must add the credentials to access our virtual machine. For this purpose, we’ll embed the secret containing the public SSH key we created earlier and define a username/password combination:

$ vi myfedora.yaml
[.....]
      volumes:
      - dataVolume:
          name: root
        name: root
      - cloudInitConfigDrive:
          userData: |
            #cloud-config
            user: fedora
            password: Fedora123
            chpasswd:
              expire: false
        name: cloudinitdisk
      accessCredentials:
      - sshPublicKey:
          propagationMethod:
            configDrive: {}
          source:
            secret:
              secretName: myfedora-ssh

We’ve now added our public key as the secret myfedora-ssh and created a cloud-init config drive. The cloud-init utility configures virtual machine instances during system boot. Major cloud providers also use it to bootstrap Linux compute instances in cloud environments. A config drive is presented to the virtual machine as a read-only drive, and the virtual machine can read files from it. We’ve added a username/password combination as user data in the config drive.

Our final manifest looks as follows:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  creationTimestamp: null
  name: my-fedora-1
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: root
    spec:
      sourceRef:
        kind: DataSource
        name: fedora
        namespace: openshift-virtualization-os-images
      storage:
        resources:
          requests:
            storage: 30Gi
  instancetype:
    name: u1.medium
  preference:
    inferFromVolume: root
  runStrategy: Always
  template:
    metadata:
      labels:
        myvm: fedora
    spec:
      domain:
        devices: {}
        resources: {}
      volumes:
      - dataVolume:
          name: root
        name: root
      - cloudInitConfigDrive:
          userData: |
            #cloud-config
            user: fedora
            password: Fedora123
            chpasswd:
              expire: false
        name: cloudinitdisk
      accessCredentials:
      - sshPublicKey:
          propagationMethod:
            configDrive: {}
          source:
            secret:
              secretName: myfedora-ssh

Creating the virtual machine

Let’s apply the finishing touches and create the VM.

$ oc create -f myfedora.yaml
virtualmachine.kubevirt.io/my-fedora-1 created
$ oc get vm
NAME          AGE   STATUS    READY
my-fedora-1   25s   Running   True

There you have it—the virtual machine is up and running. 

Let’s use the embedded SSH key to access our VM using virtctl:

$ virtctl ssh fedora@my-fedora-1 -i .ssh/id_rsa
The authenticity of host 'vmi/my-fedora-1.project1 ()' can't be established.
ECDSA key fingerprint is SHA256:EzRdrmAuIbfQgjsbI2r93tn1BGjSh1qwi0DYCJUIcBk.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'vmi/my-fedora-1.project1' (ECDSA) to the list of known hosts.
[fedora@my-fedora-1 ~]$

You can also connect to the virtual machine’s serial console using the virtctl utility. As shown below, use the username/password combination that you defined in the manifest file:

$ virtctl console my-fedora-1
Successfully connected to my-fedora-1 console. The escape sequence is ^]

my-fedora-1 login: fedora
Password:
Last login: Sun Jun 16 13:52:28 on ttyS0
[fedora@my-fedora-1 ~]$

Using virtctl to access the VM is fairly simple, but its connections are tunneled through the cluster API server. For high-traffic scenarios, consider alternatives such as services, which route traffic to the VM over the cluster network.

Let’s create a service for our VM:

$ virtctl expose vm my-fedora-1 --name fedorasvc --type NodePort --port 22
Service fedorasvc successfully exposed for vm my-fedora-1
$ oc get svc
NAME        TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
fedorasvc   NodePort   172.30.232.201   <none>        22:30825/TCP   8s

Find the node on which the VM is running, then SSH into the VM through that node on the service’s node port:

$ oc get vmi
NAME          AGE   PHASE     IP            NODENAME          READY
my-fedora-1   11m   Running   10.130.0.31   worker2.ovn.com   True
$ ssh -i .ssh/id_rsa fedora@worker2.ovn.com -p 30825
Last login: Sun Jun 16 13:52:50 2024
[fedora@my-fedora-1 ~]$

As you can see, the native service object works just as well for VMs.
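For comparison, a service like this can also be written declaratively. The sketch below prints a NodePort Service whose selector is the myvm: fedora label added to the VM template earlier; the function name and the service name are assumptions, and the node port is left for the cluster to assign.

```shell
# Print a NodePort Service that selects the VM's pod by the myvm=fedora label.
# $1 = service name (hypothetical default).
render_nodeport_service() {
  name="${1:-fedorasvc-declarative}"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
spec:
  type: NodePort
  selector:
    myvm: fedora
  ports:
  - protocol: TCP
    port: 22
    targetPort: 22
EOF
}

render_nodeport_service fedorasvc-declarative
```

Keeping the Service in a manifest next to the VM definition makes it easier to version both together, for example in a GitOps workflow.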

Best practices for running virtualization workloads in OpenShift

Here are some recommended practices to get the most out of running virtual machines in OpenShift.

  • Standardize VM configurations: Use VMCPs and templates to ensure consistent configurations for your VMs, simplifying management and reducing errors.
  • Use Red Hat golden images: Take advantage of preconfigured virtual machine images provided by Red Hat. These secure images have undergone thorough testing, making the virtual machine setup process faster and more efficient.
  • Monitor resource usage: Monitor virtual machine CPU, memory, and storage consumption to identify potential bottlenecks and optimize resource allocation if necessary.
  • Enforce security: OpenShift enforces several security policies for virtualization workloads, such as not allowing virtual machines to run with root privileges. You can further utilize security context constraints (SCCs) to define security policies for virtual machines.
  • Perform backups: Implement a comprehensive backup and disaster recovery strategy for your virtual machines to ensure data protection and quick recovery in case of an incident. Consider using solutions such as Trilio for this purpose.
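As one small building block for the backup practice above, KubeVirt offers a VirtualMachineSnapshot resource. The sketch below prints such a manifest for the VM created in this article. Treat the details as assumptions to verify against your release: the snapshot.kubevirt.io/v1beta1 API version may differ on older versions, the function and snapshot names are illustrative, and the storage class must support volume snapshots. A snapshot lives in the same cluster as the VM, so it complements an off-cluster backup rather than replacing one.

```shell
# Print a VirtualMachineSnapshot manifest for a given VM.
# $1 = VM name (defaults to the VM created earlier in this article).
render_vm_snapshot() {
  vm_name="${1:-my-fedora-1}"
  cat <<EOF
# Assumption: v1beta1 is served by your KubeVirt release.
apiVersion: snapshot.kubevirt.io/v1beta1
kind: VirtualMachineSnapshot
metadata:
  name: ${vm_name}-snapshot
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: ${vm_name}
EOF
}

render_vm_snapshot my-fedora-1
```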

Conclusion

OpenShift virtualization enables a more efficient approach to handling virtualized workloads. It simplifies application deployment by consolidating containers and virtual machines in one platform. This unified control plane makes it easier to perform administrative tasks and facilitates the rollout of application stacks requiring both containers and VMs. In addition, OpenShift Virtualization integrates seamlessly with the OpenShift resources and tools you are already accustomed to, helping streamline the management of virtual workloads. 

If you want to enhance operational simplicity or reduce infrastructure spending by integrating your container and VM workloads, OpenShift Virtualization is an attractive option.