Reference Guide: Deploying a Kubernetes Cluster on OpenStack Using Kubespray

OpenStack and Kubernetes are powerful and feature-rich open-source technologies that play an important role in cloud computing. 

OpenStack is a highly customizable cloud platform designed to be deployed on a wide range of hardware. It lets you build your own private cloud by providing a full suite of services for compute, storage, and networking.

Kubernetes (K8s) is a container orchestration platform that automates containerized applications’ deployment, scaling, and management. Originally developed by Google, K8s has become a standard for container orchestration. The platform can be deployed on various platforms, including bare metal and cloud infrastructure, and enables developers to manage complex applications efficiently across clusters of machines.

In this article, we go through deploying a K8s cluster on top of OpenStack. OpenStack provides the capabilities of underlying infrastructure resource provisioning with scalability. You can integrate various services of OpenStack, like Cinder (block storage) and Neutron (networking), with your Kubernetes cluster, allowing you to manage storage and networking for your containerized applications. We also share best practices and recommendations on how to perform disaster recovery.

Summary of best practices related to using OpenStack with Kubernetes

Plan capacity and prepare the environment: Estimate the resource requirements for the K8s cluster; define the workload, performance, and availability requirements.

Choose the deployment method: Adopt a minimum viable deployment via kubeadm or infrastructure-as-code (IaC) tools like Kubespray for the automation of production-grade K8s clusters.

Scale horizontally and/or vertically: Scale out the infrastructure to handle increased load on stateless applications; scale up existing nodes to improve the performance of stateful applications.

Select a deployment method for high availability: Deploy multiple control-plane, etcd, and worker nodes across different availability zones.

Pros and cons of various methods of deploying Kubernetes

Kubernetes can be deployed in many different ways depending upon the intended use of K8s, the skill level involved, target platforms, and the scale of the deployment. Each of these methods has its pros and cons and ideal intended use.

If a K8s environment is required for development only, it can be deployed on your local machine using minikube or MicroK8s. With minikube, you can have a local learning and development K8s environment on a Linux, macOS, or Windows operating system. MicroK8s is developed by Canonical and is available for Ubuntu and its compatible distributions. With MicroK8s, you can deploy K8s on a local machine or an edge node with a small footprint. MicroK8s provides single-command installation and automatic updates.
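
As a quick illustration, a minimal local development cluster with minikube can be brought up with a couple of commands; this sketch assumes Docker is installed and used as the driver:

# Start a single-node development cluster using the Docker driver
$ minikube start --driver=docker

# Verify the node is ready
$ kubectl get nodes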

kubeadm

If you are willing to get your hands dirty and dive into command-line deployments, you can create a K8s cluster using kubeadm. With kubeadm, you can create a minimum viable cluster on your local machine or a cloud platform. As a prerequisite, you need the infrastructure ready: either physical or virtual nodes with a container runtime like Docker installed. You can then install kubeadm, initiate the K8s cluster setup on the master node, join the worker nodes to the cluster, and set up networking. If you have three or more machines available for the control plane and three or more available as worker nodes, you can create a high-availability cluster with kubeadm.
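
A rough sketch of those steps is shown below; the load balancer endpoint, token, and certificate hash are placeholders that kubeadm prints during initialization:

# On the first control plane node: initialize the cluster
$ sudo kubeadm init --control-plane-endpoint "<LOAD_BALANCER_IP>:6443" --upload-certs

# On each worker node: join the cluster using the token printed by kubeadm init
$ sudo kubeadm join <LOAD_BALANCER_IP>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>

# Install a pod network add-on, e.g., Flannel
$ kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml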

There are several command line steps involved while setting up a K8s cluster with kubeadm. It is recommended to integrate kubeadm with automated deployment tools like Terraform and Ansible.

kOps

kOps is an automated provisioning tool for K8s clusters that enables administrators to deploy, upgrade, and maintain production-grade clusters. kOps can provision the cloud infrastructure nodes in addition to the K8s cluster itself. Currently, kOps officially supports deployments on Amazon and Google Cloud platforms, with beta support for the DigitalOcean, Hetzner, and OpenStack cloud platforms. 
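
As an example, a small cluster on AWS could be created roughly as follows; the state store bucket, cluster name, and zones are placeholders:

# kOps keeps cluster state in an object store (an S3 bucket on AWS)
$ export KOPS_STATE_STORE=s3://<kops-state-bucket>

# Create and immediately apply a three-worker cluster definition
$ kops create cluster --name=k8s.example.com \
    --cloud=aws --zones=us-east-1a,us-east-1b,us-east-1c \
    --node-count=3 --yes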

Charmed Kubernetes

Canonical provides a set of tools to build and manage K8s clusters. Charmed Kubernetes is built on Ubuntu and uses Juju to provide lifecycle management of K8s on top of OpenStack, VMware, AWS, GCP, and Azure cloud platforms. 

Juju is an automation and configuration utility developed by Canonical. It has a learning curve and can be somewhat complex to set up initially, but it reduces the manual configurations and complexity of production deployment environments. 
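
A minimal sketch of such a deployment, assuming Juju is installed and an OpenStack cloud and credential have already been registered with it (the cloud name is a placeholder):

# Bootstrap a Juju controller on the OpenStack cloud
$ juju bootstrap <openstack-cloud-name> k8s-controller

# Deploy the Charmed Kubernetes bundle into a new model
$ juju add-model k8s
$ juju deploy charmed-kubernetes

# Watch the deployment progress
$ juju status --watch 5s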

Kubespray

A production-grade K8s cluster can also be deployed using Kubespray, which supports deployment on multiple platforms, including AWS, GCE, Azure, OpenStack, and bare metal. Kubespray uses Ansible as its provisioning tool and provides playbooks for K8s cluster configuration and management. It supports multiple Linux platforms and is highly configurable. 

The challenge here is that you need good knowledge of Ansible and domain-specific knowledge of the intended deployment platform. In this article, we install a K8s cluster using Kubespray on the OpenStack cloud platform.

Setting up the OpenStack environment

Before performing the actual K8s cluster deployment, we need to prepare the infrastructure in the OpenStack cloud. This includes provisioning the compute instances for control and worker nodes. We also need to set up networking and security groups.

The trilioData Git repo contains all of the shell commands from this tutorial for deploying a K8s cluster on OpenStack using Kubespray.

Setting up networking and security groups

For brevity, we assume that we already have an OpenStack environment available and have appropriate administrative credentials for the OpenStack project stored in the k8s-project-openrc.sh text file. The project has a router connected with an external provider network from which we have assigned public floating IPs for the nodes in the subnet 192.0.2.0/24. The router is connected with a private subnet 10.20.20.0/24 on which we will connect the control and worker nodes. The machines and IP allocations are as shown in the table below.

 

Machine role   Machine name   Public/floating IP   Private IP
Controller 1   node1          192.0.2.11           10.20.20.11
Controller 2   node2          192.0.2.12           10.20.20.12
Controller 3   node3          192.0.2.13           10.20.20.13
Worker 1       node4          192.0.2.14           10.20.20.14
Worker 2       node5          192.0.2.15           10.20.20.15
Worker 3       node6          192.0.2.16           10.20.20.16

 

The K8s cluster nodes will need to communicate with each other. Also, we need to allow management ports to be able to access the cluster from outside the cloud environment. 

Source the OpenStack credentials and authentication URL from the file k8s-project-openrc.sh. Then create a security group and allow the required IPs and ports, as shown below.

# Source the OpenStack credentials
$ source k8s-project-openrc.sh
$ openstack security group create k8s-cluster
$ openstack security group rule create --ingress \
--remote-ip 192.0.2.0/24 --protocol tcp k8s-cluster
$ openstack security group rule create --ingress \
--remote-ip 192.0.2.0/24 --protocol udp k8s-cluster
$ openstack security group rule create --ingress \
--remote-ip 192.0.2.0/24 --protocol icmp k8s-cluster
$ openstack security group rule create --ingress \
--remote-ip 0.0.0.0/0 --protocol tcp --dst-port 22 k8s-cluster
$ openstack security group rule create --ingress \
--remote-ip 0.0.0.0/0 --protocol tcp --dst-port 443 k8s-cluster
$ openstack security group rule create --ingress \
--remote-ip 0.0.0.0/0 --protocol tcp --dst-port 6443 k8s-cluster

Provisioning compute instances for the Kubernetes nodes

We will deploy three nodes as controllers and three as worker nodes. The following loop will create the six nodes with the required specs, connected with the K8s private subnet and with the security group already created.

$ for i in $(seq 1 6); do
  INSTANCE_NAME="node${i}"
  openstack server create --flavor sm3.small --network k8snet \
                          --image ubuntu-22.04-amd64.img  \
                          --key-name SSH-KEY \
                          --security-group k8s-cluster \
                          $INSTANCE_NAME
done
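
The loop above only attaches the instances to the private k8snet network. To match the floating IPs listed in the table earlier, each node also needs a floating IP from the external provider network; a sketch assuming that network is named external-net:

# Allocate a floating IP from the external network and attach it to each node
$ for i in $(seq 1 6); do
    FIP=$(openstack floating ip create external-net -f value -c floating_ip_address)
    openstack server add floating ip "node${i}" "$FIP"
  done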

Setting up Kubespray

The K8s cluster will be deployed using Ansible. In a typical Ansible setup, you have a control node on which you install the required tools and libraries and from which you run your playbooks. This can be your laptop or workstation, but it is recommended to have a dedicated machine from which you can grant access to your team and manage your infrastructure nodes.

Getting Kubespray

On your Ansible control node, retrieve the Kubespray repository. You can clone the repository from GitHub:

$ mkdir ~/Projects/workspace
$ cd ~/Projects/workspace
$ git clone https://github.com/kubernetes-sigs/kubespray.git

Installing prerequisites, Python, and Ansible

Ansible can be installed and run on a majority of Linux distributions; Kubespray requires Python version 3.9+. You need to install the versions of both Python and Ansible appropriate for your specific distribution. On Ubuntu, this can be done as follows:

# Install python and pip
$ sudo apt install python3 python3-pip

# Install ansible
$ sudo apt install software-properties-common
$ sudo add-apt-repository --yes --update ppa:ansible/ansible
$ sudo apt install ansible

Next, you need to install the Python modules required for the proper functioning of Kubespray. Under the downloaded kubespray repository, the necessary modules are in the requirements.txt file:

$ cd ~/Projects/workspace/kubespray
$ sudo pip3 install -r requirements.txt

Setting up the Ansible inventory

Ansible requires an inventory of the nodes on which the playbooks are run. We will copy the sample inventory and the required configuration variables from the sample directory and then populate the inventory with the nodes' IP addresses.

# Copy the sample inventory
$ cd ~/Projects/workspace/kubespray
$ cp -rfp inventory/sample inventory/mycluster

# Update Ansible inventory file with inventory builder
$ declare -a IPS=(192.0.2.11 192.0.2.12 192.0.2.13 192.0.2.14 192.0.2.15 192.0.2.16)
$ CONFIG_FILE=inventory/mycluster/hosts.yaml \
python3 contrib/inventory_builder/inventory.py ${IPS[@]}

This will create the file inventory/mycluster/hosts.yaml, which we will edit to update the required variables:

  • Remove access_ip from all nodes. 
  • Update the ip variable to the private IP of the nodes on which K8s services will listen. 
  • Define node1, node2, and node3 as the kube_control_plane and etcd nodes. 

The following is the updated hosts.yaml inventory file:

all:
  hosts:
    node1:
      ansible_host: 192.0.2.11
      ip: 10.20.20.11
    node2:
      ansible_host: 192.0.2.12
      ip: 10.20.20.12
    node3:
      ansible_host: 192.0.2.13
      ip: 10.20.20.13
    node4:
      ansible_host: 192.0.2.14
      ip: 10.20.20.14
    node5:
      ansible_host: 192.0.2.15
      ip: 10.20.20.15
    node6:
      ansible_host: 192.0.2.16
      ip: 10.20.20.16
  children:
    kube_control_plane:
      hosts:
        node1:
        node2:
        node3:
    kube_node:
      hosts:
        node4:
        node5:
        node6:
    etcd:
      hosts:
        node1:
        node2:
        node3:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}

Choosing the plugins, network, storage, and ingress

We need to define the API server load balancer type and the cloud environment on which the K8s cluster will be deployed. Edit the group variables file inventory/mycluster/group_vars/all/all.yml and update the following variables.

## Internal loadbalancers for apiservers
# loadbalancer_apiserver_type: # valid values "nginx" or "haproxy"
loadbalancer_apiserver_type: nginx

## There are some changes specific to the cloud providers
## for instance we need to encapsulate packets with some network plugins
## possible values 'gce','aws','azure','vsphere','oci','external'
cloud_provider: external

## Supported cloud: 'openstack','vsphere','huaweicloud' and 'hcloud'
external_cloud_provider: openstack

You can enable different addons for Kubernetes deployment in the addons variables file inventory/mycluster/group_vars/k8s_cluster/addons.yml. We will enable the metrics server addon so that we can monitor the resource usage of the cluster nodes:

# Metrics Server deployment
metrics_server_enabled: true
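
Once the cluster is up, the metrics server can be verified with kubectl top, for example:

# Check resource usage of nodes and system pods
$ kubectl top nodes
$ kubectl top pods -n kube-system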

Configuring the K8s cluster and OpenStack-specific variables

Edit the inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml and configure the network plugin, pods address space, and API server port:

# Choose network plugin (cilium, calico, kube-ovn, weave or flannel)
kube_network_plugin: flannel

# Kubernetes internal network for services, unused block of space.
kube_service_addresses: 10.233.0.0/18

# internal network. When used, it will assign IP
# addresses from this range to individual pods.
# This network must be unused in your network infrastructure!
kube_pods_subnet: 10.233.64.0/18

# The port the API Server will be listening on.
kube_apiserver_port: 6443  # (https)

# Kube-proxy proxyMode configuration.
# Can be ipvs, iptables
kube_proxy_mode: iptables

If we want to access the Kubernetes API outside the OpenStack private network, we need to update the supplementary_addresses_in_ssl_keys variable with a list of the IP addresses of the controller nodes. Edit the k8s-cluster.yml file and make the following changes:

## Supplementary addresses that can be added in kubernetes ssl keys.
supplementary_addresses_in_ssl_keys: [192.0.2.11, 192.0.2.12, 192.0.2.13]

Enable the Cinder CSI plugin to manage the lifecycle of OpenStack Cinder volumes. Edit the file inventory/mycluster/group_vars/all/openstack.yml:

## To use Cinder CSI plugin to provision volumes set this value to true
## Make sure to source in the openstack credentials
cinder_csi_enabled: true
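
With the plugin enabled, Cinder-backed persistent volumes can later be consumed through a StorageClass that uses the cinder.csi.openstack.org provisioner. A minimal sketch (the class and claim names are illustrative):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-csi
provisioner: cinder.csi.openstack.org
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: cinder-csi
  resources:
    requests:
      storage: 10Gi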

Verifying SSH connectivity with the nodes

One last step before starting the K8s cluster deployment is to verify that you can connect with all the nodes. You can run the Ansible ping module to confirm that you have SSH access to the nodes and that the Ansible inventory file is correct:

$ ansible -m ping -i inventory/mycluster/hosts.yaml -u ubuntu all

node2 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
node3 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
node5 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
node1 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
node4 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
node6 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

Deploying K8s cluster with Kubespray

Deploying the K8s cluster

We are now ready to start the deployment of the K8s cluster. Source the OpenStack variables and run the cluster deployment Ansible playbook:

$ source k8s-project-openrc.sh
$ ansible-playbook -i inventory/mycluster/hosts.yaml \
-u ubuntu --become cluster.yml

Now sit back and take a coffee break: the K8s cluster deployment may take up to 30 minutes because the playbook needs to download many packages to complete the deployment.

After the Kubespray playbook completes successfully, the final play recap should report no failed tasks.

The deployment will create a kubectl config on the controller node with cluster authentication information. You can SSH into controller node1 and copy the file into your home directory. You can also download the config file to your Ansible control node and use kubectl remotely to interact with the K8s cluster, as shown in the sketch after the output below.

ubuntu@node1:~$ sudo cp /etc/kubernetes/admin.conf .kube/config
ubuntu@node1:~$ sudo chown $(id -u):$(id -g) .kube/config

ubuntu@node1:~$ kubectl get nodes
NAME    STATUS   ROLES           AGE   VERSION
node1   Ready    control-plane   17h   v1.29.5
node2   Ready    control-plane   17h   v1.29.5
node3   Ready    control-plane   17h   v1.29.5
node4   Ready    <none>          17h   v1.29.5
node5   Ready    <none>          17h   v1.29.5
node6   Ready    <none>          17h   v1.29.5

ubuntu@node1:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                          READY   STATUS    RESTARTS      AGE
kube-system   coredns-69db55dd76-q7qtl                      1/1     Running   0             2m9s
kube-system   coredns-69db55dd76-vtxfv                      1/1     Running   0             15m
kube-system   csi-cinder-nodeplugin-5dv49                   3/3     Running   0             15m
kube-system   csi-cinder-nodeplugin-6bzr2                   3/3     Running   0             15m
kube-system   csi-cinder-nodeplugin-8b9s8                   3/3     Running   0             15m
kube-system   dns-autoscaler-6f4b597d8c-v6h5l               1/1     Running   0             15m
kube-system   kube-apiserver-node1                          1/1     Running   1             19m
kube-system   kube-apiserver-node2                          1/1     Running   1             18m
kube-system   kube-apiserver-node3                          1/1     Running   1             17m
.
.
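
To use kubectl from outside the cluster, as mentioned above, you can pull the kubeconfig onto your Ansible control node and point it at a controller's floating IP; a sketch assuming passwordless sudo for the ubuntu user (192.0.2.11 is valid for the API certificate because it was added to supplementary_addresses_in_ssl_keys):

# Copy the admin kubeconfig from node1 to the Ansible control node
$ mkdir -p ~/.kube
$ ssh ubuntu@192.0.2.11 sudo cat /etc/kubernetes/admin.conf > ~/.kube/config

# Point the kubeconfig at node1's floating IP (the server address in
# admin.conf may be a private or localhost address, depending on the setup)
$ sed -i 's#server: https://.*#server: https://192.0.2.11:6443#' ~/.kube/config
$ kubectl get nodes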

Doing a test deployment

We can now start deploying applications on top of the K8s cluster. As an example, we will do a deployment of an Nginx web server and check the pod created by the Nginx deployment:

ubuntu@node1:~$ kubectl create deployment nginx --image=nginx
deployment.apps/nginx created

ubuntu@node1:~$ kubectl get pods -l app=nginx
NAME                     READY   STATUS    RESTARTS   AGE
nginx-7854ff8877-g6b24   1/1     Running   0          13s
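
To check that the deployment is actually serving traffic, you can expose it on a NodePort and fetch the default page from any node; the node port placeholder below is whatever port kubectl reports for the service:

ubuntu@node1:~$ kubectl expose deployment nginx --port=80 --type=NodePort
ubuntu@node1:~$ kubectl get service nginx
ubuntu@node1:~$ curl http://10.20.20.14:<nodeport>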

Tearing down the cluster (if required)

If, for any reason, we want to tear down the K8s cluster—maybe to start again—we can use the reset.yml playbook of Kubespray to do that. This will remove all the installed packages and remove the K8s cluster configurations:

$ ansible-playbook -i inventory/mycluster/hosts.yaml \
-u ubuntu --become reset.yml

Best practices for running K8s environments

Plan capacity and prepare the environment

Whenever you are doing a production-grade deployment, there are certain prerequisite steps that should be taken before going live. The first one is the planning and the preparation phase. 

Start by figuring out the type of application workload you will be deploying on your cluster. What is the profile of your applications? Are these CPU intensive or memory intensive? What are the I/O requirements? You need to plan your compute resources accordingly and choose the appropriate instance type with the right CPU, RAM, and disk IO specifications. 

Most languages have their own profiling tools or libraries. For Java, you can use tools like YourKit, JProfiler, or VisualVM that can be integrated into applications to collect profiling data. Go applications have pprof, which offers CPU and memory profiling. For Python, cProfile, line_profiler, and Pyinstrument are popular options.
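
Once the workload profile is known, it can be encoded as resource requests and limits so that the scheduler places pods on appropriately sized nodes; a minimal sketch with purely illustrative numbers and image name:

apiVersion: v1
kind: Pod
metadata:
  name: profiled-app
spec:
  containers:
  - name: app
    image: example/app:latest
    resources:
      requests:
        cpu: "500m"      # measured baseline CPU usage
        memory: "512Mi"  # measured baseline memory usage
      limits:
        cpu: "1"
        memory: "1Gi"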

Another thing you need to account for is the resources required for the K8s components themselves (e.g., kube-apiserver, kube-scheduler, kube-controller-manager, kubelet, etc.).

If you are offering persistent volumes, you should have sufficient storage capacity and the ability to increase this capacity. You also need to define storage classes based on the I/O performance requirements of the applications (HDDs or SSDs). 

Choose the right deployment methods

The right deployment method depends on which ecosystem you are already working with and your team's skill set. If you have a deeper understanding of Kubernetes internals and networking concepts, you can use kubeadm, the tool officially developed and maintained by the Kubernetes project. It gives you fine-grained control over the cluster configuration and allows greater customization to your specific needs.

If your team is proficient in automation tools like Ansible and you want repeatable, consistent deployments with speed and efficiency, the recommendation is to go with an automated deployment tool, as explained above. Although you get slightly less flexibility compared to kubeadm, IaC tools reduce manual effort and the chance of errors.

Scale horizontally and/or vertically

As infrastructure grows, you eventually come to a decision point when it comes to scaling up your resources. You have the option of either scaling horizontally (scaling out), which means adding more nodes to your deployment cluster, or scaling vertically (scaling up), which is adding more compute resources to your existing nodes.

When you have stateless applications, you can scale horizontally to handle increased load by distributing the load across multiple instances while also improving fault tolerance and availability. With more nodes, you get better redundancy. A small downside is that on each node there is some compute capacity reserved for running the OS and K8s components, which slightly decreases your overall cluster capacity.
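
In this Kubespray-based setup, scaling out means provisioning an additional OpenStack instance, adding it to the Ansible inventory (for example, as node7 under kube_node), and running Kubespray's scale playbook; a sketch:

# Add only the new node to avoid touching the existing ones
$ ansible-playbook -i inventory/mycluster/hosts.yaml \
    -u ubuntu --become --limit=node7 scale.yml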

With stateful applications where individual instances need more resources, you can scale vertically for better resource optimization. It is simpler and easier to manage a smaller number of resources, but that comes with the downside of reduced fault tolerance.

Select a deployment method for high availability

Separate nodes: control, worker, etcd

In a high-availability cluster, you cannot have any single point of failure. This means that the control plane and the application workloads should run on separate sets of nodes. 

The control plane components should be replicated across multiple nodes. The minimum recommendation is three nodes to maintain a quorum, i.e., the minimum number of nodes needed for the cluster to agree on updates and maintain data consistency. 

You must set up a load balancer in front of the cluster API servers, and you should keep at least one spare worker node (N+1 redundancy) to handle the failure of a single node. For even greater availability, you can decouple etcd from the control plane nodes: running etcd as an external cluster reduces the impact of losing any single control plane node. 
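
A common choice for the API server load balancer is a TCP proxy such as HAProxy; a minimal configuration sketch using the controller IPs from this article:

# /etc/haproxy/haproxy.cfg (sketch): forward TCP traffic to the three API servers
frontend kube-apiserver
    bind *:6443
    mode tcp
    default_backend kube-apiserver-nodes

backend kube-apiserver-nodes
    mode tcp
    balance roundrobin
    server node1 10.20.20.11:6443 check
    server node2 10.20.20.12:6443 check
    server node3 10.20.20.13:6443 check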

High-availability Kubernetes cluster topology with external etcd cluster (source)

Deploying across different availability zones

Most large cloud operators offer multiple regions to keep failure zones separate. OpenStack also has the concept of availability zones (AZs) that are logically defined zones within a single physical data center; each zone has independent power, cooling, and network infrastructure. Deploying OpenStack services across multiple AZs enhances fault tolerance within a data center.

For the cluster control plane, where availability is a major concern, you should replicate the control components (API server, scheduler, etcd, and cluster controller manager) across at least three different AZs.
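
In OpenStack, this can be done by passing an availability zone when provisioning the controller instances; a sketch with illustrative zone names:

# Place each controller in a different availability zone
$ AZS=(az1 az2 az3)
$ for i in 1 2 3; do
    openstack server create --availability-zone "${AZS[$((i-1))]}" \
                            --flavor sm3.small --network k8snet \
                            --image ubuntu-22.04-amd64.img \
                            --key-name SSH-KEY \
                            --security-group k8s-cluster \
                            "node${i}"
  done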

Backup and disaster recovery options

High availability is obviously important for production clusters, and you should also have a backup and disaster recovery plan in case the cluster itself faces a catastrophic failure. This section will discuss some of the disaster recovery options and their effectiveness. 

etcd backup

In a K8s cluster, etcd is the primary key-value data store that replicates all the Kubernetes cluster states. Backing up etcd is crucial to perform disaster recovery in case of any critical failure.

You can use the etcdctl utility on one of the control nodes to take snapshots of the etcd key-value store. You need to specify the endpoint and the certificate files for connecting with the etcd cluster:

ubuntu@node1:~$ sudo etcdctl --endpoints https://node1:2379 \
--cacert=/etc/ssl/etcd/ssl/ca.pem \
--cert=/etc/ssl/etcd/ssl/member-node1.pem \
--key=/etc/ssl/etcd/ssl/member-node1-key.pem \
snapshot save etcd-backup.db
.
.
Snapshot saved at etcd-backup.db

Make sure that the etcd-backup.db file is stored securely offsite. Also, take backups on a regular schedule so that you lose as little data as possible in case of a disaster. Note that with an etcd backup, you can only recover the K8s cluster configuration and state, not the pods' persistent data.
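
Restoring from such a snapshot is done with etcdctl snapshot restore; a simplified sketch for one member, with an illustrative data directory path (in a three-member cluster, each member would be restored with the full --initial-cluster list and its own name and peer URL):

ubuntu@node1:~$ sudo etcdctl snapshot restore etcd-backup.db \
--name node1 \
--initial-cluster node1=https://10.20.20.11:2380,node2=https://10.20.20.12:2380,node3=https://10.20.20.13:2380 \
--initial-advertise-peer-urls https://10.20.20.11:2380 \
--data-dir /var/lib/etcd-restored

The etcd service on each member is then pointed at the restored data directory before being restarted.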

OpenStack Cinder snapshots

The second option is to use the snapshot option of OpenStack Cinder volumes. You can preserve the state of the controller and worker node volumes using this option, which is different from etcd database backup. The volume snapshots can be used to clone new volumes or perform recovery to the most recent snapshots in case of any node corruption or failure. 

The snapshots are colocated with the volume backends. In the case of failure of the storage backend itself, you would lose both the node volume and its snapshot.

# Source the OpenStack credentials
$ source k8s-project-openrc.sh
$ openstack volume snapshot create --volume controller1-volume controller1-backup
.
.
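
To recover from a node or volume failure, a new volume can be created from the snapshot and attached to a replacement (or the existing) instance; the names below are illustrative:

# Create a new volume from the snapshot and attach it to the node
$ openstack volume create --snapshot controller1-backup controller1-volume-restored
$ openstack server add volume node1 controller1-volume-restored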

Trilio K8s backup and recovery

The two backup and recovery options above offer only partial restoration. The etcd backup only captures the K8s state, and with Cinder snapshots, you can restore the storage volumes by creating new volumes from the snapshots and attaching them to new or existing VMs.

A third option is to use Trilio for OpenStack backup and recovery. Trilio can fully orchestrate disaster recovery even in the case of complete OpenStack cloud failure.

Trilio is natively installed alongside OpenStack services and captures the complete state of the OpenStack project—including disk images, instance metadata, Cinder volumes and metadata, ssh-key pairs, network security groups, network subnets, ports, and MAC addresses—and stores all this data as a qcow2 image offsite in NFS or S3 storage. Trilio can perform automated full and incremental backups, and you can perform the disaster recovery on the same or any other offsite OpenStack cluster. 

OpenStack deployment with the Trilio data protection service (source)

Last thoughts

In this article, we discussed combining the flexibility and scalability of the OpenStack cloud with the power of the Kubernetes container orchestration platform. OpenStack gives you a cost-effective, solid cloud platform with advanced networking and storage integration, while K8s lets you run cloud-native, scalable, and highly available applications. 

You can use tools like kubeadm for greater control and flexibility or utilize infrastructure automation tools like Kubespray to automate and standardize your deployments. 

We also shared some best practices for running a production-grade cluster. It is important to plan properly for the resource requirements of your applications and to plan ahead for how to scale up or scale out your infrastructure. With a live platform, it is always crucial to have a disaster recovery strategy that can bring back the complete cluster with minimal data loss and human interaction.