Reference Guide: Optimizing Backup Strategies for Red Hat OpenShift Virtualization

Administrators use orchestration platforms to manage and automate applications seamlessly as they slowly transition from workloads running on virtualized servers to modern containerization solutions. Kubernetes, or K8s for short, is an open-source container orchestrator that manages containerized application deployments. K8s can be self-managed or cloud-managed, like Microsoft’s Azure Kubernetes Service (AKS).

As the name suggests, AKS is hosted on the Microsoft Azure cloud platform and integrates well with other containerized applications, instances, and development and application suites within the Azure tenant. Beyond its cloud-as-a-service offerings, Microsoft Azure also provides protection suites, including a backup and disaster recovery (DR) solution that can back up AKS clusters.

In this article, we discuss the steps to configure different backup options within Azure Backup, the limitations of using the native backup solution, and best practices for backing up your AKS cluster.

Note that this article will not discuss basic concepts for Microsoft Azure, concepts related to using Kubernetes, or the setup of an AKS cluster in Microsoft Azure. It is advisable to have prior knowledge of cloud administration and container orchestration and at least a working and running AKS cluster for testing.

Summary of key best practices for AKS backup

| Best practice | Description |
| --- | --- |
| Regularly test restores | Perform test restores to a non-production environment (e.g., a separate cluster in a sandbox) to verify backup integrity and the recovery process. |
| Consider cross-region recovery alternatives | Mitigate cross-region restore limitations in Azure Backup with geo-redundant storage, Infrastructure-as-Code, or multi-region AKS. |
| Review identity and access management | Audit assigned roles and permissions to validate if they align with the principle of least privilege and are limited to the necessary Azure resources. |
| Implement least privilege | Grant only the minimum necessary permissions to the managed identity. |
| Choose the right backup level | Select the appropriate scope of data that will be protected based on your recovery needs and the characteristics of your applications. |
| Evaluate alternative backup solutions | Explore Kubernetes-native backup solutions for AKS if Azure Backup’s limitations do not meet your requirements. |

Azure backup in different AKS scenarios

Multiple backup solutions are currently available in the Azure Marketplace. However, since the goal is to tackle a cloud-based Kubernetes cluster sitting on Microsoft Azure, Azure Backup is the first choice for protecting AKS, offering native integration, simplified workflows, and attractive pricing. 

Azure Backup allows you to back up multiple levels of your AKS cluster. 

Application-level backup

This type of backup focuses on protecting data and configuration items at the application level, specifically, individual applications or workloads running on the AKS cluster.

This backup type is useful for stateful applications that rely primarily on persistent storage, like MySQL or MongoDB instances running on Azure Files. If the application is stateless, or keeps its state in a managed service outside the cluster such as Azure SQL or Azure Cosmos DB, then an application-level backup of persistent data will not be necessary.

However, Azure Backup for AKS also has limitations regarding application-level restores. Advanced restoration modifications like renaming resources, remapping namespaces, or altering storage classes are not supported. The restoration procedure will automatically bypass any conflicting resources in the target cluster unless patching or manual intervention is used. The limitations section of this article delves into this topic more.

Namespace-level backup

A Kubernetes namespace is a mechanism for isolating groups of resources within a single Kubernetes cluster. It provides a scope for names, preventing conflicts between resources in different namespaces. Namespaces are commonly used to organize resources by team, project, or environment (e.g., development, staging, production). A namespace-level backup captures all resources within that defined scope, including deployments, services, configuration maps, secrets, and other objects.

A typical scenario would be backing up multiple microservices and configurations running in a development namespace. This level of backup ensures that you can restore all components together within the namespace in case of an emergency or disaster.

Azure Backup can back up multiple namespaces using one backup policy. However, this may also have disadvantages. For example, if a particular shared service exists across numerous namespaces, there may be inconsistency during restoration. In this case, it is better to back up a whole cluster.

Cluster-level backup

As the name suggests, a cluster-level backup secures the entire AKS cluster, including all namespaces, configurations, and resources. It can help you recover from a cluster-wide outage caused by misconfiguration or a significant service failure. Depending on the backup solution, an application-level restore can be performed using a cluster-level backup.

As a best practice, you should avoid performing a cluster-level restore if only a single workload or application is impacted.

Setting up Azure Backup for AKS

Resource provision

To start with, provision the resources in your Azure ecosystem before configuring Azure Backup. In your resource group, create the following:

  • Storage account.
  • Blob container within the storage account. You can store your AKS cluster backups in this blob container. Ensure that the storage account is in the same region and subscription as the cluster.
  • Backup vault. AKS backups use a Backup vault, not a Recovery Services vault. The vault must have Trusted Access enabled for the AKS cluster you want to back up; you grant this permission when you select the vault during backup configuration.
  • Backup policy set inside the vault. Add a retention rule for the vault tier if you want to store long-term backups for compliance reasons, enable ransomware protection features, or use backups for regional disaster recovery.

Backup for AKS clusters is not enabled by default, so once you have created the resources successfully, follow the additional prerequisites detailed in this official Microsoft documentation.
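The placement requirement above (storage account in the same region and subscription as the cluster) can be checked before any backup is configured. The following is a minimal sketch, not an Azure SDK call: `AzureResource`, `subscription_of`, and `placement_problems` are hypothetical names introduced for illustration, and the only assumption about Azure is that resource IDs follow the usual `/subscriptions/<id>/resourceGroups/<rg>/providers/...` layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AzureResource:
    """Hypothetical stand-in for an Azure resource: its resource ID and region."""
    resource_id: str
    location: str


def subscription_of(resource_id: str) -> str:
    """Extract the subscription ID from a resource ID of the form
    /subscriptions/<id>/resourceGroups/<rg>/providers/..."""
    parts = [p for p in resource_id.split("/") if p]
    if len(parts) < 2 or parts[0].lower() != "subscriptions":
        raise ValueError(f"not a subscription-scoped resource ID: {resource_id!r}")
    return parts[1].lower()


def normalize_location(location: str) -> str:
    """Treat 'East US' and 'eastus' as the same region."""
    return location.replace(" ", "").lower()


def placement_problems(cluster: AzureResource, storage: AzureResource) -> List[str]:
    """Return human-readable problems; an empty list means the placement is fine."""
    problems = []
    if subscription_of(cluster.resource_id) != subscription_of(storage.resource_id):
        problems.append("storage account is in a different subscription than the cluster")
    if normalize_location(cluster.location) != normalize_location(storage.location):
        problems.append("storage account is in a different region than the cluster")
    return problems
```

Running a check like this in a provisioning pipeline catches a misplaced storage account early, before the backup extension is installed against it.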

Additional prerequisites

Setting up Azure Backup as the primary solution for AKS can be complex and tedious. In addition to the previous steps, you need to install the backup extension in your AKS cluster to fully enable Azure Backup for AKS.

To start, follow these steps:

1. In the Azure portal, go to the AKS cluster you want to back up, and then under Settings, click Backup.

2. To prepare the AKS cluster for backup, you must install the backup extension by selecting Install Extension.

3. Provide the previously created storage account and blob container as input. Select Next.

Installing the AKS backup extension (source)

4. Review the extension installation details provided, and then select Create. The deployment will begin with the extension’s installation.

Creating the necessary resources for AKS backup extension (source)

5. Once the backup extension is installed successfully, start configuring backups for your AKS cluster by selecting Configure Backup.

6. Select a backup vault to use for the AKS instance backup.

Selecting the backup vault to store the backup (source)

The Backup vault must have Trusted Access enabled for the AKS cluster you want to back up. To enable Trusted Access, select Grant permission. Once enabled, select Next.

Granting trusted access permission to the backup vault (source)

7. Select a backup policy defining the backup schedule and its retention period, then click Next.

Selecting the backup policy for scheduling (source)

8. On the Datasources tab, click Add/Edit to define the backup instance configuration.

Defining the backup instance (source)
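The portal steps for installing the extension and granting Trusted Access can also be scripted. The sketch below only assembles Azure CLI argument lists in Python; it does not run them. The command shapes and flags reflect our understanding of Microsoft's CLI documentation for AKS backup and should be treated as assumptions to verify against the current `az k8s-extension` and `az aks trustedaccess` references. All resource names passed in are placeholders.

```python
import shlex
from typing import List


def extension_install_command(cluster_name: str, cluster_rg: str,
                              storage_account: str, storage_rg: str,
                              subscription_id: str, blob_container: str) -> List[str]:
    """Build (not run) the CLI call that installs the backup extension.
    Flags are assumed from Microsoft's documentation; verify before use."""
    return [
        "az", "k8s-extension", "create",
        "--name", "azure-aks-backup",
        "--extension-type", "microsoft.dataprotection.kubernetes",
        "--scope", "cluster",
        "--cluster-type", "managedClusters",
        "--cluster-name", cluster_name,
        "--resource-group", cluster_rg,
        "--release-train", "stable",
        "--configuration-settings",
        f"blobContainer={blob_container}",
        f"storageAccount={storage_account}",
        f"storageAccountResourceGroup={storage_rg}",
        f"storageAccountSubscriptionId={subscription_id}",
    ]


def trusted_access_command(cluster_name: str, cluster_rg: str,
                           binding_name: str, vault_resource_id: str) -> List[str]:
    """Build (not run) the CLI call that grants the vault Trusted Access.
    The role name is assumed from Microsoft's documentation; verify before use."""
    return [
        "az", "aks", "trustedaccess", "rolebinding", "create",
        "--cluster-name", cluster_name,
        "--resource-group", cluster_rg,
        "--name", binding_name,
        "--roles", "Microsoft.DataProtection/backupVaults/backup-operator",
        "--source-resource-id", vault_resource_id,
    ]


def as_shell(command: List[str]) -> str:
    """Render an argument list as a copy-pastable shell line."""
    return shlex.join(command)
```

Keeping the commands as argument lists makes them easy to review in a pull request and to pass to `subprocess.run` once the flags have been confirmed for your CLI version.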

Scoping the resources

You can target and back up your assets at multiple levels. As mentioned earlier, you can choose the level of your backup: application, namespace, or cluster.

In the Select Resources to Backup pane, define the level of cluster resources you want to back up:

  • To perform a cluster-level backup, select All namespaces to back up (including future namespaces) and check Include cluster scope.
  • To perform a namespace-level backup, choose the namespace from the list.
  • To perform an application-level backup, narrow the list of workloads and resources by filtering them using labels.

Scoping the resources that need to be backed up (source)
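To make the three scoping options concrete, here is a small conceptual model of how a scope narrows the set of resources. It is a sketch for reasoning about scope, not Azure Backup's actual selection logic; `Resource` and `select_for_backup` are hypothetical names, and label matching is simplified to exact key/value equality.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Resource:
    kind: str
    name: str
    namespace: Optional[str]  # None means the resource is cluster-scoped
    labels: Dict[str, str] = field(default_factory=dict)


def select_for_backup(resources: Iterable[Resource],
                      namespaces: Optional[Set[str]] = None,
                      include_cluster_scope: bool = False,
                      label_selector: Optional[Dict[str, str]] = None) -> List[Resource]:
    """Filter resources by scope.

    namespaces=None       -> all namespaces (cluster-level backup)
    namespaces={"shop"}   -> namespace-level backup
    label_selector={...}  -> application-level backup within the chosen namespaces
    """
    selected = []
    for res in resources:
        if res.namespace is None:
            # Cluster-scoped objects are only included when explicitly requested.
            if not include_cluster_scope:
                continue
        elif namespaces is not None and res.namespace not in namespaces:
            continue
        if label_selector and any(res.labels.get(k) != v for k, v in label_selector.items()):
            continue
        selected.append(res)
    return selected
```

With an inventory of two objects in a `shop` namespace, one in `dev`, and one cluster-scoped role, the cluster-level call returns all four, the namespace-level call for `shop` returns two, and adding a label selector such as `{"app": "web"}` narrows the result to the single matching workload.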

Azure Backup can restore to another region during a region-wide outage. To make your AKS backups restorable in another region, select a backup vault with storage redundancy set to geo-redundant (GRS) and cross-region restore enabled.

AKS backups are stored in Azure Backup’s Operational tier by default. The Snapshot/Operational tier is the most expensive backup tier of Azure Backup, where the backup is not moved to a vault but is stored close to the source in your own tenant. Additional configuration, such as enabling and moving to vault-tier backups, may be required to further reduce costs.

This article focuses on single-region AKS cluster backups, but you can find more information in this official Microsoft documentation. However, please note that cross-region support is not available in all regions.

Finalize the backup

Now that you have scoped the resources, you need to finalize the backup configuration.

1. For the snapshot resource group, select the resource group to use and select Validate.

Supplying the snapshot resource group value and validating (source)

2. When validation is finished, an error appears if the required roles aren’t assigned to the vault in the snapshot resource group.

 Validating the role assignment of the snapshot resource group (source)

3. Select the data source under the datasource name to resolve the error and then select Assign missing roles.

Assigning missing roles in the data source name of the cluster (source)

4. When the role assignment is finished, select Next.

Resolved errors after role assignments were performed (source)

5. Select Configure backup.

6. When the configuration is finished, select Next.

Summary of details before finalizing the setup (Source)

The backup instance is created when you finish configuring the backup.

Finalized result after backup setup (Source)

Once these steps are completed, Azure Backup should start backing up your Kubernetes cluster according to the previously configured backup policy schedule. Note that Azure Backup allows you to create multiple backup policies, each tailored to different parts of your AKS cluster. You can define separate policies for individual applications, namespaces, or the entire cluster, and configure each with its own schedule and retention settings.
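As an illustration of how separate policies with their own schedule and retention behave, the sketch below models a policy as an interval plus a retention window. It is a conceptual model only: `BackupPolicy` and both helper functions are hypothetical, the interval and retention values are made up, and Azure Backup's real retention rules may be richer than a single window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class BackupPolicy:
    name: str
    scope: str            # free-form description, e.g. "namespace: shop"
    interval_hours: int   # how often a backup is taken
    retention_days: int   # how long a recovery point is kept


def next_backup_time(policy: BackupPolicy, last_backup: datetime) -> datetime:
    """When the next scheduled backup is due under this policy."""
    return last_backup + timedelta(hours=policy.interval_hours)


def expired_recovery_points(policy: BackupPolicy,
                            recovery_points: List[datetime],
                            now: datetime) -> List[datetime]:
    """Recovery points older than the policy's retention window."""
    cutoff = now - timedelta(days=policy.retention_days)
    return [point for point in recovery_points if point < cutoff]
```

For example, a hypothetical policy with a 7-day retention evaluated on June 30 would treat a June 1 recovery point as expired while keeping those from June 25 and June 29.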

Limitations of AKS backup

While Azure Backup is comprehensive and reliable enough to ensure the integrity of AKS cluster backups, it has several downsides that you should consider before and while configuring it.

Configuration complexity

The initial setup of Azure Backup for AKS requires significant effort to configure policies and resource dependencies. Even after all these resources are in place, Azure Backup offers only limited options for how granularly you can back up your AKS cluster. Resource transformations are often required during restore operations, and Azure Backup supports them only through manual patches. Streamlined, automated transformations are available from other backup and restore solutions such as Trilio.

Beyond the single-region setup, protecting clusters in multiple regions adds another layer of complexity. The cross-region recommendations in the best practices section of this article cover this in more detail.

Identity management

A recent deprecation update states that Microsoft Azure has moved from using service principals to managed identities. Even though the update provides improved security, using managed identities over service principals has certain disadvantages. Some examples are a lack of cross-tenant support, limited legacy compatibility, added migration complexity for existing clusters, and dependency on Azure’s internal automation, resulting in a lack of choices and alternatives.

Vendor exclusivity

While Azure Backup is Microsoft Azure’s native backup solution, it presents limitations, particularly for organizations with multi-cloud or hybrid environments. A key restriction is its focus on the Azure ecosystem, making it difficult to back up resources outside of Azure and Microsoft. This lack of flexibility can hinder operational strategies. For instance, financial institutions needing to comply with the Digital Operational Resilience Act (DORA) face challenges due to this vendor lock-in. Specifically, the inability to restore AKS backups stored in Azure Blob storage or disk snapshots to other cloud providers complicates achieving full DORA compliance, as it restricts data portability.

Best practices for AKS backup and recovery

To build a truly robust and dependable AKS recovery strategy, consider adopting the following recommendations.

Validate and test restores

Performing scheduled test restores is one of the best practices for backing up an AKS cluster (and anything else, in general). Regular testing gives you confidence that, when disaster strikes, the backed-up data is intact and can be recovered successfully. Test restores are usually run in a non-production environment; in this case, you can use a separate cluster in a sandbox environment.
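One simple way to check a test restore is to compare the resource inventory of the source cluster with that of the sandbox cluster. The sketch below assumes each inventory is the parsed JSON from a `kubectl get <kinds> -n <namespace> -o json` call, which returns a list document with an `items` array; the function names are hypothetical, and a matching inventory does not by itself prove that the data inside persistent volumes is intact.

```python
from typing import Dict, List, Set, Tuple

ResourceKey = Tuple[str, str, str]  # (kind, namespace, name)


def inventory_from_kubectl_json(doc: dict) -> Set[ResourceKey]:
    """Collect (kind, namespace, name) from a kubectl list document."""
    return {
        (item["kind"], item["metadata"].get("namespace", ""), item["metadata"]["name"])
        for item in doc.get("items", [])
    }


def restore_diff(source: Set[ResourceKey],
                 restored: Set[ResourceKey]) -> Dict[str, List[ResourceKey]]:
    """Report resources missing from the restore and resources that were not
    in the source (for example, objects that already existed in the sandbox)."""
    return {
        "missing": sorted(source - restored),
        "unexpected": sorted(restored - source),
    }
```

An empty `missing` list is a reasonable first gate for a scheduled restore test; application-level checks (such as querying a restored database) should follow it.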

Consider cross-region recovery alternatives

While Azure Backup supports cross-region recovery for AKS, the capability is more limited than for some other Azure services. If a regional outage occurs, Azure Backup allows restoration of workloads in the Azure paired region (e.g., East US and West US), but an AKS backup cannot be restored directly to an arbitrary Azure region (e.g., East US to Japan East).

In this case, consider other methods or alternatives for recovery to other regions, like:

  • Using geo-redundant storage (GRS) for persistent volumes
  • Implementing infrastructure-as-code solutions like Terraform or ARM templates to automatically recreate clusters in another region in case of a disaster 
  • Deploying AKS across multiple regions for high availability

While Azure Backup supports modified restores, Trilio enables efficient and automated transformations for geo-redundant, cross-region, cross-distro, and cross-cloud disaster recovery, thereby helping you avoid vendor lock-in.

Review identity and access management settings

As mentioned, after deprecating service principals, AKS backup is now using managed identities to ensure that correct permissions are assigned with a high level of security. However, improper permissions assignments may lead to failures or restricted recovery options, so you need to review them after implementing the AKS backup.

Make sure that the following roles are assigned on the subscription, resource group, or (if you want to be more granular) cluster level:

  • Backup Contributor
  • Restore Operator
  • Reader
  • Storage Blob Data Contributor

Also, specifically for AKS clusters using Azure Disks or Azure Files as a form of persistent storage, the managed identity must have permissions for the following:

  • Disk Backup Reader for Azure Disks access
  • Storage Account Contributor for Azure Files access

As a general principle, always follow least privilege: grant only the permissions that are required, scoped to only the Azure resources that need them, to minimize attack vectors and infrastructure-wide misconfigurations.

To put least privilege into practice, consider the following steps:

  1. Start with the built-in roles: Before creating custom roles, thoroughly understand the built-in Azure roles related to backup and recovery (Backup Contributor, Restore Operator, Backup Operator, etc.). See if they meet your needs with appropriate scoping.
  2. Scope permissions tightly: Always grant permissions at the lowest possible scope:
    • Subscription: Rarely needed for least privilege
    • Resource group: Often the best balance
    • Individual resource: For example, a specific AKS cluster, backup vault, or storage account (most granular option)
  3. Use the Azure Portal’s access control (IAM) blade: The IAM blade for each resource allows you to easily assign roles and check existing permissions.
  4. Use Azure Policy: Azure Policy can be used to enforce least privilege. You can create policies that:
    • Prevent the assignment of overly permissive roles (e.g., blocking “Contributor” at the subscription level)
    • Require the use of managed identities for certain operations
    • Audit existing role assignments for compliance
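The auditing step can be approximated with a short script over exported role assignments. The sketch below assumes each assignment is a dictionary with `principalName`, `roleDefinitionName`, and `scope` keys, which is how we understand the JSON from `az role assignment list` to be shaped; the set of "broad" roles and both function names are illustrative choices, not an Azure-defined policy.

```python
from typing import Dict, List

# Roles treated as overly permissive when granted at subscription scope or above.
# This set is an illustrative assumption; adjust it to your own policy.
BROAD_ROLES = {"Owner", "Contributor", "User Access Administrator"}


def scope_level(scope: str) -> str:
    """Classify a role-assignment scope string by how wide it is."""
    parts = [p for p in scope.split("/") if p]
    if len(parts) == 2 and parts[0].lower() == "subscriptions":
        return "subscription"
    if len(parts) == 4 and parts[2].lower() == "resourcegroups":
        return "resource group"
    if len(parts) > 4 and parts[0].lower() == "subscriptions":
        return "resource"
    return "other"  # management group, root scope, or an unrecognized form


def overly_broad(assignments: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return assignments that grant a broad role at subscription scope or wider."""
    findings = []
    for assignment in assignments:
        level = scope_level(assignment["scope"])
        if level in ("subscription", "other") and assignment["roleDefinitionName"] in BROAD_ROLES:
            findings.append(assignment)
    return findings
```

A report like this complements Azure Policy: the policy blocks new overly permissive assignments, while the script surfaces ones that already exist for the backup identity.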

Trilio for Backup and Restore

Azure Backup offers a fairly basic, infrastructure-centric approach to protecting AKS. It lacks the granularity and advanced features needed for true enterprise-grade Kubernetes backup and recovery, especially in complex or large-scale deployments.

Trilio Backup and Restore is specifically designed for the dynamic nature of Kubernetes. It offers features like application-consistent backups, support for operator-based deployments, and a centralized management interface that spans multiple clusters and clouds.

Using Trilio Vault to back up an AKS cluster (source)

Going beyond Azure Backup’s capabilities, the software’s advanced multi-cluster, hybrid cloud, and multi-tenant UI simplifies backup and restore setup and management with granular backup options. Unlike Azure Backup, Trilio also includes other scoping options for multi-level backups, such as the ability to back up complex, stateful applications that are managed by Kubernetes Operators. 

Learn more about Trilio’s offerings and pricing directly on the Azure Marketplace.

Last thoughts

Azure Backup is Microsoft’s native backup solution that protects AKS clusters within the Microsoft Azure tenant. However, due to its complexity, exclusivity, and limitations, many enterprises choose to adopt a better solution.

Trilio is built for multi-cluster, multi-cloud environments and stands out as a robust backup solution for AKS clusters by offering flexibility and advanced features your enterprise needs. With Trilio, you can meet your RTO and RPO targets and ensure rapid recovery from any disruption. 

To learn more, try out the platform or read more about how Trilio can help you with application-consistent backups.
