Deploying an OpenShift cluster involves more than a standard Kubernetes installation. As an opinionated distribution, OpenShift predefines elements like the operating system and an immutable control plane, and it includes additional components for enhanced security and operational ease.
Unlike Kubernetes, which can be installed on almost any general-purpose Linux distribution (such as Ubuntu or Red Hat Enterprise Linux) using package managers, OpenShift has stricter infrastructure requirements. Starting with version 4.19, OpenShift requires Red Hat CoreOS (RHCOS) for its cluster nodes, enforcing an immutable, container-optimized OS for security and consistency. This design not only reduces configuration drift and the attack surface but also simplifies installation compared to standard Kubernetes, since the OpenShift installer automates the most challenging aspects of the setup.
This article explores OpenShift’s deployment process, how it differs from traditional Kubernetes setups, and the trade-offs between self-managed and managed options.
Summary of key OpenShift deployment considerations
The following table provides an overview of the key considerations when deploying OpenShift:
| Consideration | Description |
| --- | --- |
| OpenShift vs. Kubernetes | OpenShift provides a curated Kubernetes experience with preintegrated security, networking, and tooling, whereas vanilla Kubernetes serves as a flexible base requiring additional setup for production-grade features. |
| Self-managed vs. managed OpenShift | Self-managed OpenShift allows deep customization for specialized needs, while managed OpenShift simplifies operations with turnkey clusters, making it ideal for teams prioritizing developer productivity over infrastructure control. |
| Setting up OpenShift on Azure | Azure Red Hat OpenShift (ARO) simplifies OpenShift deployment on Azure by managing the control plane and infrastructure, enabling developers to focus on applications while maintaining compatibility with standard OpenShift tooling and APIs. |
| Backing up OpenShift | Trilio provides production-grade OpenShift backup with custom resource definitions and operator support, enabling full-cluster or namespace-level recovery to any point in time. |
Key differences between OpenShift and standard Kubernetes deployments
While Kubernetes is the foundation for container orchestration, OpenShift, Red Hat's enterprise-ready Kubernetes platform, builds upon it with opinionated design choices, enhanced security, and integrated tooling. The following are some of the key distinctions between OpenShift and a standard Kubernetes deployment.
Node requirements and immutable infrastructure
Unlike traditional Kubernetes, where nodes can run on any Linux distribution (e.g., Ubuntu or CentOS), OpenShift mandates Red Hat CoreOS (RHCOS) for control plane nodes. RHCOS is an immutable, container-optimized OS explicitly designed for OpenShift. Its immutability ensures that the control plane remains secure and consistent and that critical files cannot be modified at runtime, reducing the attack surface.
Worker nodes can use RHCOS or Red Hat Enterprise Linux (RHEL) in releases prior to 4.19, but enforcing RHCOS for control planes highlights OpenShift's focus on security by default. In contrast, vanilla Kubernetes can run on all major Linux distributions. While this offers great flexibility, it can lead to inconsistencies and potential vulnerabilities if the underlying OS is not appropriately hardened.
Built-in tooling and integrations
OpenShift ships with several integrated solutions that Kubernetes administrators would typically need to deploy separately:
- OperatorHub: OpenShift includes OperatorHub by default, simplifying the installation and lifecycle management of Kubernetes Operators (e.g., databases and monitoring stacks). In contrast, vanilla K8s requires the manual setup of Operator frameworks like the Operator Lifecycle Manager (OLM).
- Authentication: OpenShift provides built-in OAuth (via the OpenShift OAuth server), allowing seamless integration with identity providers (LDAP, GitHub, etc.); a configuration sketch follows this list. Standard Kubernetes relies on third-party solutions (e.g., Dex, Keycloak) or manual configuration.
- Ingress and load balancing: OpenShift uses HAProxy-based routers as its default, primary ingress controller, whereas Kubernetes requires setting up ingress controllers (e.g., NGINX, Traefik) separately.
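To illustrate the built-in OAuth server, the following is a minimal sketch that registers an HTPasswd identity provider on the cluster OAuth resource. It assumes an htpasswd file named users.htpasswd was already generated locally; the secret and provider names are illustrative.

# Store the htpasswd file as a secret in the openshift-config namespace
oc create secret generic htpass-secret --from-file=htpasswd=users.htpasswd -n openshift-config

# Register the HTPasswd identity provider on the cluster OAuth resource
oc apply -f - <<EOF
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: local-users        # display name shown on the login page
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret  # secret holding the htpasswd file
EOF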
Security and compliance enforcement
OpenShift enforces stricter security policies than standard Kubernetes: SELinux is enabled by default, providing containers with mandatory access control (MAC). Role-based access control (RBAC) is preconfigured with sensible defaults, whereas Kubernetes leaves RBAC rules to the administrator.
Network policies are more streamlined, with OpenShift’s software-defined networking (SDN) offering multi-tenancy support. In contrast, while Kubernetes supports these features, they often require additional configuration and third-party tools to match OpenShift’s security posture.
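To make the comparison concrete, the following is a minimal sketch of the kind of namespace isolation OpenShift's multi-tenant networking provides out of the box. On vanilla Kubernetes, you would typically apply a NetworkPolicy like this one (the namespace name is illustrative) after installing a CNI plugin that enforces it:

kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: my-app          # hypothetical application namespace
spec:
  podSelector: {}            # applies to all pods in the namespace
  ingress:
  - from:
    - podSelector: {}        # only allow traffic from pods in the same namespace
EOF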
Infrastructure nodes
Since OpenShift is a commercial product, organizations with growing application workloads can quickly encounter higher subscription costs as their infrastructure expands. As the containerized footprint grows, so does the underlying platform overhead for components like monitoring, logging, and routing. To mitigate this and provide a clear separation of duties, OpenShift supports dedicated infrastructure nodes that host only core platform services, such as the default router, the integrated container image registry, and the cluster metrics and monitoring components. These machines are not counted against the subscriptions required to run the environment, and isolating platform workloads on them also separates maintenance and management tasks from application workloads.
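A common way to set this up, sketched below with an illustrative node name, is to label a node as infrastructure and then steer platform components such as the default router onto it via node placement:

# Label a node as an infrastructure node (node name is illustrative)
oc label node worker-infra-1 node-role.kubernetes.io/infra=

# Move the default ingress router onto infra nodes via node placement
oc patch ingresscontroller/default -n openshift-ingress-operator --type=merge -p '{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/infra":""}}}}}'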
OpenShift vs. Kubernetes: Summary of key deployment takeaways
The following table summarizes the deployment experience for both OpenShift and Kubernetes, highlighting key differences in setup, prerequisites, and post-deployment management.
| Feature | OpenShift | Kubernetes |
| --- | --- | --- |
| Deployment prerequisites | Requires Red Hat CoreOS (RHCOS) for the control plane (immutable infrastructure); workers can use RHEL | Supports any Linux OS (Ubuntu, CentOS, etc.) |
| Deployment method | Uses OpenShift installer (IPI/UPI) with automated, opinionated setup | Manual or tool-driven (kubeadm, kOps, etc.); more flexibility but higher configuration effort |
| Resource requirements | Higher minimum resource requirements due to integrated services like the registry, router, and monitoring stack | Lower baseline resource requirements because only core components are included; optional services are added as needed |
| Authentication | Built-in OAuth server (supports LDAP, GitHub, etc.) | Requires third-party solutions (Dex, Keycloak) or manual configuration |
| Ingress configuration | Includes HAProxy-based router (out of the box) | Needs manual ingress controller setup (NGINX, Traefik, etc.) |
| Add-on and service management | Integrated OperatorHub (preloaded with certified operators) | Requires manual installation of Operator Lifecycle Manager (OLM) |
| Default security posture | Enforces stricter security policies by default (e.g., containers run as non-root), integrated OAuth, and security context constraints (SCCs) | Requires manually configuring security tooling (SELinux, RBAC, pod security standards) |
| Networking | Uses OpenShift SDN (with multi-tenancy support) or OVN-Kubernetes | Requires plugins like Calico, Flannel, or Cilium to be installed separately |
| Registry | Built-in integrated container registry (with image signing) | Requires external registry setup (e.g., Harbor, Docker Registry) |
| Post-deployment upgrades | Automated with OpenShift Cluster Version Operator (CVO) | Manual or tool-assisted (kubeadm, distro-specific methods) |
| User interface | Features a comprehensive and intuitive web console for both developers and administrators | Primarily command-line driven (kubectl); web dashboard is basic and often requires separate installation |
| Licensing and support | Proprietary (Red Hat subscription required) with enterprise support | Free and open-source (community or vendor-supported distributions available) |
OKD: The OpenShift upstream project
OKD, previously known as OpenShift Origin, is the community-driven upstream project that powers Red Hat OpenShift. It packages all the essential components needed to run Kubernetes and optimizes them for continuous application development and deployment.
Unlike OpenShift—a hardened, enterprise-ready product—OKD serves as the innovation hub where the community introduces and tests new features before refining them for enterprise adoption in Red Hat OpenShift. As a result, OKD is generally a few releases ahead of OpenShift, offering early access to cutting-edge capabilities.
OKD is ideal for developers who want to experiment with the latest container orchestration advancements before they reach OpenShift’s stable releases. However, since it lacks Red Hat’s commercial support and security certifications, OKD is best suited for testing, development, and environments where community support suffices.
OKD is where the OpenShift ecosystem evolves, while OpenShift itself delivers a polished, production-grade platform for enterprises.
OpenShift Virtualization Engine
Red Hat also offers another OpenShift variant, the OpenShift Virtualization Engine, which is a specialized edition for running virtual machines. It streamlines VM management by removing unrelated features, giving teams a focused solution for virtualization workloads.
OpenShift Virtualization leverages the KVM hypervisor, a virtualization module in the Linux kernel that allows the kernel to function as a type-1 hypervisor. KVM is a mature technology that major cloud providers use as the virtualization backend for their infrastructure-as-a-service (IaaS) offerings.
OpenShift Virtualization uses KVM together with KubeVirt so that Kubernetes can manage the virtual machines. As a result, the virtual machines use OpenShift's scheduling, network, and storage infrastructure.
OpenShift installation options: self-managed vs. managed services
OpenShift offers multiple deployment models to fit different operational needs, from complete control to hands-off management. Below, we compare self-managed OpenShift (installed on premises or in the cloud) with managed OpenShift services.
Self-managed OpenShift deployment
The OpenShift Container Platform provides several options when deploying a cluster on any infrastructure. Four primary deployment methods are available, each of which provides a highly available infrastructure; the right choice depends on the specific use case:
- Assisted installer: This is the easiest way to deploy a cluster because it offers a user-friendly, web-based interface; it is ideal for networks with access to the public internet. It also offers smart defaults, pre-flight checks, and a REST API for automation. The assisted installer generates a discovery image, which is used to boot the cluster machines.
- Agent-based installation: This approach requires setting up a local agent and configuration via the command line; it is better suited to disconnected or restricted networks.
- Automated installation: This method deploys an installer-provisioned infrastructure using the baseboard management controller on each cluster host. It works in both connected and disconnected environments.
- Full control installation: This approach is ideal if you want complete control of the underlying infrastructure hosting the cluster nodes. It supports both connected and disconnected environments and provides maximum customization by deploying user-prepared and maintained infrastructure.
The automated installer approach is usually associated with installer-provisioned infrastructure (IPI), while the other methods are usually associated with user-provisioned infrastructure (UPI).
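As a point of reference for the IPI path, a typical flow with the openshift-install CLI looks like the following sketch (the directory name is illustrative); the installer prompts for platform, credentials, pull secret, and SSH key when generating the configuration:

# Generate an install-config.yaml interactively
openshift-install create install-config --dir=my-cluster

# For IPI: provision infrastructure and deploy the cluster in one step
openshift-install create cluster --dir=my-cluster --log-level=info

# Later, tear down the cluster using the same state directory
openshift-install destroy cluster --dir=my-cluster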
A high-level overview of the cluster installation is shown below.
Managed OpenShift deployments
If you'd rather not deal with infrastructure headaches, Red Hat's managed OpenShift options are worth considering. Services like Red Hat OpenShift Service on AWS (ROSA) or Azure Red Hat OpenShift (ARO) handle the heavy lifting: Red Hat and the cloud provider take care of cluster setup, maintenance, and security patches. This allows teams to focus entirely on building and deploying applications.
There are some trade-offs, of course. While managed services offer convenience and built-in monitoring, you'll have less control than when running your own OpenShift clusters. Pricing depends on your cloud provider, but for many businesses, the reduced operational burden justifies the cost.
These solutions work best for companies that need production-ready Kubernetes without building an in-house platform team. The major cloud platforms providing this service are shown in the table below.
| Cloud provider | Managed OpenShift platform | Management | Billing |
| --- | --- | --- | --- |
| AWS | Red Hat OpenShift Service on AWS (ROSA) | AWS and Red Hat | Billed via AWS |
| AWS | Red Hat OpenShift Dedicated (OSD) | Red Hat | Separate Red Hat subscription; AWS infrastructure billed separately |
| Azure | Azure Red Hat OpenShift (ARO) | Microsoft and Red Hat | Billed via Azure |
| GCP | Red Hat OpenShift Dedicated (OSD) | Red Hat | Separate Red Hat subscription; GCP infrastructure billed separately |
| IBM | Red Hat OpenShift on IBM Cloud (ROKS) | IBM | Integrated with IBM Cloud services |
Comparing self-managed and managed OpenShift deployments
The following table outlines the differences between managed and self-managed OpenShift deployments.
| Aspect | Self-managed OpenShift | Managed OpenShift |
| --- | --- | --- |
| Provisioning and infrastructure management | Full infrastructure management required | Provider-managed infrastructure |
| Operations | Manual maintenance and upgrades | Automated maintenance |
| Security | Manual configuration required | Built-in security controls |
| Team requirements | Specialized skills needed | Reduced operational expertise |
| Scalability | Manual scaling processes | Automatic scaling capabilities |
| Cost structure | Higher upfront capital expenses | Predictable operational expenses |
| Deployment speed | Longer deployment timelines | Rapid cluster provisioning |
| Compliance | Self-managed documentation | Provider-maintained compliance |
Step by step: Azure Red Hat OpenShift deployment
The following walkthrough covers the essential stages for setting up an Azure Red Hat OpenShift (ARO) deployment.
Prerequisites
The ARO cluster can be created using the Azure CLI or the Azure portal. Using the portal provides a notable advantage for disaster recovery: It lets customers save an installation configuration file that can serve as a blueprint for recreating an ARO cluster with precisely the same configuration. This is extremely helpful when cost-conscious customers choose not to maintain a continuously running disaster recovery cluster. In a DR event, a customer can use the saved file to rapidly provision a new ARO cluster, deploy Trilio immediately afterward (a step that can be automated), point it at the application backup target, and begin restoring workloads in a prioritized sequence.
The following guide focuses on creating a cluster using the Azure CLI, which can be installed by following the instructions here.
The ARO deployment requires a minimum of 44 CPU cores to spin up a new cluster. This typically exceeds default Azure quotas for new subscriptions. If the current limits in your Azure account are too low, you’ll need to submit a quota increase request specifically for VM vCPUs before proceeding.
Here’s how those cores get allocated during installation:
- Bootstrap node: 8 cores power the temporary bootstrap machine.
- Control plane: 24 cores are dedicated to control plane nodes.
- Worker nodes: 12 cores are for compute workloads.
Once the installation finishes, the bootstrap machine disappears, bringing the core usage down to 36.
By default, the cluster installation creates three control plane nodes and three worker nodes. This is the minimum number of nodes required for the cluster to be supported by Microsoft and Red Hat. Reducing the cluster size to less than this configuration would violate the support agreement. A maximum of 250 worker nodes is supported.
To access the cluster post-installation, make sure to download the desired OpenShift CLI (oc) version from here.
By default, the ARO installation uses the Standard D8s_v5 virtual machine size for control nodes and Standard D4s_v5 for worker nodes. You can use the following command to check the available cores for the Standard DSv5 VM family in the East US region:
LOCATION=eastus
az vm list-usage -l $LOCATION --query "[?contains(name.value, 'standardDSv5Family')]" --output table
Before creating the ARO deployment, you must set up the networking infrastructure and verify your access permissions. The deployment requires the following:
- Resource group setup: You’ll create a dedicated resource group containing the cluster’s virtual network.
- Required permissions: You’ll need the permissions shown in the table below to deploy the cluster.
| Permission scope | Required roles | Comments |
| --- | --- | --- |
| Virtual network | Contributor + User Access Administrator OR Owner | Can be assigned at the VNet, resource group, or subscription level |
| Microsoft Entra ID | Tenant member user OR Guest user with Application Admin role | Needed for service principal creation; required if using guest account for cluster tooling operations |
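As a sketch of granting the virtual network roles from the table above with the Azure CLI (the principal ID, subscription ID, and resource group are illustrative):

# Assign Contributor and User Access Administrator at the resource group scope
az role assignment create --assignee <service-principal-or-user-object-id> --role "Contributor" --scope /subscriptions/<subscription-id>/resourceGroups/aro-rg
az role assignment create --assignee <service-principal-or-user-object-id> --role "User Access Administrator" --scope /subscriptions/<subscription-id>/resourceGroups/aro-rg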
Registering resource providers
Before proceeding with cluster deployment, you need to register the following essential resource providers in your Azure subscription:
- Microsoft.RedHatOpenShift
- Microsoft.Compute
- Microsoft.Storage
- Microsoft.Authorization
If your account has multiple Azure subscriptions, specify the relevant subscription ID:
az account set --subscription <subscription-id>
You can check whether a particular resource provider is currently registered in your account:
az provider list --query "[?namespace=='<resource-provider>'].registrationState" --output table
If the resource providers are not registered, you can register them as follows:
az provider register --namespace Microsoft.RedHatOpenShift --wait
az provider register --namespace Microsoft.Compute --wait
az provider register --namespace Microsoft.Storage --wait
az provider register --namespace Microsoft.Authorization --wait
You can then verify that the resource providers have been registered by using the following commands:
az provider list --query "[?namespace=='Microsoft.RedHatOpenShift'].registrationState" --output table
az provider list --query "[?namespace=='Microsoft.Compute'].registrationState" --output table
az provider list --query "[?namespace=='Microsoft.Storage'].registrationState" --output table
az provider list --query "[?namespace=='Microsoft.Authorization'].registrationState" --output table
Downloading a pull secret
The pull secret allows your cluster to access Red Hat container registries and pull images from them. Navigate to the Red Hat cluster manager portal and select Download pull secret to obtain the pull secret to use with your ARO deployment.
Although setting the pull secret during cluster creation is optional, it is recommended to include this step.
Creating resource groups
If there are existing virtual networks in your account, you can use them in your ARO deployment; here, we'll create a new one. The following variables must be set in the shell environment:
LOCATION=eastus          # the location of your cluster
RESOURCEGROUP=aro-rg     # the name of the resource group where you want to create your cluster
CLUSTER=cluster          # the name of your cluster
VIRTUALNETWORK=aro-vnet  # the name of the virtual network
Resource groups act as logical containers for organizing and managing your Azure services. When creating one, you’ll select a geographic location that serves two purposes:
- Stores metadata about the group
- Becomes the default deployment region for contained resources (unless overridden)
A resource group can be created using the following command:
az group create --name $RESOURCEGROUP --location $LOCATION
The ARO deployment process will automatically create a second, managed resource group to hold the cluster’s infrastructure resources, such as virtual machines, storage, and networking components. Modification or deletion of resources within this managed resource group is not supported, as it can destabilize the cluster.
It’s important to point out that not all Azure regions support Red Hat OpenShift deployments. Make sure you choose from the available regions before creating the resource group.
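One way to check region availability from the CLI, sketched here via ARM provider metadata (the query shape is an assumption that the openShiftClusters resource type lists its supported locations), is:

az provider show --namespace Microsoft.RedHatOpenShift --query "resourceTypes[?resourceType=='openShiftClusters'].locations | [0]" --output table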
Creating virtual network and subnets
Create a new virtual network in the resource group created in the previous step:
az network vnet create --resource-group $RESOURCEGROUP --name $VIRTUALNETWORK --address-prefixes 10.10.0.0/21
Next, create two empty subnets for the control plane and worker nodes. The subnet ranges must fall within the virtual network's 10.10.0.0/21 address space:
az network vnet subnet create --resource-group $RESOURCEGROUP --vnet-name $VIRTUALNETWORK --name master-subnet --address-prefixes 10.10.0.0/23
az network vnet subnet create --resource-group $RESOURCEGROUP --vnet-name $VIRTUALNETWORK --name worker-subnet --address-prefixes 10.10.2.0/23
To learn more about networking options in ARO, refer to this guide.
Modifying default options
You can modify the cluster creation command based on the following options:
- Red Hat registry access: Pass the pull secret for accessing images from Red Hat registries (--pull-secret @pull-secret.txt).
- Custom domain configuration: You can create a custom domain for your cluster by following the instructions here. Use the --domain flag to specify your domain.
- VM size adjustments: The default virtual machine sizes for your control plane and worker nodes are Standard_D8s_v5 and Standard_D4s_v5, respectively. If you need virtual machines of a different size, you can use the following flags:
- --master-vm-size Standard_D8s_v3
- --worker-vm-size Standard_D4s_v3
- Version specification: You can select a specific version of ARO when deploying your cluster (via the --version flag). You can check the available versions first:
az aro get-versions --location $LOCATION
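Putting these options together, a customized create command might look like the following sketch (the domain and version values are illustrative):

az aro create --resource-group $RESOURCEGROUP --name $CLUSTER --vnet $VIRTUALNETWORK --master-subnet master-subnet --worker-subnet worker-subnet --pull-secret @pull-secret.txt --domain mycluster.example.com --master-vm-size Standard_D8s_v3 --worker-vm-size Standard_D4s_v3 --version 4.14.16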
Creating the cluster
Once all these details have been finalized, you can proceed to create your cluster as follows:
az aro create --resource-group $RESOURCEGROUP --name $CLUSTER --vnet $VIRTUALNETWORK --master-subnet master-subnet --worker-subnet worker-subnet --pull-secret @pull-secret.txt
Accessing the cluster
When the cluster deployment is complete, you can connect to the cluster using the default “kubeadmin” user. You can retrieve the cluster console URL and the credentials as follows:
az aro list-credentials --name $CLUSTER --resource-group $RESOURCEGROUP
az aro show --name $CLUSTER --resource-group $RESOURCEGROUP --query "consoleProfile.url" --output tsv
To access the cluster using the “oc” command, retrieve the API endpoint using the following command:
apiServer=$(az aro show --resource-group $RESOURCEGROUP --name $CLUSTER --query apiserverProfile.url --output tsv)
You can then log in to the cluster API as follows:
oc login $apiServer --username kubeadmin --password <kubeadmin-password>
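To avoid copying the password by hand, the credential lookup and the login can be combined, as in this sketch:

# Retrieve the kubeadmin password and log in non-interactively
kubeadminPassword=$(az aro list-credentials --name $CLUSTER --resource-group $RESOURCEGROUP --query kubeadminPassword --output tsv)
oc login $apiServer --username kubeadmin --password $kubeadminPassword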
For a comprehensive list of supported configurations and restrictions, refer to the official Azure Red Hat OpenShift support policy.
Checking cluster health
The oc CLI command can be used to verify that the cluster is running and healthy. The following commands provide a basic health check of the cluster’s core components.
First, ensure that the control plane (master) and worker nodes are all running and in a Ready state.
oc get nodes
The following command can be used to check the status of the core cluster operators, which manage the fundamental components of the OpenShift platform. A healthy cluster shows all operators with AVAILABLE as True, PROGRESSING as False, and DEGRADED as False.
oc get clusteroperators
Check the status of core cluster pods; they should all be in a healthy state.
oc get pods --all-namespaces
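On a busy cluster, the full pod listing is noisy; a useful variation is to filter for pods that are not running or completed:

# Show only pods that are not in the Running or Succeeded phase
oc get pods --all-namespaces --field-selector=status.phase!=Running,status.phase!=Succeeded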
Scaling worker nodes
We can add worker nodes to the cluster as per our requirements via machine sets. To determine the current configuration, examine the existing machine sets in the cluster. The following command shows the number of control plane nodes and worker nodes, the associated node types, and the region/zone where they are deployed.
oc get machine -n openshift-machine-api
This scaling operation can be accomplished either through the CLI or directly via the OpenShift web console. From the terminal, an administrator can imperatively scale a specific machine set to two replicas, increasing the total worker node count by one. Because each machine set is typically tied to a particular availability zone, and the initial state here is three machine sets with one machine each, scaling one machine set to two machines results in four worker nodes in total. Run the following command to scale the desired machine set:
oc scale --replicas=2 machineset <machineset-name> -n openshift-machine-api
Shutting down the cluster
The cluster can be shut down to save costs. To gracefully shut down the cluster, run the following command:
for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do oc debug node/${node} -- chroot /host shutdown -h 1; done
Deleting the cluster
When a cluster is deleted, all managed objects are removed. However, resources like the resource group, virtual network, and subnets must be manually deleted.
az login
Select the subscription ID you want to use.
az account set --subscription {subscription ID}
Replace the following with the values used to create the cluster, then run the command to delete the cluster.
RESOURCEGROUP=<resource-group-name>
CLUSTER=<cluster-name>
az aro delete --resource-group $RESOURCEGROUP --name $CLUSTER
Run the following command to delete the resource group.
az group delete --name $RESOURCEGROUP
The ARO deployment process can also be automated using the Azure collection in Ansible. The following playbook can be used as a sample for ARO deployment. Further details about using Ansible for ARO deployment can be found here.
- name: Create openshift cluster
  azure_rm_openshiftmanagedcluster:
    resource_group: "myResourceGroup"
    name: "myCluster"
    location: "eastus"
    cluster_profile:
      cluster_resource_group_id: "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/clusterResourceGroup"
      domain: "mydomain"
    service_principal_profile:
      client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
      client_secret: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    network_profile:
      pod_cidr: "10.128.0.0/14"
      service_cidr: "172.30.0.0/16"
    worker_profiles:
      - name: worker
        vm_size: "Standard_D4s_v3"
        subnet_id: "/subscriptions/xx-xx-xx-xx-xx/resourceGroups/myResourceGroup/providers/Microsoft.Network/virtualNetworks/myVnet/subnets/worker"
        disk_size: 128
        count: 3
    master_profile:
      vm_size: "Standard_D8s_v3"
      subnet_id: "/subscriptions/xx-xx-xx-xx-xx/resourceGroups/myResourceGroup/providers/Microsoft.Network/virtualNetworks/myVnet/subnets/master"

- name: Create openshift cluster with multi parameters
  azure_rm_openshiftmanagedcluster:
    resource_group: "myResourceGroup"
    name: "myCluster"
    location: "eastus"
    cluster_profile:
      cluster_resource_group_id: "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/clusterResourceGroup"
      domain: "mydomain"
      fips_validated_modules: Enabled
    service_principal_profile:
      client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
      client_secret: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    network_profile:
      pod_cidr: "10.128.0.0/14"
      service_cidr: "172.30.0.0/16"
      outbound_type: Loadbalancer
      preconfigured_nsg: Disabled
    worker_profiles:
      - name: worker
        vm_size: "Standard_D4s_v3"
        subnet_id: "/subscriptions/xx-xx-xx-xx-xx/resourceGroups/myResourceGroup/providers/Microsoft.Network/virtualNetworks/myVnet/subnets/worker"
        disk_size: 128
        count: 3
        encryption_at_host: Disabled
    master_profile:
      vm_size: "Standard_D8s_v3"
      subnet_id: "/subscriptions/xx-xx-xx-xx-xx/resourceGroups/myResourceGroup/providers/Microsoft.Network/virtualNetworks/myVnet/subnets/master"
      encryption_at_host: Disabled
Backing up OpenShift using Trilio
OpenShift hosts mission-critical applications, configurations, and persistent data. Losing any of these can lead to severe downtime, data corruption, or compliance violations. Backups ensure disaster recovery, migration flexibility, and protection against accidental deletions, cyberattacks, or cluster failures.
Several tools are commonly used for backing up OpenShift, each with limitations. One of the more popular open-source solutions is Velero, which provides basic backup capabilities but lacks application-consistent snapshots. Commercial solutions such as Kasten K10 offer policy-driven backups but have limited OpenShift-specific optimizations and can be complex to deploy in multi-tenant environments. Azure Kubernetes Service (AKS) Backup offers a native, Azure-focused option, but it is tied to Azure services and lacks the granular scope needed for complex OpenShift workloads. There is also the option of using native storage snapshots from cloud providers, but these only capture disk state without Kubernetes object consistency, requiring additional manual recovery steps.
This is where solutions like Trilio come into play. Trilio is engineered as an enterprise-grade platform designed to address modern organizations’ complex data protection and disaster recovery needs. Unlike generic backup tools, Trilio ensures proper application-consistent backups, which are critical for databases and stateful workloads. It natively supports OpenShift Operators and custom resources, enabling the seamless backup and recovery of Helm releases and CRDs. With granular recovery options, administrators can restore individual namespaces or entire clusters while avoiding vendor lock-in.
When used with a solution such as ARO, Trilio can offer the following capabilities:
- Automated backups: Trilio enables scheduling point-in-time backups and offers flexible recovery options; a configuration sketch follows this list.
- Continuous restore: Trilio’s continuous restore capabilities enable building effective disaster recovery strategies. Regardless of your cloud provider, they ensure that applications can be quickly restored after a disaster. Trilio’s innovative architecture allows multiple primary application clusters to be continuously converted to one dedicated disaster recovery (DR) cluster. This approach cuts down on infrastructure costs compared to a traditional one-to-one primary-to-DR cluster model.
- Migration and platform portability: Trilio empowers organizations to seamlessly migrate applications between diverse environments, such as Azure Red Hat OpenShift and Red Hat OpenShift Service on AWS (ROSA), or even on-premises clusters. For instance, customers can easily migrate applications back to on-premises infrastructure if cloud costs become prohibitive.
- Multi-cluster management: The integration of Trilio with Red Hat Advanced Cluster Management for Kubernetes (RHACM) facilitates the definition and orchestration of policy-driven data protection across a diverse range of Kubernetes deployments, including hybrid, multi-cloud, and edge environments.
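As an illustrative sketch only (the API version, resource kinds, and field names below are assumptions based on Trilio's documented custom resources and may differ by product version), defining a backup target and a backup plan looks roughly like this:

oc apply -f - <<EOF
apiVersion: triliovault.trilio.io/v1
kind: Target
metadata:
  name: demo-s3-target
  namespace: trilio-system        # hypothetical namespace
spec:
  type: ObjectStore
  vendor: AWS
  objectStoreCredentials:
    region: us-east-1
    bucketName: my-backup-bucket  # hypothetical bucket
    credentialSecret:
      name: s3-credentials        # hypothetical secret holding access keys
      namespace: trilio-system
  thresholdCapacity: 100Gi
---
apiVersion: triliovault.trilio.io/v1
kind: BackupPlan
metadata:
  name: app-backup-plan
  namespace: my-app               # namespace whose workloads are protected
spec:
  backupConfig:
    target:
      name: demo-s3-target
      namespace: trilio-system
EOF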
The following table compares Trilio against some of the other backup solutions.
| Feature | Traditional backup solutions (AKS Backup, Velero, Kasten) | Trilio |
| --- | --- | --- |
| Application scope and consistency | These tools offer limited scope, often missing essential application components like Helm charts, VMs, operators, and user-defined labels. Velero, a popular open-source option, notably lacks native application-consistent snapshots, posing risks to stateful workloads and databases. | Offers significantly more granular backup options, including native support for namespaces, user-defined labels, stateful operators, and complex Helm charts, ensuring complete application-consistent recovery. |
| Storage flexibility and data sovereignty | Solutions are often restricted to the cloud provider's native storage tiers (like AKS Backup) or require significant manual configuration for integrating external or diverse S3 providers for cost-effective tiering and achieving data sovereignty. | Supports a wide variety of storage targets and external S3 providers, enabling greater data sovereignty, cost optimization through flexible tiering, and avoiding native cloud storage restrictions. |
| Disaster recovery and vendor lock-in | Native tools like AKS Backup are limited to the same cloud/region, offering no support for cross-cloud migrations or true hybrid-cloud DR. The resulting vendor lock-in makes moving applications to different platforms complex and resource-intensive. | Facilitates true cross-cloud migrations and robust disaster recovery (DR) capabilities. Trilio avoids vendor lock-in by providing portable backup data and the flexibility to restore across different cloud providers or on-premises environments. |
| Enterprise management and identity | Management interfaces are often fragmented across native cloud consoles and complex Kubernetes tooling, lacking a unified, advanced, multi-cluster, or multi-tenant user interface. Additionally, identity support can be restricted (e.g., AKS Backup only supporting Managed System Identity). | Provides an advanced multi-cluster, hybrid cloud, multi-tenant UI for centralized management and visibility. It also offers support for multiple identity management methods, catering to diverse enterprise security policies. |
Best practices for deploying OpenShift
Here are some of the recommended practices for deploying OpenShift:
- Engage in proper capacity planning: Thoroughly assess and allocate resources before cluster deployment. Sufficient compute and memory resources must be allocated to cluster nodes so that workloads are not adversely affected. Worker node capacity should account for current application needs and projected growth, typically with a 20-30% buffer. High-performance storage like NVMe is strongly recommended for etcd backends to maintain cluster responsiveness.
- Implement strong network security: Build in strict network segmentation from day one. Isolate control plane traffic from worker nodes and external connections using dedicated VLANs or network policies. Restrict traffic between namespaces and applications through fine-grained network policies. Place ingress controllers in a secured perimeter zone with mandatory TLS encryption for all external traffic.
- Optimize for performance: When deploying OpenShift in on-premises networks, carefully configure API and ingress Virtual IPs (VIPs) on low-latency networks (<2 ms response time). Conduct load testing to validate VIP configurations under peak traffic conditions. Distribute ingress endpoints across availability zones for redundancy and balance API server loads across control plane nodes.
- Protect data: Implement a comprehensive backup strategy before moving to production. This should include hourly etcd backups and daily application data snapshots stored across geographically separate locations; see the etcd backup sketch after this list.
- Ensure operational preparedness: Maintain detailed runbooks for critical operations, including certificate rotations, node replacements, and emergency recovery procedures.
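For the etcd portion of that strategy, Red Hat documents a cluster backup script that runs on a control plane node; the sketch below uses an illustrative node name and the default backup path:

# Run the documented cluster backup script on a control plane node (node name is illustrative)
oc debug node/master-0 -- chroot /host /usr/local/bin/cluster-backup.sh /home/core/assets/backup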
Conclusion
Creating OpenShift clusters requires careful planning, whether deploying managed services like ARO/ROSA or self-managed installations. For quick deployment with reduced operational overhead, managed OpenShift solutions offer the fastest path to production with built-in maintenance and scaling. Self-managed clusters provide greater customization but demand more expertise for setup, upgrades, and day-to-day operations.
The deployment approach should align with your team’s skills and workload requirements: managed services for teams wanting to focus on applications and self-managed for those needing granular control. Regardless of the method, all OpenShift deployments benefit from proper resource allocation, network segmentation, and backup strategies.