Reference Guide: Optimizing Backup Strategies for Red Hat OpenShift Virtualization

OpenShift and Kubernetes offer various primitives and extensions that simplify the initial deployment of applications, including pods, deployments, and services. However, running applications on these cloud-native platforms in production presents its own unique set of challenges, which appear after the first deployment. These include Day 1 and Day 2 operations of the applications and services, such as post-install configuration, monitoring, upgrading, and uninstalling workloads, particularly for stateful workloads like databases, enterprise applications, and messaging systems. 

The procedures for these operations are typically scattered across DevOps and SRE teams, ad hoc scripts, and manually executed commands. Managing the application lifecycle this way is error-prone and neither scalable nor efficient.

In this article, we explore what operators are and how they help automate Day 1 and Day 2 operations by codifying operational knowledge into the application package.

Summary of key OpenShift operator concepts

The following table outlines key concepts and terminologies necessary to understand operators and how they work.

| Concept | Description |
|---|---|
| Operator | Application extensions that codify operational knowledge into the application |
| Operand | An instance of the application or workload an operator manages |
| The Operator Framework | An open-source toolkit to build, manage, and share operators |
| The Operator SDK | A development framework that simplifies the building, testing, and packaging of operators |
| Operator Lifecycle Manager (OLM) | A management framework that simplifies the installation and management of operators at scale |
| OperatorHub | A central registry to discover and install operators |
| Custom resource definitions (CRDs) | A blueprint for creating custom resources that extend native APIs to manage complex applications |
| Cluster service version (CSV) | A specific version of an operator |

Operators

Day 1 and 2 operations in Kubernetes and OpenShift environments present operational challenges that extend beyond a simple deployment using a YAML file. They involve application-specific operational logic and knowledge that relies heavily on human operators.

Let’s say you need to install and manage Argo CD, a popular GitOps continuous delivery tool for cloud-native platforms such as Kubernetes and OpenShift, on a cluster. The usual process involves installing Argo CD using the manifest file or Helm chart, which does simplify the initial installation process.

However, a vanilla installation is far from production-ready. It requires ongoing operations, such as scaling the cluster to handle demand, version upgrades, backing up the cluster, and monitoring. All of this work necessitates application-specific operational knowledge that is often documented in scripts, runbooks, or the minds of administrators and must be executed manually. This adds stress during critical scenarios, such as disaster recovery, security incidents, or maintenance. These processes are prone to errors and cannot be scaled. Knowledge transfer is also one of the challenges teams face when administrators change.

Operators solve these problems by codifying operational knowledge into the application lifecycle. An operator extends the platform’s core APIs using custom resource definitions (CRDs) and utilizes application-specific controllers to manage the application’s lifecycle autonomously. In the example of Argo CD, the operator provides CRDs that abstract many of the operational complexities of managing an Argo CD installation.
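To make the idea of extending the platform's APIs concrete, the following is a minimal sketch of a CRD. The group (example.com), the kind (Widget), and the replicas field are hypothetical names chosen for illustration; they do not come from the Argo CD operator or any other real operator.

```yaml
# Hypothetical CRD for illustration only; group, kind, and fields are invented.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # The name must have the form <plural>.<group>.
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Widget
    singular: widget
    plural: widgets
  versions:
    - name: v1alpha1
      served: true      # this version is exposed through the API server
      storage: true     # this version is the one persisted in etcd
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1
```

Once such a CRD is registered, the API server accepts Widget objects like any built-in resource, and the operator's controller is what reacts to them.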

The following image shows the Argo CD operator install page in the Red Hat OpenShift OperatorHub. The operator provides the ArgoCD, Application, ApplicationSet, AppProject, ArgoCDExport, and NotificationsConfig APIs, which are required to deploy and manage an Argo CD installation. The Update channel specifies the channel from which the operator receives updates; as the image shows, this operator has a single channel called alpha. Setting the Installation mode to All namespaces on the cluster makes the operator available to every namespace in the cluster, and the operator itself is installed in the openshift-operators namespace. The Update approval mode is set to Automatic.

Argo CD operator

Operands

The application or workload that an operator manages is called an operand. In the case of the Argo CD operator, when you create an Argo CD cluster, the operator creates the operand resources that host the workloads, such as argocd-server and argocd-notifications-controller.

Operator scope

OpenShift operators are either cluster-scoped or namespace-scoped: 

  • Cluster-scoped operators watch and manage resources across all namespaces in an OpenShift cluster. They require cluster-wide RBAC permissions, which are assigned using cluster roles and cluster role bindings. For example, cert-manager is an operator that can be cluster-scoped. Similarly, cluster operators (oc get co), also called platform operators, are cluster-scoped. Cluster-scoped operators offer centralized management and simplified deployment, but issues have a larger impact. For example, a privilege escalation on a cluster-scoped operator can affect the entire cluster because of its cluster-wide permissions, and bugs or misconfigurations affect all projects.
  • Namespace-scoped operators, on the other hand, watch and manage resources within a predefined namespace (or project, in OpenShift). They require roles and role bindings for role-based access control (RBAC) limited to that namespace. Namespace-scoped operators provide isolation, flexibility, and security; for example, the effect of an upgrade, a security incident, or a failure is limited to the namespace.
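The RBAC difference between the two scopes can be sketched with a pair of manifests. The example below shows the namespace-scoped case; the operator name (example-operator), the namespace (team-a), and the rule set are hypothetical and only illustrate the shape of the objects.

```yaml
# Namespace-scoped permissions: a Role and RoleBinding confined to team-a.
# All names here are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-operator
  namespace: team-a
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-operator
  namespace: team-a
subjects:
  - kind: ServiceAccount
    name: example-operator
    namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: example-operator
```

A cluster-scoped operator would use a ClusterRole and ClusterRoleBinding instead, neither of which carries a metadata.namespace, so the same rules would apply in every namespace.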

The Operator Framework

The Operator Framework is a set of tools to build, manage, and discover operators. 

Operator SDK

The Operator SDK is a framework that enables building, testing, and packaging operators. It provides high-level abstractions, scaffolding, and code generation tools to bootstrap operators quickly. As a developer, you focus on encoding application-specific operational knowledge, such as upgrade strategies, scaling logic, and backup procedures, into a custom controller that leverages the controller runtime library to handle the underlying reconciliation. The SDK incorporates patterns and best practices that help create intelligent, automated, and production-ready operators. 

You can develop operators using Go, Helm, or Ansible. 

Operator Lifecycle Manager

Operators codify operational knowledge of applications, but deploying many operators across multiple clusters quickly creates new operational challenges: tracking operator versions across clusters, resolving dependencies between operators with shared components, and ensuring consistent installation. The Operator Lifecycle Manager (OLM) is an open-source tool that addresses these challenges with a management framework. The framework enables:

  • Catalog-based discovery
  • Automated dependency resolution
  • Update channels for receiving updates, together with approval workflows
  • Automatic over-the-air (OTA) updates to operators and the cloud-native applications managed by them
  • Declaring operator dependencies using the OLM packaging format

OperatorHub

The OpenShift web console includes OperatorHub, an interface for discovering and installing operators. You can download, install, and subscribe to an operator with a single click. These operators are OLM-ready.

The OperatorHub supports the following catalogs:

  • Red Hat operators, which are certified, packaged, shipped, and supported by Red Hat
  • Certified operators, which are products from ISVs and are supported by them
  • Community operators, which are maintained by the wider community and do not include any official support
  • Custom operators, which you can add yourself

The following image shows the OperatorHub page in the OpenShift web console, featuring Red Hat operators.

OpenShift operator web console

Note that not all operators visible in https://operatorhub.io/ are available in the Red Hat OpenShift OperatorHub web console. 

Operator registry

An OpenShift installation includes an operator registry that acts as a catalog of the operators available to the cluster. It holds metadata such as versions, dependencies, and channels, along with the CRD and CSV information that the OLM uses to install and manage operators automatically. Think of it as the operator equivalent of the YUM/DNF repositories used by Red Hat Enterprise Linux. The OperatorHub uses the registry to display available operators.

OLM architecture

The Operator Lifecycle Manager (OLM) is made up of two operators:

  • The OLM operator 
  • The catalog operator 

These two operators manage a set of CRDs that form the basis of the OLM framework. Let’s take a look at them.

Subscription

A subscription is used to specify the operator to be installed and the catalog source for it. It also includes information about the channel to use and the install plan approval method. 
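As a sketch, a subscription for the Argo CD operator used in this article might look like the manifest below. The channel (alpha), the namespace (openshift-operators), the subscription name (argocd-operator), and the Automatic approval setting are taken from elsewhere in this article. The package name and the catalog source (community-operators in openshift-marketplace) are assumptions and should be checked against the catalogs on your cluster.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: argocd-operator
  namespace: openshift-operators
spec:
  # Package name as listed in the catalog (assumed to match the subscription name).
  name: argocd-operator
  # Channel to receive updates from.
  channel: alpha
  # Catalog source and its namespace (assumed values).
  source: community-operators
  sourceNamespace: openshift-marketplace
  # Automatic: OLM approves install plans itself; Manual: an administrator must approve.
  installPlanApproval: Automatic
```

Applying a manifest like this is the declarative counterpart of clicking Install in the OperatorHub.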

Catalog source

The catalog source is a repository of metadata (CSVs, CRDs, and packages) about available operators. The OLM uses it to discover operators and upgrades to installed operators. A catalog source is served through the Operator Registry API.
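A custom catalog is typically added by pointing a CatalogSource at a catalog image. The following is a minimal sketch; the catalog name, image reference, display name, and publisher are hypothetical placeholders, and the polling interval is an arbitrary choice.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: example-catalog                 # hypothetical
  namespace: openshift-marketplace
spec:
  # The catalog is served over gRPC from a container image.
  sourceType: grpc
  image: registry.example.com/operators/example-catalog:latest   # hypothetical image
  displayName: Example Catalog
  publisher: Example Platform Team
  updateStrategy:
    registryPoll:
      # How often OLM checks for a newer catalog image.
      interval: 30m
```

A subscription then refers to this catalog through its source and sourceNamespace fields.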

Operator group

An operator group resource is used to control and manage permissions to one or more namespaces for an operator. The OLM uses the information in it to generate RBAC permissions for the operator. If the CSV of an operator is in the same namespace as the operator group, it is considered a member of that group. 
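The sketch below shows an operator group that limits operators installed in a namespace to managing that same namespace. The name (team-a-operators) and the namespace (team-a) are hypothetical.

```yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: team-a-operators   # hypothetical
  namespace: team-a        # operators whose CSV lands here become members
spec:
  # Namespaces for which OLM generates RBAC for member operators.
  targetNamespaces:
    - team-a
```

An operator group that sets neither targetNamespaces nor a selector targets all namespaces, which is how operators installed in openshift-operators become available cluster-wide.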

Install plan

When you create a subscription for an operator, the OLM generates an install plan that contains a list of resources required for it, like custom resource definitions (CRDs), roles, and deployments. Upon approval of the install plan (either automatic or manual), OLM creates these resources.
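For orientation, an install plan awaiting manual approval looks roughly like the abbreviated sketch below. The object name is hypothetical because OLM generates it, the CSV name reuses the version mentioned later in this article, and status fields are omitted.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  name: install-abcde            # hypothetical; OLM generates the real name
  namespace: openshift-operators
spec:
  # Mirrors the subscription's installPlanApproval setting.
  approval: Manual
  # OLM creates the listed resources only after this is set to true.
  approved: false
  clusterServiceVersionNames:
    - argocd-operator.v0.15.0
```

With Automatic approval, OLM sets approved to true on its own; with Manual approval, an administrator does so from the web console or by editing the object.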

Cluster service version (CSV)

The CSV is a YAML manifest of both application and operator metadata that helps OLM to run a specific version of the operator in a cluster. It includes informational metadata, such as the name, description, version, repository link, and labels. It also includes operational information about the operator, such as the CRDs it manages and depends on, RBAC rules, cluster requirements, and the install strategy, which is used by OLM to create resources required to run the operator. 
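The structure described above can be sketched as an abbreviated CSV. This is not the Argo CD operator's actual CSV: the install modes, service account, RBAC rules, deployment, and image are illustrative placeholders, and a real CSV carries considerably more metadata.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: argocd-operator.v0.15.0
  namespace: openshift-operators
spec:
  # Informational metadata.
  displayName: Argo CD
  version: 0.15.0
  maturity: alpha
  # Which operator group configurations this operator supports (illustrative values).
  installModes:
    - type: OwnNamespace
      supported: true
    - type: SingleNamespace
      supported: true
    - type: MultiNamespace
      supported: false
    - type: AllNamespaces
      supported: true
  # CRDs the operator owns and manages.
  customresourcedefinitions:
    owned:
      - name: argocds.argoproj.io
        kind: ArgoCD
        version: v1alpha1
  # Install strategy: what OLM creates to run the operator.
  install:
    strategy: deployment
    spec:
      clusterPermissions:
        - serviceAccountName: example-operator-sa   # hypothetical
          rules:
            - apiGroups: [""]
              resources: ["configmaps"]
              verbs: ["get", "list", "watch"]
      deployments:
        - name: example-operator-controller         # hypothetical
          spec:
            replicas: 1
            selector:
              matchLabels:
                name: example-operator
            template:
              metadata:
                labels:
                  name: example-operator
              spec:
                serviceAccountName: example-operator-sa
                containers:
                  - name: manager
                    image: registry.example.com/example-operator:0.15.0   # hypothetical
```

To see the real thing, you can inspect an installed operator's CSV with oc get csv and the -o yaml output option.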

The following image shows the OLM workflow.

OLM lifecycle workflow

OLM vs. operators vs. Helm charts

The following table summarizes the differences between OLM, operators, and Helm charts.

| | OLM | Operator | Helm |
|---|---|---|---|
| Use case | OLM manages the lifecycle of operators. | An operator manages the application’s lifecycle, including its operational aspects. | Helm is used to package applications for repeatable deployment. |
| How it works | Uses the catalog operator and the OLM operator to automate the lifecycle of operators. | Extends Kubernetes APIs to create CRDs that manage an application and its operations. | Packages and templates manifests (YAML) for application deployment. |
| Dependency management | OLM provides complete automated dependency management for operators, including version management to ensure compatibility and avoid conflicts. | An operator provides dependency management for the application only. Operator dependencies, like prerequisite operators, must be installed and managed manually. | Helm provides basic dependency management using Chart.yaml without dependency resolution, conflict management, or version negotiation. |
| CRD lifecycle management | Full lifecycle support for the management of OLM CRDs. | Full lifecycle support for the management of application CRDs. | Limited support for managing CRDs. For example, removing CRDs is a manual process. |
| Multi-tenancy and RBAC | Operator groups provide scoping of operators to specific namespaces along with catalog-level access controls. | Standalone operator management requires a manual approach for multi-tenancy and RBAC. | Helm does not provide native multi-tenancy or RBAC capabilities. |
| Customizability | N/A | Operators offer greater customization, allowing for the programming of any operational task. They are like installing an application from source code. | Helm-based deployments can be configured only to the extent of the values the chart exposes. Using Helm is similar to installing an application from a package. |
| Development effort | Making an operator compatible with OLM, although not difficult, is still an additional effort beyond creating operators. | The high degree of customizability comes with increased development effort when creating operators. | Helm charts are simpler to create than operators, primarily using YAML. |

Common operator tasks / working with operators

In this section of the article, we will learn how to install and manage the Argo CD operator from the Red Hat OpenShift OperatorHub web console. 

Installing an operator

Navigate to Operators > OperatorHub and search for Argo CD. Select it to load the details modal and click Install to begin.

Argo CD operator in the OpenShift web console

In the next step, review the settings, including the Update channel to subscribe to, the Version you want to install, the Installation mode, Namespace, and whether to update automatically or manually. The page also displays all the CRDs (or APIs) that this operator will provide. Click Install to begin the operator installation process.

 

Installing the Argo CD operator

The installation process takes a few minutes to complete.

Viewing installed operators

To view installed operators, navigate to Operators and then select Installed Operators.

List of installed operators in the OpenShift web console

You can also list them using the following command:

oc -n openshift-operators get subscriptions

Installing the application (operand)

Now that the Argo CD operator is installed, let’s see how to create an Argo CD cluster by adding an operand. The following YAML manifest creates a basic Argo CD cluster:

apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: argocd
  namespace: argocd
  labels:
    example: basic
spec: {}

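Save the manifest as argocd.yaml. If the argocd namespace does not already exist, create it first, then create the cluster using the oc command as follows:

oc create namespace argocd

oc apply -f argocd.yaml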

This creates multiple resources required for deploying Argo CD, like deployments, secrets, config maps, and services.
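Operational settings are expressed in the same custom resource rather than in separate scripts. As a hedged sketch, the variant below asks the operator to expose the Argo CD server through an OpenShift route. The server.route.enabled field reflects the Argo CD operator's API as commonly documented, but field availability can differ between operator versions, so verify it against the CRD installed on your cluster.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: argocd
  namespace: argocd
spec:
  server:
    route:
      # Ask the operator to create a route for the Argo CD server
      # (verify this field against your operator version).
      enabled: true
```

Reapplying the manifest with oc apply lets the operator reconcile the running installation toward the new specification.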

Uninstalling an operator

To uninstall an operator, click the vertical ellipsis (three dots) next to the operator and select Uninstall. This will remove the operator and operator pods. It does not remove the CRDs and resources (operands) managed by it.

Uninstall an operator from the OpenShift web console

Click Uninstall again on the confirmation screen.

Confirm the uninstall of the operator

You can also uninstall it using the following oc commands:

oc -n openshift-operators get subscription,csv

oc -n openshift-operators delete subscription argocd-operator

oc -n openshift-operators delete csv argocd-operator.v0.15.0

OLM benefits for enterprises

Enterprises run everything at scale, and that includes operator management, which calls for centralized management, governance, and lifecycle controls. Deploying standalone operators reintroduces the manual operational burden that the operator pattern was meant to remove: manually tracking versions, resolving inter-operator dependencies, and executing complex rollbacks all return to the human workload, undermining full automation.

OLM considerably simplifies lifecycle automation at enterprise scale by managing operators through subscriptions that it continuously reconciles. For example, certified operators can be automatically upgraded for patch releases and set to manual approval for major releases across the entire fleet through a policy. For enterprises running multiple environments of an application, update channels and approval strategies can be configured separately for each environment. OLM handles operator upgrades based on the approved strategy, and the operator manages the application upgrade process itself. Similarly, when uninstalling, OLM removes the operator and its pods but, as noted earlier, leaves CRDs and operands in place, so other operators and workloads that depend on those CRDs are not broken. This is a significant advantage for enterprises with multiple teams deploying and managing hundreds of applications.

OLM supports operator-specific governance that benefits enterprise platform teams. Using operator groups, you can define the specific namespaces that an operator can manage, preventing resource creation in another team’s namespace. This also lets platform teams offer self-service without that risk. Additionally, platform teams can define trusted catalog sources for operators; for example, you can restrict production deployments to certified operators only.

For enterprises running multiple OpenShift clusters, Red Hat Advanced Cluster Management (ACM), a multi-cluster management solution for Kubernetes and OpenShift, together with operator policies, enables platform teams to define consistent environments. For example, using an operator policy, you can require that all production OpenShift clusters run version 0.15.0 of Argo CD. Keeping operator versions consistent is particularly useful when setting up a disaster recovery site or a cluster that must match the primary cluster. ACM is itself deployed and managed as an operator.
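The Argo CD example could be expressed roughly as the operator policy sketched below. The field names follow the ACM OperatorPolicy API as best recalled and should be verified against the documentation for your ACM version. The policy name, the managed cluster namespace, and the catalog source are assumptions, and in practice such a policy is usually wrapped in an ACM Policy and placed onto clusters through a placement.

```yaml
apiVersion: policy.open-cluster-management.io/v1beta1
kind: OperatorPolicy
metadata:
  name: argocd-operator-version     # hypothetical
  namespace: managed-cluster-ns     # hypothetical managed cluster namespace
spec:
  # enforce: install or correct the operator; inform: only report compliance.
  remediationAction: enforce
  severity: medium
  complianceType: musthave
  # Do not approve upgrades beyond the pinned version automatically.
  upgradeApproval: None
  subscription:
    name: argocd-operator
    namespace: openshift-operators
    channel: alpha
    source: community-operators     # assumed catalog
    sourceNamespace: openshift-marketplace
    startingCSV: argocd-operator.v0.15.0
  # CSV versions considered compliant.
  versions:
    - argocd-operator.v0.15.0
```

Pinning both startingCSV and versions is one way to keep a disaster recovery cluster on the same operator version as the primary.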

Trilio for OpenShift operator backup and recovery

While OLM handles operator lifecycle management well, especially installations, upgrades, and uninstallations, enterprises often encounter scenarios that require restores or recovery. These can stem from failed operator installations, failed upgrades that have affected the workload or cluster state, security incidents, or misconfigurations. Trilio for Red Hat OpenShift, available through OperatorHub, supports application-consistent backups that include all necessary resources, including the ability to back up and restore operators and CRDs. It lets you choose the scope of the backup and customize what is backed up. Trilio’s native integration with OpenShift enables seamless backup and recovery.

Conclusion

OpenShift operators are a key part of an autonomous OpenShift cluster, providing capabilities beyond orchestration and hosting. They extend OpenShift through CRDs and controllers, embedding operational knowledge into the cluster. The Operator Framework and the OLM standardize the process of building, installing, configuring, upgrading, and uninstalling operators. Cluster service versions, subscriptions, and operator groups enable OpenShift to treat operators as first-class citizens within the platform while providing security, stability, and compliance. OperatorHub provides a marketplace of prebuilt operators for enterprise applications, databases, messaging, infrastructure, security, and more.
