Reference Guide: Kubernetes Best Practices

Kubernetes continues to be the most popular platform for orchestrating containerized applications and managing microservices at scale. By encapsulating each microservice within a container and deploying it as a pod, Kubernetes ensures isolation, resource efficiency, and fault tolerance. These pods are then distributed across worker nodes in your cluster while the control plane manages the overall orchestration.

However, as simple as it may sound, harnessing the true power of Kubernetes requires more than just understanding its basic building blocks. Achieving optimal performance, security, and reliability depends on understanding the platform’s core functionality and adopting proven best practices.

In this article, we discuss the core aspects of managing Kubernetes and how adopting best practices can help you build, deploy, and manage efficient applications.

Summary of key Kubernetes best practices

  • Resource management: Segregate different environments into namespaces and configure resource limits.
  • Pod health check probes: Configure health probes to check pods’ readiness for traffic and auto-restart failed pods.
  • ConfigMaps and secrets: Decouple configuration from application code and store secrets with a higher level of security.
  • Network, application, and cluster security: Improve your security posture by configuring network policies and role-based access control and by encrypting sensitive data in transit and at rest.
  • CIS Kubernetes benchmarks: Validate security configurations against the industry standards provided by CIS using tools like kube-bench.
  • Centralized monitoring: Detect, diagnose, and resolve problems with centralized logging of your distributed architecture.
  • CI/CD and GitOps: Reduce the risk of human error, accelerate development, and ensure consistency with automation.
  • Static code analysis: Scan CI/CD code and K8s manifests for exposed credentials or misconfigured permissions using automated tools like Terrascan.
  • Backups and disaster recovery: Maintain appropriate backups and implement a disaster recovery solution so that, in case of data or infrastructure loss, you can return to your production state with minimal loss of time and data.

Namespace management

Kubernetes (K8s) provides a mechanism called namespaces for logically segregating resources or environments within a single cluster. This allows you to partition your cluster into separate environments—such as development, staging, and production—or organize it by teams. Namespaces also support resource quotas and access policies, enabling you to manage resource usage and minimize the risk of interference between environments.

Resources within a namespace are uniquely identified by name within that namespace, so the same resource name can be reused in different namespaces without conflict. Red Hat OpenShift builds on this concept with OpenShift Projects, which extend namespaces with additional metadata and context to enhance multitenancy. A project is essentially a namespace augmented with additional attributes like quotas, limits, and network policies.

Consider leveraging projects for fine-grained role-based access control (RBAC) across multiple namespaces. For complex deployments with multiple teams and environments, this means you can achieve greater control and finer isolation of resources.
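
As a minimal sketch of this practice, the manifests below create a staging namespace with a ResourceQuota capping aggregate CPU, memory, and pod counts. The names and values are illustrative assumptions, not prescribed settings:

apiVersion: v1
kind: Namespace
metadata:
  name: staging
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging        # the quota applies only to this namespace
spec:
  hard:
    requests.cpu: "4"       # total CPU requested across all pods
    requests.memory: 8Gi    # total memory requested across all pods
    limits.cpu: "8"         # total CPU limit across all pods
    limits.memory: 16Gi     # total memory limit across all pods
    pods: "20"              # maximum number of pods in the namespace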

Resource requests and limits

Resource management is important in any cloud environment to ensure that your applications run smoothly while keeping costs under control. It is recommended to define resource restrictions to optimize the utilization of K8s nodes and prevent one application from starving other pods of resources.

K8s allows you to define CPU and memory quotas, either per container or at the namespace level. You can set two kinds of values: requests and limits. Requests are used by the K8s scheduler to place containers on nodes and represent the minimum guaranteed resources for a container. Limits are the maximum allowed resource usage; a container cannot exceed its limit.

Example

The following manifest file creates an NGINX pod with requests of 0.5 CPU and 256 MiB of RAM and limits of 1 CPU and 512 MiB.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  namespace: webserver-namespace
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "1"

Pod health check probes

K8s supports configuring probes to check the health status of containers. It is recommended that these health probes be used so that the K8s cluster can detect when pods are ready to receive traffic after creation and can restart non-responsive containers.

There are three types of probes available for containers.

Liveness probe

K8s can monitor container health to check that containers are still alive. Over a long runtime, a container can get stuck for a variety of reasons and become unable to serve the application. K8s periodically checks the pod’s health via a liveness probe and restarts the container if the health check fails.

Readiness probe

When you launch a container, it might not become ready for service immediately due to some initial time-consuming tasks. Until these initial tasks are complete, the container cannot accept application traffic. K8s uses a readiness probe to determine whether the container can provide service as an application endpoint. Readiness is monitored periodically, and the pod can be removed from the service endpoints if the health check fails.

Startup probe

This is helpful for containers that have a long startup time. If defined, it delays the liveness and readiness probes until the startup probe check is successful. 

Example

This example creates a web application container with health probes to monitor its status using the following criteria: 

  • Startup probe: To ensure that the pod starts correctly, an HTTP request will be sent to /startup on port 8080, beginning 20 seconds after the container starts.
  • Readiness probe: Once the pod is started, an HTTP request will be sent to /ready to check that it is ready to receive traffic.
  • Liveness probe: Throughout the pod’s lifecycle, periodic HTTP requests will be sent to /healthz to confirm that the pod is still functioning.

Configure the following manifest. If the startup or liveness probe fails consecutively beyond the defined failure threshold, Kubernetes restarts the container to maintain application availability; if the readiness probe fails, the pod is removed from the service endpoints until it passes again.

apiVersion: v1
kind: Pod
metadata:
  name: webapp-pod
spec:
  containers:
  - name: webapp-container
    image: my-application:latest
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      initialDelaySeconds: 20
      periodSeconds: 10
      failureThreshold: 5

Using ConfigMaps and secrets

ConfigMaps

In Kubernetes, it is recommended to store configuration data separately from application logic. K8s stores configuration data as key-value pairs in ConfigMaps. This allows flexible management of configuration without changing application code or redeploying container images.

Example

Here’s how to create a ConfigMap to provide database host and port configuration details as key-value pairs.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database_host: dbhost.example.com
  database_port: "3306"

Now apply the ConfigMap:

kubectl apply -f configmap.yaml

The pod below consumes the ConfigMap’s database host and port values, which are injected into the container as environment variables.

apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app-container
    image: my-app:latest
    env:
    - name: DATABASE_HOST
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: database_host
    - name: DATABASE_PORT
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: database_port

Secrets

In K8s, secrets are used to store sensitive information like passwords, API tokens, and SSH keys. This keeps your sensitive information separate from application code so that it doesn’t get exposed. K8s also supports optional encryption at rest for secrets.

Example

Secrets support two types of maps: data and stringData. The data map requires values to be base64-encoded (note that base64 is an encoding, not encryption), while stringData accepts plain strings.

apiVersion: v1
kind: Secret
metadata:
  name: app-credentials
type: Opaque
data:
  username: bXl1c2VybmFtZQ==
  password: bXlzZWN1cmVwYXNz

Next, apply the secret above with kubectl and reference it from a pod.
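
Assuming the secret manifest above is saved as secret.yaml (an illustrative file name, mirroring the ConfigMap example):

kubectl apply -f secret.yaml

The pod below references the secret keys as environment variables: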

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: mycontainer
      image: myimage
      env:
        - name: USERNAME
          valueFrom:
            secretKeyRef:
              name: app-credentials
              key: username
        - name: PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-credentials
              key: password

Networking and security

In live production deployments, controlling traffic flow across the K8s cluster and between pods and the external networks is highly recommended. You should only allow IPs and ports that are required for the application to function and block everything else to reduce the attack surface. 

Network access control

To apply network access control, you require a K8s network plugin (like Calico or Cilium) that supports network policies.

Example

Create a policy to deny all ingress and egress traffic for all pods in the namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: default
spec:
  podSelector: {}  # Selects all pods in the namespace
  policyTypes:
    - Ingress
    - Egress

Create a policy to allow ingress traffic to pods labeled backend from pods labeled frontend:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-ingress
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend

In addition to applying network policies, you need to make sure that communication over the network is encrypted to protect it from prying eyes. Use TLS for your service connections, and encrypt ingress and egress traffic as well.

Create a TLS secret:

kubectl create secret tls my-tls-secret \
  --cert=path/to/tls.crt \
  --key=path/to/tls.key \
  --namespace=default

Create an ingress resource that uses TLS for communication:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  namespace: default
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  tls:
  - hosts:
    - yourdomain.com
    secretName: my-tls-secret
  rules:
  - host: yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80

Role-based access control (RBAC)

In any infrastructure operations, it is highly recommended that each user or component be restricted to the minimum level of required permissions. In K8s, RBAC is used to define and control access to resources within the cluster. You can define which user or application has what type of access to K8s resources like pods or services. You can define a role that associates permissions within a specific namespace, and you can create a ClusterRole that defines cluster-wide permissions. 

Example

Create a role that allows read-only access to pods within a specific namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: prod-env
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

Create a ClusterRole that grants read-only permissions to pods across all namespaces.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

Bind the read-only role to a specific user for a specific namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: prod-env  # must be the namespace where the pod-reader Role exists
subjects:
- kind: User
  name: [email protected] 
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Assign a cluster-wide read-only role to a specific user:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-pods-cluster-wide
subjects:
- kind: User
  name: [email protected]
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-pod-reader
  apiGroup: rbac.authorization.k8s.io

Security context

K8s provides security contexts that you can use to restrict containers’ privileges, kernel capabilities, and filesystem access. You can use this to limit privilege escalation and force containers to run as specific non-root users. You can also define specific permissions for mounted filesystems.

Example

Create a container that runs with specific non-root user and group permissions, disallows privilege escalation, ensures that mounted volumes are owned by a specific group, and mounts the root filesystem as read-only.

apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 2000
    fsGroup: 3000
  containers:
  - name: secure-container
    image: my-secure-app:latest
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true

OpenShift security context

Red Hat OpenShift enforces a high level of security by limiting pods to the fewest privileges possible. The platform prevents containers from using protected functions like root access and shared filesystem access. The OpenShift restricted security context constraints (SCCs) ensure that containers run with arbitrarily assigned user IDs, that root filesystems are mounted as read-only, and that the container network interface is isolated from the host system.
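
As a sketch of working with SCCs via the oc CLI (the service account and namespace names below are illustrative):

# List the security context constraints defined in the cluster
oc get scc

# Grant a workload's service account a less restrictive SCC,
# e.g., to allow containers to run as any user ID
oc adm policy add-scc-to-user anyuid -z my-service-account -n my-namespace

Grant broader SCCs sparingly: the default restricted constraints exist precisely to keep workloads contained.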

Application security

When using container images in K8s, you should always use trusted sources like official repositories on Docker Hub or your own private registry. Alternatively, you can download images from verified publishers like Nginx, VMware, or HashiCorp that publish high-quality, secure images.

You should use recent image versions that include the latest security patches. For better consistency across the application environment, pin specific image versions and avoid the latest tag.

Example

Create an NGINX deployment using the official Docker Hub repository for NGINX and a specific mainline release version.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        # Image from the official Docker Hub NGINX repo
        image: nginx:1.27.2  # Specific version tag
        ports:
        - containerPort: 80

CIS Kubernetes benchmarks

The Center for Internet Security (CIS) provides security guidelines and benchmarks for operating systems, applications, and cloud operations. These controls represent recommended security practices, created from real-world experience and the consensus review of a global community of subject matter experts.

Adopting the recommendations of the CIS Benchmark for Kubernetes is an essential step in verifying the security of the K8s environment. The benchmark provides an extensive list of checks and recommended practices for API server configuration, etcd configuration, kubelet and node security, network policies, and pod security.

Although running the tests manually is an option, the CIS-recommended checks can be automated using open-source tools like kube-bench, which runs the checks documented in the CIS Kubernetes Benchmark and provides a pass/fail report for each individual configuration check.
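
As a sketch, kube-bench can be run as a one-off Kubernetes job using the job manifest shipped in the project repository (verify the manifest path against the kube-bench release you use):

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs job/kube-bench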

A sample CIS check run using kube-bench (Source)

Deployments and replica sets

In a live K8s production environment, you need to ensure the availability of pods and apply update strategies to manage the life cycle of your applications. Using deployments, you can define rolling updates and rollbacks, allowing you to modify applications without downtime. With replica sets, you can maintain a stable set of pod replicas at all times. You can define the number of replicas required to run your application efficiently and to maintain availability in case of node failure.

Example

Here, we will create an NGINX deployment with a rolling update strategy and three replicas. The deployment ensures that at most one pod is unavailable during updates and creates one additional pod during updates to reduce downtime:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3  # Number of replicas
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # pods unavailable during update
      maxSurge: 1        # additional pods above number of replicas
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.27.2  # NGINX image
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10

Logging and monitoring

K8s is a complex platform with lots of moving components. The system can generate hundreds of thousands of logs, events, and metrics. Centralized observability and monitoring solutions are highly recommended for proper visibility, diagnosis, and troubleshooting.

You can use a log collection tool like Fluentd to collect logs from K8s, transform them into your required format, and ship them to an Elasticsearch backend. Elasticsearch is a real-time, distributed, and scalable search engine that can index and search through large volumes of data. Kibana can be deployed alongside Elasticsearch for visualization and dashboards.

For collecting metrics, you can use Prometheus along with Grafana for visualizing and alerting based on these metrics.
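
As a minimal sketch of the metrics side, a Prometheus scrape configuration can use Kubernetes service discovery to find pods that opt in via the conventional prometheus.io/scrape annotation (the job name is illustrative):

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod    # discover all pods via the Kubernetes API
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"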

CI/CD and GitOps

In large and complex infrastructure operations, it is highly recommended that automation tools and version control systems be used. With automation, you can reduce the risks of human errors, improve operational efficiency, and ensure consistent deployments across your infrastructure. With version control systems, you can enforce best practices of change management, tracking, auditing, approval, and rollback. 

You can use version control platforms like GitHub or GitLab to manage your YAML manifests. To execute your pipeline deployments, you can use Jenkins as a general-purpose automation tool with a rich set of plugins for integrating with different platforms. Another option is ArgoCD, which has native integrations with Git and Kubernetes.
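
As a sketch of the GitOps pattern, an ArgoCD Application resource points the cluster at a Git repository of manifests and keeps it in sync automatically. The repository URL, path, and namespaces here are illustrative:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd          # namespace where ArgoCD is installed
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-app-manifests.git  # illustrative repo
    targetRevision: main
    path: k8s                # directory containing the manifests
  destination:
    server: https://kubernetes.default.svc   # the local cluster
    namespace: my-app
  syncPolicy:
    automated:
      prune: true            # delete resources removed from Git
      selfHeal: true         # revert manual drift to match Git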

OpenShift Pipelines

Red Hat OpenShift Pipelines is a cloud-native CI/CD system built for decentralized teams working on microservices architectures. Organizations already using OpenShift can leverage Pipelines because it is natively integrated with Kubernetes. This makes pipelines easier to manage, as they are handled using the same Kubernetes APIs and tools, without the need for external CI/CD tooling.
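
OpenShift Pipelines is built on Tekton, so pipeline building blocks are themselves Kubernetes resources. A minimal sketch of a Tekton task, assuming a recent Tekton API version (the task name, image, and script are illustrative):

apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: build-step
spec:
  steps:
    - name: build
      image: registry.access.redhat.com/ubi9/ubi-minimal  # illustrative base image
      script: |
        echo "Running the build step..."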

Static code analysis

When you have a large team working on multiple aspects of your application and deploying various infrastructure components using infrastructure as code (IaC), it is highly recommended to use automated security scanning tools to perform code analysis. With multiple engineers of varying skill levels, there are bound to be mistakes, like secrets exposed in plaintext, misconfigured permissions, and missing RBAC controls.

You can use open-source tools like Terrascan to perform automated code analysis. Terrascan includes over 500 policies to scan your code against standards like CIS Benchmarks. You can integrate Terrascan with your CI/CD process to automate the code analysis process.

Example

Consider the following deployment manifest for Apache:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache-deployment
  labels:
    role: webserver
spec:
  replicas: 2
  selector:
    matchLabels:
      role: webserver
  template:
    metadata:
      labels:
        role: webserver
    spec:
      containers:
      - name: frontend
        image: httpd
        ports:
        - containerPort: 80

Scanning this with Terrascan would list the policy violations detected in the code:

$ terrascan scan -i k8s -f apache-deployment.yaml 
Violation Details -
    
 Description    : CPU Limits Not Set in config file.
 File           : apache-deployment.yaml
 Line           : 28
 Severity       : MEDIUM
 ------------------------------------------------------------------
 
 Description    : No readiness probe will affect automatic recovery in case of unexpected errors
 File           : apache-deployment.yaml
 Line           : 28
 Severity       : LOW
 ------------------------------------------------------------------
 
 Description    : Apply Security Context to Your Pods and Containers
 File           : apache-deployment.yaml
 Line           : 28
 Severity       : MEDIUM
 ------------------------------------------------------------------
 
 Description    : Image without digest affects the integrity principle of image security
 File           : apache-deployment.yaml
 Line           : 28
 Severity       : MEDIUM
 ------------------------------------------------------------------
 
 Description    : Prefer using secrets as files over secrets as environment variables
 File           : apache-deployment.yaml
 Line           : 28
 Severity       : HIGH
 ------------------------------------------------------------------

Disaster recovery and backups

For business-critical applications, it is essential to have a proper disaster recovery plan. Failure is always a possibility, whether from system faults, human error, or malicious activity. As part of a set of best practices, you should always have backups of your critical data and components that you can quickly restore after any type of failure.

Backup of etcd

etcd is a distributed key-value store that holds all K8s objects and the cluster state. For production environments, it is important to deploy etcd as a multi-node cluster with a minimum of three nodes (five are recommended). You should also take regular backups of the etcd state to recover from any loss or corruption of data.

etcd supports built-in snapshots, as shown below. Snapshots should be stored offsite, separate from the main production environment:

ubuntu@node1:~$ sudo etcdctl --endpoints https://node1:2379 \
--cacert=/etc/ssl/etcd/ssl/ca.pem \
--cert=/etc/ssl/etcd/ssl/member-node1.pem \
--key=/etc/ssl/etcd/ssl/member-node1-key.pem \
snapshot save etcd-backup.db
.
.
Snapshot saved at etcd-backup.db
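
To recover, the snapshot can later be restored into a fresh data directory. A sketch, noting that newer etcd releases move restore functionality into the etcdutl utility, so verify against your etcd version:

sudo etcdctl snapshot restore etcd-backup.db --data-dir /var/lib/etcd-restored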

Backup of persistent volumes

The etcd snapshots only save the configuration state of the K8s cluster, not the application data. Applications like databases use persistent volumes to store data. You can take snapshots of the storage volumes if the infrastructure supports them (like AWS EBS snapshots or Azure Disk snapshots).
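
If the cluster has a CSI driver with snapshot support installed, a Kubernetes-native alternative is the VolumeSnapshot API. A minimal sketch, where the snapshot class and claim names are illustrative:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-data-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass    # must match an installed VolumeSnapshotClass
  source:
    persistentVolumeClaimName: db-data-pvc  # the PVC to snapshot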

Trilio for Kubernetes

The two backup options discussed above have limitations. The etcd snapshot only saves the configuration state of the K8s cluster, while volume snapshots depend on the storage infrastructure and require a manual recovery process to bring the K8s cluster and applications back up.

The third option is to use Trilio for Kubernetes (T4K). With T4K, you can fully orchestrate your backup and recovery process, even after the failure of a complete K8s cluster. T4K can protect metadata, operators, container image registries, virtual machines, and persistent volumes. You can perform partial disaster recovery (such as metadata only) or complete disaster recovery on any K8s cluster, whether on-premises or in the cloud.

Last thoughts

Working with a production environment has its own set of challenges. You have to deal with infrastructure failure, malicious activity, human error, and cost-optimization and efficiency concerns, all while serving unpredictable workloads. You must design and operate your infrastructure to preempt as many avenues of failure as possible. By following these K8s operational best practices, you can mitigate failures and provide the best possible response in a disaster situation.