Persistent storage has long been challenging for stateful applications running in containerized environments like OpenShift. While containers offer flexibility and scalability, their ephemeral nature can lead to data loss if they are not properly managed.
Unlike traditional systems, container storage is temporary by default, meaning that data created within a container can be lost when the container stops. Many applications, especially those that need to maintain data across restarts (like databases), require persistent storage. However, the storage landscape for containerized workloads has evolved significantly, and OpenShift now offers a variety of solutions to address this critical requirement.
In this article, we discuss the key concepts of OpenShift block storage and learn the steps of setting up OpenShift Data Foundation (ODF) services in an OpenShift cluster.
Summary of key concepts of block storage for OpenShift
The following table provides an overview of the key concepts covered in this article.
| Concept | Description |
| --- | --- |
| Block storage in OpenShift | Provides durable storage volumes that preserve application data, even if containers are restarted or relocated. |
| Persistent storage in OpenShift | OpenShift provides persistent storage capabilities to handle data that needs to survive even if an application pod is restarted or replaced. |
| Dynamic volume provisioning | When more storage is needed, OpenShift leverages Container Storage Interface (CSI) drivers to automatically provision storage from various providers for simplified storage management and scalability. |
| OpenShift Data Foundation | A software-defined storage solution specifically designed for OpenShift, offering advanced features like high availability and scalability for containerized environments. |
Block storage in OpenShift
Block storage provides access to raw block devices for application storage. These block devices function as independent storage volumes, similar to the physical drives found in servers, and typically require formatting and mounting for application access. Each block device is treated as an independent disk drive and can support an individual file system.
Block storage is ideal when applications require fast, low-latency access for computationally heavy data workloads. Block-level access to storage volumes is a common approach for databases, server-side processing, and high-performance data access applications. Use block storage if the containerized workload requires fast and reliable data access.
OpenShift can use locally attached drives or volumes provisioned from SAN arrays for block storage. OpenShift does not interact with the physical storage directly; it relies on an abstraction layer to manage and provision storage, decoupling applications from the underlying storage infrastructure.
OpenShift uses block storage for both persistent and temporary storage needs. For persistent storage, OpenShift can utilize underlying storage solutions to provision volumes that persist beyond the lifecycle of pods. These volumes are ideal for stateful applications that require data durability.
For applications that do not require data to persist, OpenShift provides temporary block storage directly from the local storage of the nodes hosting the pods. This ephemeral storage is suitable for temporary data that does not need to be preserved after the pod terminates.
Understanding ephemeral and persistent block storage in OpenShift
Storage in the OpenShift platform can be broadly classified into two categories: ephemeral and persistent. Ephemeral storage is transient in nature and designed for stateless applications. Stateful applications require persistent storage to persist their data independently of the pod’s lifecycle.
Understanding ephemeral storage
Some applications require storage but don’t need the data to persist after they stop; ephemeral volumes are well suited for these scenarios. They are created and deleted with the pod, so pods can be restarted anywhere without relying on specific persistent storage. Containerized workloads use this storage for temporary files, caching, and logs. However, this approach can lead to several issues:
- Unknown storage capacity: Pods can’t determine how much temporary storage is available.
- No guaranteed storage: Pods can’t request a specific amount of temporary storage; it is allocated on a first-come, first-served basis.
- Eviction risk: Pods might be removed if they use too much temporary storage, and new pods can’t start until space is freed up.
- Lack of suitability for stateful applications: Stateful workloads like databases that require data persistence across restarts cannot use ephemeral storage.
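Despite these limitations, ephemeral volumes remain a good fit for scratch data. As a minimal sketch, the pod below mounts an emptyDir volume that is created when the pod is scheduled and deleted when the pod terminates; the pod name, image, and size limit are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-demo            # illustrative name
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi-minimal   # any image works here
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: scratch
      mountPath: /tmp/cache   # temporary files, caches, or logs live here
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 1Gi          # optional cap; exceeding it can trigger eviction
```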
Understanding persistent storage
OpenShift makes storage management and consumption easy for cluster administrators and stateful containerized workloads. It uses the persistent volume framework to allocate storage resources. The PersistentVolume API simplifies persistent storage management by hiding complex details, allowing users and administrators to interact with storage without worrying about the underlying implementation.
There are two relevant resources here:
- Persistent volume claims (PVCs): The workloads in OpenShift clusters express their storage requirements through PVCs. A PVC specifies the storage size and access mode needed and optionally requests a specific storage class.
- Persistent volumes (PVs): Persistent volumes represent the physical storage resources available in the cluster. PVs can utilize block storage protocols (such as Fiber Channel and iSCSI), file storage protocols (such as NFS), or specific storage systems offered by storage array vendors and cloud providers.
Administrators set up storage resources by creating PVs, while developers request those resources for their workloads via PVCs. This enables developers to focus on their applications, not storage details. OpenShift’s approach ensures efficient storage utilization and flexible allocation within the cluster.
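To make the administrator/developer split concrete, here is a rough sketch of a statically provisioned PV backed by an iSCSI LUN and a PVC that claims it. The target portal, IQN, names, and sizes are placeholders; an NFS- or CSI-backed PV follows the same pattern.

```yaml
# Administrator side: a PV describing an existing iSCSI block device
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  iscsi:
    targetPortal: 192.168.1.10:3260                  # placeholder portal
    iqn: iqn.2024-01.com.example:storage.target01    # placeholder IQN
    lun: 0
    fsType: ext4
---
# Developer side: a PVC requesting storage that matches the PV above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```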
Dynamic volume provisioning with storage classes and CSI drivers
The manual provisioning of persistent volumes and persistent volume claims can be a tedious and error-prone process. It requires precise matching of storage resources to application needs, which can be challenging to predict in advance. Dynamic provisioning can mitigate some of these issues by automating the provisioning of persistent volumes.
Storage classes enable dynamic provisioning of storage resources based on applications’ needs. They allow administrators to describe the classes of storage they offer by specifying factors like provisioners, access modes, and quality of service. This allows for more granular control over the allocation of persistent volumes and ensures that workloads receive the appropriate storage resources for their specific needs. When creating a PVC, workloads can specify a desired storage class based on their requirements.
Storage classes provide a versatile way to define and work with various types of storage in your environment. However, they can be limited when working with enterprise storage arrays. The Container Storage Interface (CSI) helps overcome these challenges by establishing a standard interface that allows different storage systems to connect with OpenShift. The CSI interface decouples the storage systems from OpenShift, which makes it easier to integrate storage from a wide range of storage providers, such as traditional storage arrays, cloud platforms, and object storage. This design allows cluster administrators to select the most suitable storage options for their containerized workloads without facing any limitations imposed by default storage plugins.
OpenShift can use the Container Storage Interface to consume storage from storage backends that implement the CSI interface as persistent storage. CSI acts as a plugin between OpenShift and the underlying storage provider. It translates storage requests (PVs and PVCs) into specific calls for the storage array that the driver manages.
OpenShift can use any supported CSI provisioner. Each storage class specifies a provisioner, which determines the volume plugin used to provision volumes and converts PVC requests into CSI calls for creating PVs.
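For illustration, here is roughly what a storage class definition looks like, assuming the cluster runs on AWS with the EBS CSI driver installed; the class name and parameters are illustrative, and each provisioner accepts its own set of parameters.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-block                       # illustrative name
provisioner: ebs.csi.aws.com             # assumes the AWS EBS CSI driver is available
parameters:
  type: gp3                              # provisioner-specific parameter
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer  # delay binding until a pod is scheduled
allowVolumeExpansion: true
```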
The following figure illustrates this process in detail.
Visualizing the PV, PVC, storage class, and CSI driver workflow in OpenShift
More details about persistent storage implementation in OpenShift can be found in its official documentation.
Simplifying block storage with OpenShift Data Foundation (ODF)
OpenShift Data Foundation is a storage solution from Red Hat that simplifies persistent storage management for containerized workloads deployed in OpenShift. It offers a unified approach to file, block, and object storage for both on-premises and hybrid cloud environments. Unlike conventional storage systems that require separate drivers and operators for different storage types, ODF provides a consolidated platform that meets all persistent storage needs for the cluster.
OpenShift Data Foundation architecture
Under the hood, ODF is based on open-source technologies such as Ceph, NooBaa, and Rook:
- Ceph: Ceph is a unified, distributed, and scalable software storage solution that can provide object, block, and file storage for commodity hardware.
- Rook: Rook is an open-source orchestration tool for cloud-native Kubernetes storage. It provides the necessary framework for integrating Ceph storage within Kubernetes and OpenShift.
- NooBaa: ODF uses the Multicloud Object Gateway (MCG) service based on the NooBaa project to provide a local object service (S3 API) backed by local or cloud-native storage.
OpenShift Data Foundation (ODF) uses Ceph to provide highly available and scalable block storage. Ceph pools the underlying physical storage devices into a virtualized storage layer that guarantees high availability via data replication. ODF abstracts the underlying storage details so that file, block, and object storage claims can all be provisioned from the same raw block storage. With ODF, data durability and fault tolerance are ensured by Ceph’s self-healing and replication underneath.
ODF provides the following types of storage:
- Block storage: ODF uses Ceph’s RADOS Block Device (RBD) to create block storage volumes that can be used for high-performance and demanding workloads.
- File storage: ODF utilizes CephFS, a distributed file system built on top of Ceph, to provide scalable and shared file storage as an alternative to NFS.
- Object storage: ODF leverages Ceph’s RADOS Gateway (RGW) and NooBaa to provide object storage. This can be used to store and retrieve large amounts of unstructured data, such as media files and backups.
Simplified architecture of OpenShift Data Foundation (source)
The Rook operator creates and updates the CSI driver, including a provisioner for each of the two drivers—RADOS block device (RBD) and Ceph filesystem (CephFS)—and volume plugin daemons for each of the two drivers.
Deployment options
As shown in the figure above, ODF provides deployment flexibility so that end-users can adopt the most appropriate approach for their environment. OpenShift Data Foundation can be deployed in the following two modes:
- Internal Mode: ODF is deployed entirely within the OpenShift cluster. This approach can use local storage devices, SAN volumes, EBS volumes, or vSphere volumes in combination with the Local Storage Operator (LSO). This approach is practical when:
- Cluster storage requirements are not clearly defined.
- There are no dedicated infrastructure nodes.
- Creating an extra node instance, such as on bare metal servers, is difficult.
- External Mode: In external deployments, ODF uses an independent Ceph Storage cluster running outside the OpenShift cluster. This approach is recommended when:
- The cluster’s storage requirements are significant.
- Multiple OpenShift clusters are consuming storage services from a standard external cluster.
- Another dedicated team is managing the external Ceph cluster.
Setting up OpenShift Data Foundation
We’re now going to see how to set up ODF services in an OpenShift cluster. For this demo, we configure ODF in internal mode.
The commands used in the following tutorial can also be found in our Git repo.
Prerequisites
The minimum requirements for setting up ODF services on OpenShift are as follows:
- You need an OpenShift cluster with at least three worker or infrastructure nodes.
- Each of the selected nodes must have at least one raw block device available for use by ODF.
- The block devices must be empty and must not contain any LVM-related configurations.
- If the selected OpenShift nodes are VMs on VMware, ensure that the disk.EnableUUID option is set to TRUE for each VM.
- For internal mode, the cluster should have a minimum of:
- 72 GB RAM
- 24 CPU cores
- 3 physical disks
Operator installation
To install the ODF operator on your OpenShift cluster, log into the web console with an account having cluster-admin privileges and install two operators as follows:
- Click Operators > OperatorHub, then type OpenShift Data Foundation in the Filter by keyword field. Click OpenShift Data Foundation from the operator results list, then click Install. Accept all the default settings from the Operator Installation page, then click Install.
- Click Operators > OperatorHub, then type Local Storage in the Filter by keyword field. Click Local Storage from the operator results list, then click Install. Accept all the default settings from the Operator Installation page, then click Install.
Preparing local storage for ODF
Before using local storage disks for ODF deployment, the following tasks need to be performed using the local storage operator:
- Discovery of available disks that will be used for the ODF cluster
- Creation of storage class and persistent volumes
Go to Installed Operators > Local Storage > Local Volume Discoveries and create a local volume discovery operation for the desired nodes.
Configuring the local storage operator
Discovering local volumes on cluster nodes
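If you prefer the CLI to the console form shown above, a LocalVolumeDiscovery custom resource along these lines should be roughly equivalent; the resource name is illustrative, the namespace assumes the Local Storage Operator's default project, and the node names match the nodes used in this demo.

```yaml
apiVersion: local.storage.openshift.io/v1alpha1
kind: LocalVolumeDiscovery
metadata:
  name: auto-discover-devices          # illustrative name
  namespace: openshift-local-storage   # assumes the operator's default namespace
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:                        # nodes selected for disk discovery
        - ocpnode-1
        - ocpnode-2
        - ocpnode-3
```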
The discovery operation runs on each selected node. To see more details about its results, use the oc describe command to view the list of disks and their availability status; disks already used by the cluster will show a status of NotAvailable. Identify the disk you want to add to the ODF cluster and note its Device ID.
You could also use the /dev/sdX identifiers for your disks, but that can lead to configuration issues: /dev/sdX names are not stable identifiers because they are assigned in the order in which drives are discovered, and that order can change across reboots. It is best to use the unique identifiers listed under Device ID, since they persist across reboots.
```
$ oc get localvolumediscoveryresults
NAME                         AGE
discovery-result-ocpnode-1   1h
discovery-result-ocpnode-2   1h
discovery-result-ocpnode-3   1h

$ oc describe localvolumediscoveryresults discovery-result-ocpnode-1 | less
[...............]
    Device ID:  /dev/disk/by-id/wwn-0x6000c29c01d91ed7b7109f82c40d42e7
    Fstype:
    Model:      Virtual disk
    Path:       /dev/sdc
    Property:   Rotational
    Serial:     6000c29c01d91ed7b7109f82c40d42e7
    Size:       107374182400
    Status:
      State:    Available
[.................]
```
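As a quick cross-check from the CLI, you can list the stable by-id paths directly on a node; the node name below is illustrative, and this assumes you have debug access to the node.

```
$ oc debug node/ocpnode-1 -- chroot /host ls -l /dev/disk/by-id/
```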
Repeat this process for other discovery results and verify the details of the available disks.
Once device details have been identified, return to the web console and go to Installed Operators > Local Storage > Local Volume and click Create Local Volume.
Creating local volumes on cluster nodes
Choose a name for the local volumes and click the drop-down menu under StorageClassDevices > devicePaths. In the Value field, paste your drives’ Device ID details. Click Add devicePaths to add details about additional drives.
Adding device details for local volume creation
Set a name for your storage class under storageClassName and set the volumeMode to Block. Leave the other options unchanged and create the local volume.
Configuring parameters for local volume creation
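For reference, the console form above corresponds roughly to a LocalVolume custom resource like the one below. The resource name is illustrative, the namespace assumes the Local Storage Operator's default project, and the device path is the Device ID noted from the discovery results earlier; add one path per disk and node.

```yaml
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-block                    # illustrative name
  namespace: openshift-local-storage   # assumes the operator's default namespace
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - ocpnode-1
        - ocpnode-2
        - ocpnode-3
  storageClassDevices:
  - storageClassName: lso              # storage class created for these volumes
    volumeMode: Block
    devicePaths:                       # stable by-id paths noted during discovery
    - /dev/disk/by-id/wwn-0x6000c29c01d91ed7b7109f82c40d42e7
```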
Once the local volume has been created successfully, you can verify that the new storage class has been created and a new persistent volume exists against each physical device.
```
$ oc get sc
NAME   PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
lso    kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  1h

$ oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                               STORAGECLASS   REASON   AGE
local-pv-3cfceaf    20Gi       RWO            Delete           Bound    openshift-storage/ocs-deviceset-lso-0-data-4fgmfr   lso                     1h
local-pv-9ccedf9f   100Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-lso-0-data-5zz5lk   lso                     1h
local-pv-a862d6ec   100Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-lso-0-data-15bsz7   lso                     1h
local-pv-b98bce49   100Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-lso-0-data-0kmr2r   lso                     86d
[.........................]
```
Once local persistent volumes have been successfully created, a storage cluster can be created using the ODF operator. Go to Installed Operators > OpenShift Data Foundation > Storage System and click Create Storage System.
Creating an ODF storage system
Select the Full deployment and Use an existing StorageClass options, and choose the recently created storage class.
Specifying the storage system deployment type
On the Capacity and nodes page, choose a value for Requested Capacity and select the appropriate cluster nodes with the attached devices.
Selecting cluster nodes for ODF installation
You can leave the default settings for Security and network and Data Protection and proceed to create the storage system.
Configuring storage system parameters
Monitor the progress of the storage cluster, pods, and PVCs in the openshift-storage project. It takes a few minutes for all of the resources to be ready.
```
$ watch oc get storagecluster,pods -n openshift-storage
```
Once the storage cluster has been created successfully, the command will report Ready status.
```
$ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   1h    Ready              2024-03-11T11:22:03Z   4.14.11
```
List the available storage classes. You’ll see that ODF has created the following storage classes.
```
$ oc get sc
NAME                          PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
lso                           kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  1h
ocs-storagecluster-ceph-rbd   openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   1h
ocs-storagecluster-ceph-rgw   openshift-storage.ceph.rook.io/bucket   Delete          Immediate              false                  1h
ocs-storagecluster-cephfs     openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   1h
openshift-storage.noobaa.io   openshift-storage.noobaa.io/obc         Delete          Immediate              false                  1h
```
The use cases of these storage classes are as follows:
- ocs-storagecluster-ceph-rbd: This class supports block storage devices primarily used for high-performance workloads like databases.
- ocs-storagecluster-cephfs: This class provides shared and distributed file system data services, primarily used for logging and data aggregation workloads.
- openshift-storage.noobaa.io: This class provides the Multicloud Object Gateway (MCG) service, which provides multicloud object storage as an S3 API endpoint. This allows for the abstracting and retrieving of data from multiple cloud object stores.
- ocs-storagecluster-ceph-rgw: This class provides on-premises object storage, primarily targeting data-intensive applications.
Using ODF block storage
ODF provides the storage class ocs-storagecluster-ceph-rbd for provisioning block volumes.
Let’s see more details about this class.
```
$ oc describe sc ocs-storagecluster-ceph-rbd
Name:                  ocs-storagecluster-ceph-rbd
IsDefaultClass:        No
Annotations:           description=Provides RWO Filesystem volumes, and RWO and RWX Block volumes,storageclass.kubernetes.io/is-default-class=true
Provisioner:           openshift-storage.rbd.csi.ceph.com
Parameters:            clusterID=openshift-storage,csi.storage.k8s.io/controller-expand-secret-name=rook-csi-rbd-provisioner,csi.storage.k8s.io/controller-expand-secret-namespace=openshift-storage,csi.storage.k8s.io/fstype=ext4,csi.storage.k8s.io/node-stage-secret-name=rook-csi-rbd-node,csi.storage.k8s.io/node-stage-secret-namespace=openshift-storage,csi.storage.k8s.io/provisioner-secret-name=rook-csi-rbd-provisioner,csi.storage.k8s.io/provisioner-secret-namespace=openshift-storage,imageFeatures=layering,deep-flatten,exclusive-lock,object-map,fast-diff,imageFormat=2,pool=ocs-storagecluster-cephblockpool
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>
```
The output above highlights the following details:
- IsDefaultClass — Indicates whether this storage class is used by default for persistent volume claims (PVCs) that do not specify a storage class.
- Provisioner: openshift-storage.rbd.csi.ceph.com — This indicates that the storage class uses the Ceph RBD CSI driver to provision volumes.
- Parameters: This section lists various parameters used by the CSI driver to configure the storage:
- clusterID=openshift-storage — Identifies the Ceph cluster to use
- pool=ocs-storagecluster-cephblockpool — Specifies the Ceph pool where the volumes will be created
- csi.storage.k8s.io/fstype=ext4 — Defines the filesystem type to be used on the volumes (ext4 in this case)
- Other parameters that configure secrets used for authentication and node access
- AllowVolumeExpansion=True — Enables expanding the size of persistent volumes provisioned by this storage class
- MountOptions: <none> — No specific mount options defined
- ReclaimPolicy: Delete — Specifies that when a PVC using this storage class is deleted, the corresponding persistent volume will also be deleted
- VolumeBindingMode: Immediate — Indicates that persistent volume claims using this storage class will be bound to a persistent volume as soon as they are created
Let’s create a persistent volume claim using the ocs-storagecluster-ceph-rbd storage class and verify the creation of the corresponding persistent volume.
Create a sample PVC in the default project.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
```
Verify that the PVC is created and immediately bound to the PV.
```
$ oc create -f pvc.yaml
persistentvolumeclaim/block-pvc created

$ oc get pvc -n default
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
block-pvc   Bound    pvc-41a79ec7-7968-4d92-8f03-09f8b69f2d51   1Gi        RWO            ocs-storagecluster-ceph-rbd   35s

$ oc get pv | grep block
pvc-41a79ec7-7968-4d92-8f03-09f8b69f2d51   1Gi   RWO   Delete   Bound   default/block-pvc   ocs-storagecluster-ceph-rbd   35s
```
Applications can simply reference this PVC in their manifests to consume storage from this PV.
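As a minimal sketch, the pod below mounts the block-pvc claim created above; the pod name, image, and mount path are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: block-demo             # illustrative name
  namespace: default
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi-minimal   # any image works here
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data         # the RBD-backed volume is mounted here
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: block-pvc     # PVC created in the previous step
```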
Recommendations for managing block storage in OpenShift
The following are some of the best practices for managing block storage in OpenShift:
- When creating PVCs, request only the necessary storage capacity to avoid wasting space.
- Choose an appropriate storage solution, such as OpenShift Data Foundation, which supports advanced features such as snapshots and clones.
- Regularly review and delete old PVCs that are no longer needed.
- Define appropriate limits and quotas for your storage to control consumption (see the quota sketch after this list).
- Regularly monitor storage usage and performance to identify potential issues and optimize resource utilization.
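As a sketch of such a quota, the ResourceQuota below caps the number of PVCs and the total requested capacity in a project, including a per-storage-class limit; the quota name, namespace, and numbers are illustrative.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota              # illustrative name
  namespace: default               # apply per project as needed
spec:
  hard:
    persistentvolumeclaims: "10"   # maximum number of PVCs in the project
    requests.storage: 100Gi        # total storage that all PVCs may request
    # cap requests against the ODF block storage class specifically
    ocs-storagecluster-ceph-rbd.storageclass.storage.k8s.io/requests.storage: 50Gi
```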
Conclusion
Block storage is a significant requirement for running stateful and high-performing workloads. However, implementing persistent block storage for containerized workloads has been challenging, as traditional storage options often struggle to integrate with orchestration platforms like OpenShift. OpenShift Data Foundation is a storage solution from Red Hat that simplifies managing persistent block storage for containerized workloads deployed in OpenShift.