Reference Guide: Monitoring Ceph Clusters in OpenStack

OpenStack is a robust platform that has redefined the implementation of cloud-native applications over the last few years. It is an open-source cloud operating system that provides comprehensive services for building and operating public and private clouds.

Ceph, a highly scalable and distributed object storage system, is a critical component of many OpenStack deployments. With its distributed architecture and advanced features like replication and erasure coding, Ceph provides a highly reliable, high-performance storage backend for OpenStack.

This article explores the key aspects of monitoring Ceph clusters within the OpenStack environment, covering important metrics, tools, and best practices for ensuring optimal performance and reliability.

Summary of key OpenStack and Ceph concepts

The following table provides an overview of the key concepts covered in this article.

 

  • Understanding Ceph architecture: Overview of Ceph’s unified, distributed design, core components (OSDs, MONs, etc.), and data handling.
  • Ceph in OpenStack: How Ceph serves as a reliable and scalable storage backend for various OpenStack services.
  • Monitoring Ceph clusters: Methods for tracking the health of Ceph nodes, OSDs, and PGs to ensure operational stability.

Understanding Ceph architecture

Ceph is an open-source software-defined storage solution that consolidates block, object, and file data into a single, distributed storage cluster. Its unified approach, multi-platform support, exceptional scalability, inherent fault tolerance, and ability to utilize commodity hardware make Ceph an extremely compelling and flexible storage option. A Ceph cluster can be configured to achieve massive capacity and performance by scaling across thousands of nodes.

Each node within a Ceph cluster utilizes readily available commodity hardware and employs intelligent Ceph daemons for seamless communication and coordinated operations. These daemons collectively perform critical tasks such as the following:

  • Data management: Writing, reading, and compressing data
  • Data protection: Ensuring data durability through replication or erasure coding
  • Cluster health monitoring: Continuously monitoring and reporting on cluster health
  • Data rebalancing: Dynamically redistributing data across the cluster (“backfilling”)
  • Data integrity: Maintaining data integrity
  • Failure recovery: Gracefully recovering from node or component failures

Ceph’s core functionality is built upon the Reliable Autonomic Distributed Object Store (RADOS), a low-level data store that is a common foundation for various user-facing services. RADOS provides a straightforward approach for storage abstraction:

  • Objects: The fundamental storage unit is an object uniquely identified by a 20-byte name and accompanied by optional metadata (attributes). Each object holds a variable-sized data payload, similar to a file.
  • Object pools: Objects are organized into named pools, each representing a distinct namespace within the storage system. Pool parameters, such as replication level and data distribution rules, define how objects within that pool are stored and protected.
  • Storage cluster: The storage cluster comprises a collection of object storage daemons (OSDs), which collectively store and manage the data across the cluster.

Data within a Ceph storage cluster is stored as objects within the underlying RADOS data store. Each object is assigned to and stored on a designated OSD. These OSDs manage all data operations on the attached storage drives, including read, write, and replication operations.

Storing data in a Ceph cluster


Major backend components of a cluster

The major components of a Ceph cluster are shown below. The data in a Ceph cluster can be accessed in any of the following ways:

  • Ceph Native API (librados)
  • Ceph Block Device (RBD, librbd), which stores data as RADOS Block Device images
  • Ceph Object Gateway
  • Ceph File System (CephFS, libcephfs)

Major backend components of Ceph


The Ceph cluster consists of the following major daemons:

  • Ceph Monitors (MONs) are the daemons that maintain the cluster map. The cluster map is a collection of five maps containing information about the cluster’s state and configuration. For each cluster event, Ceph must update the appropriate map and replicate it to every MON daemon. To apply an update, the MONs must establish a consensus on the state of the cluster: a majority of the configured monitors must be available and agree on the map update.
  • Object Storage Devices (OSDs) are the building blocks of a Ceph storage cluster. OSDs connect storage devices, such as hard disks, to the Ceph storage cluster. An individual storage server can run multiple OSD daemons and provide multiple OSDs to the cluster. The default OSD backend, BlueStore, uses local storage devices in raw mode (without an intervening file system) and is designed for high performance.
  • Managers (MGRs) provide for the collection of cluster statistics. If no MGRs are available in a cluster, client I/O operations are not negatively affected but attempts to query cluster statistics fail. The MGR daemon centralizes access to all data collected from the cluster and provides storage administrators with a simple web dashboard.
  • Metadata Server (MDS) manages Ceph File System (CephFS) metadata. It provides POSIX-compliant, shared file-system metadata management, including ownership, time stamps, and mode. The MDS stores its metadata in RADOS instead of local storage. It has no access to file contents.


Distributed data placement and locality

Since Ceph acts as a distributed storage system, it is essential from a performance point of view that computing power be kept as close as possible to the physical data. The Controlled Replication Under Scalable Hashing (CRUSH) algorithm makes this possible. Instead of depending on a central lookup table, both Ceph clients and OSD daemons use the CRUSH algorithm to efficiently compute information about object location. The CRUSH algorithm employs a crucial abstraction layer by assigning each object to a unique placement group (PG)—essentially, a logical grouping of data. This abstraction decouples the application layer, where objects reside, from the physical layer, where OSDs store the data.

CRUSH utilizes a pseudo-random algorithm to distribute objects evenly across these PGs. Furthermore, it leverages defined rules to map these PGs to specific OSDs within the cluster. In the event of an OSD failure, Ceph intelligently remaps the affected PGs to other available OSDs, ensuring data integrity and maintaining data protection rules through seamless data synchronization.

One OSD is the primary OSD for the object’s placement group, and Ceph clients always contact the primary OSD when they read or write data. Other OSDs are secondary OSDs and play an important role in ensuring the data’s resilience in the event of cluster failures.
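The two-step placement described above (object to PG by hashing, PG to OSDs by a deterministic rule) can be illustrated with a minimal Python sketch. This is not Ceph’s actual CRUSH implementation — Ceph uses the rjenkins hash and a hierarchy-aware bucket traversal — but it demonstrates why clients can compute object locations independently, without a central lookup table:

```python
import hashlib

def object_to_pg(object_name: str, pg_num: int) -> int:
    """Step 1: hash the object name to a placement group (no lookup table needed)."""
    digest = hashlib.md5(object_name.encode()).hexdigest()
    return int(digest, 16) % pg_num

def pg_to_osds(pg_id: int, osds: list, replicas: int) -> list:
    """Step 2: deterministically rank OSDs for this PG; the first is the primary.
    Real CRUSH also honors the failure-domain hierarchy, which this toy rule ignores."""
    ranked = sorted(osds, key=lambda osd: hashlib.md5(f"{pg_id}:{osd}".encode()).hexdigest())
    return ranked[:replicas]

pg = object_to_pg("rbd_data.1234", pg_num=128)
acting_set = pg_to_osds(pg, osds=[0, 1, 2, 3, 4, 5], replicas=3)
primary, secondaries = acting_set[0], acting_set[1:]
print(f"object -> pg {pg}, acting set {acting_set} (primary: osd.{primary})")
```

Because both functions are pure, any client with the same inputs (object name, pg_num, OSD list) computes the same acting set; removing a failed OSD from the list and recomputing models the remapping Ceph performs after a failure.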

The Ceph cluster map

Ceph clients and OSDs require knowledge of the cluster topology. Five maps represent the cluster topology, which is collectively referred to as the cluster map. The Ceph Monitor daemon is responsible for maintaining the cluster map. A cluster of Ceph MONs ensures high availability if a monitor daemon fails.

The five maps are as follows:

  • Monitor Map: This map contains the cluster’s fsid; each monitor’s position, name, address, and port; and map time stamps. The fsid is a unique, auto-generated identifier (UUID) that identifies the Ceph cluster.
  • OSD Map: This map contains the cluster’s fsid, a list of pools, replica sizes, placement group numbers, a list of OSDs and their status, and map time stamps.
  • Placement Group (PG) Map: This map contains the PG version, the ratio of up to acting OSDs, details on each placement group such as the PG ID, the up set, the acting set, the state of the PG, data usage statistics for each pool, and map time stamps.
  • CRUSH Map: This map lists storage devices, the failure domain hierarchy (device, host, rack, row etc.), and rules for traversing the hierarchy when storing data.
  • Metadata Server (MDS) Map: This map contains the pool for storing metadata, a list of metadata servers, metadata servers status, and map time stamps.


Ceph in OpenStack

OpenStack is an open-source infrastructure-as-a-service (IaaS) platform that can provide public and private clouds in your data center or on the edge. It is implemented as a collection of interacting services that control compute, storage, and networking resources. 

Ceph is the most common back end used with OpenStack as it provides reliable backend storage for OpenStack services. Ceph can integrate with OpenStack services such as Compute, Block Storage, Shared File Systems, Image, and Object Store to provide easier storage management and cloud scalability.

Integrating Ceph with OpenStack

Ceph can provide the following features when integrated with OpenStack:

  • Support for the same API that the Swift Object Store uses
  • Support for thin provisioning by using copy-on-write, making volume-based provisioning fast
  • Support for Keystone identity authentication, for transparent integration with or replacement of the Swift Object Store
  • Consolidation of object storage and block storage
  • Support for the CephFS distributed file system interface

There are two approaches to integrating Ceph into an OpenStack infrastructure: dedicated and external. Both approaches are implemented via TripleO:

  • Dedicated: An organization without an existing, stand-alone Ceph cluster installs a dedicated Ceph cluster that is composed of Ceph services and storage nodes during an overcloud installation. Only services and workloads that are deployed for or on an OpenStack overcloud can use an OpenStack-dedicated Ceph implementation. External applications cannot access or use OpenStack-dedicated Ceph cluster storage.
  • External: An organization can use an existing, standalone Ceph cluster for storage when creating a new OpenStack overcloud. During overcloud installation, the TripleO deployment is configured to access that external cluster to make the necessary pools, accounts, and other resources. Instead of creating internal Ceph services, the deployment configures the OpenStack overcloud to access the existing Ceph cluster as a Ceph client.

 

Each OpenStack service is an API abstraction that hides the backend implementation. This abstraction allows for flexibility, enabling many services to utilize multiple backends concurrently. In an OpenStack environment, services can be configured to leverage multiple pools, employing tiering or tagging mechanisms to dynamically select the most suitable pool for each workload based on performance requirements. However, implementing tiering in an existing cluster can trigger significant data movement due to necessary CRUSH rule adjustments.

The sections that follow will explore how OpenStack services can effectively utilize Ceph as their backend storage.

How OpenStack services use Ceph


Image storage

By default, OpenStack’s Image service utilizes a local file store colocated with the Glance API node on the controller. When the Compute service requires images in the RAW format, the Glance service converts QCOW2 images to RAW and caches the converted versions.

When Ceph is integrated with OpenStack, TripleO defaults to using Ceph RBDs for image storage. Glance images are stored within a dedicated Ceph pool named “images.” Recognizing the immutability of images, OpenStack treats them as unchanging blobs. To ensure data resilience, the “images” pool is configured with replication by default, ensuring that image replicas are distributed across multiple storage devices for fault tolerance.

Object storage

OpenStack leverages the Object Store service (Swift) for object storage, providing compatibility with both Swift and Amazon S3 APIs. While a file-based backend (XFS-formatted partition) is the default, the Object Store service can be configured to utilize an external Swift cluster. When Ceph is integrated into OpenStack, TripleO seamlessly integrates the Object Store service with the Ceph RADOS Gateway, making it the default backend. Recognizing the absence of Swift in this scenario, TripleO similarly configures the Image service to utilize RGW.

Keystone

Ceph RGW can be integrated with the Keystone identity service to enhance security and user management. This integration designates Keystone as the authoritative source for user identities. User access granted by Keystone is automatically reflected in RGW, and Keystone-validated tokens are accepted for authentication. Furthermore, Keystone is configured to recognize RGW as a valid object storage endpoint.

Block storage

OpenStack leverages the Block Storage service (Cinder) to provide persistent volumes that remain intact even when detached from instances. Cinder supports multiple backends, with LVM using the “cinder-volumes” volume group as the default. Upon Ceph integration, TripleO seamlessly configures Cinder to utilize Ceph RBDs as the primary backend. Block Storage volumes are stored within a dedicated Ceph pool named “volumes,” while backups are stored in a separate “backups” pool.

Ceph block device images are attached to OpenStack instances through libvirt, which configures the QEMU interface to interact with the librbd Ceph module. By distributing block volumes across multiple OSDs within the cluster, Ceph significantly enhances performance for large volumes compared to traditional local drives.

File storage

OpenStack leverages the Shared File Systems service (Manila) to provide access to shared file systems. Manila supports a variety of backends and can provision shares from multiple sources. Share servers export shares to clients using file system protocols, including NFS, CIFS, GlusterFS, and HDFS.

When Ceph is integrated into OpenStack, TripleO configures Manila to utilize CephFS as the default backend. CephFS leverages the NFS protocol for seamless Shared File Systems service integration.

Compute storage

OpenStack’s Compute service (Nova) manages the creation and lifecycle of virtual machines (VMs). By default, Nova utilizes the KVM hypervisor with libvirt for VM execution. When integrated with Red Hat OpenStack Platform (RHOSP) and Ceph, Nova can utilize Ceph RBDs for storage. This allows for flexible storage management, enabling both ephemeral and persistent storage options, such as operating system disks. The integration enhances performance and scalability by distributing data across multiple OSDs within the Ceph cluster, particularly for large volumes.

Backing up OpenStack with Ceph

Regular backups are indispensable to safeguarding an OpenStack environment and ensuring business continuity. Enterprise backup solutions, such as Trilio, can streamline the backup process, making it easier to protect data. Trilio provides a self-service backup solution, enabling OpenStack users to manage their backups.

Cinder and Ceph form a powerful combination within the OpenStack ecosystem, delivering a scalable and reliable block storage solution. Their seamless integration empowers users to easily manage their storage resources within OpenStack and scale their storage capacity as needed. Trilio’s specialized tools seamlessly integrate with this architecture, enabling efficient and reliable backups and data recovery stored within Ceph-based volumes, enhancing data protection, and ensuring business continuity for OpenStack deployments.

Monitoring Ceph clusters

Monitoring is critical for ensuring the optimal performance, reliability, and availability of Ceph clusters within an OpenStack environment. The key components to monitor in a cluster include OSDs, monitoring daemons, placement group status, and metadata server status. 

The MGR daemons collect the cluster statistics. If no MGR daemon is available, client I/O operations continue to be served normally, but queries for cluster statistics fail. The OpenStack Platform uses ceph-mon as the monitor daemon for the Ceph cluster; the director deploys this daemon on all Controller nodes.

Checking monitoring configuration

The monitoring service configuration can be checked by logging in to an OpenStack Controller node and viewing the /etc/ceph/ceph.conf file.

    
$ cat /etc/ceph/ceph.conf
[global]
osd_pool_default_pgp_num = 128
osd_pool_default_min_size = 1
auth_service_required = cephx
mon_initial_members = overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
fsid = 8c835acc-6838-13e1-bb91-2cc279334a19
cluster_network = 192.168.180.0/24
auth_supported = cephx
auth_cluster_required = cephx
mon_host = 192.168.150.13,192.168.150.14,192.168.150.15
auth_client_required = cephx
osd_pool_default_size = 3
osd_pool_default_pg_num = 128
public_network = 192.168.148.0/22

This particular configuration includes three monitors located at 192.168.150.13, 192.168.150.14, and 192.168.150.15.
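Because ceph.conf uses INI syntax, monitoring scripts can read it with Python’s standard configparser. A small sketch using the sample values above (note that Ceph also accepts option names with spaces instead of underscores, which this sketch does not normalize):

```python
import configparser

# Abbreviated excerpt of the ceph.conf shown above; in production you would
# call cfg.read("/etc/ceph/ceph.conf") instead.
sample = """
[global]
mon_initial_members = overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
mon_host = 192.168.150.13,192.168.150.14,192.168.150.15
osd_pool_default_size = 3
"""

cfg = configparser.ConfigParser()
cfg.read_string(sample)

mon_hosts = [h.strip() for h in cfg["global"]["mon_host"].split(",")]
replica_size = cfg["global"].getint("osd_pool_default_size")
print(f"monitors: {mon_hosts}, default replica size: {replica_size}")
```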

Checking the status of Ceph nodes

To check the status of a specific node in the Ceph Storage cluster, log in to the node and run the following command.

    
$ ceph -s
    cluster 8c835acc-6838-13e1-bb91-2cc279334a19
     health HEALTH_OK
     monmap e1: 3 mons at {overcloud-controller-0=192.168.150.13:6789/0,overcloud-controller-1=192.168.150.14:6789/0,overcloud-controller-2=192.168.150.15:6789/0}
            election epoch 152, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e543: 6 osds: 6 up, 6 in
      pgmap v1736: 256 pgs, 4 pools, 0 bytes data, 0 objects
            17 GB used, 183 GB / 200 GB avail
                 256 active+clean

The output above provides a snapshot of the current health and status of the Ceph cluster. It indicates that the cluster is currently healthy, with all OSDs and monitors operational.

The health status can be one of the following values:

  • HEALTH_OK indicates that the cluster is operating normally.
  • HEALTH_WARN indicates that the cluster is in a warning condition, for example, if an OSD is down, but there are enough OSDs working properly for the cluster to function.
  • HEALTH_ERR indicates that the cluster is in an error condition. For example, a full OSD could have an impact on the functionality of the cluster.
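Scripts usually consume `ceph -s --format json` rather than scraping the human-readable output. A sketch that maps the health status to a numeric alert level (the JSON below is a simplified excerpt of the real structure):

```python
import json

# Simplified excerpt of `ceph -s --format json` output (shape abbreviated).
status_json = '{"health": {"status": "HEALTH_WARN", "checks": {"OSD_DOWN": {"summary": {"message": "1 osd down"}}}}}'

SEVERITY = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}

def alert_level(raw: str) -> int:
    """Return 0/1/2 for OK/WARN/ERR, printing each active health check."""
    health = json.loads(raw)["health"]
    level = SEVERITY.get(health["status"], 2)  # treat unknown states as errors
    for name, check in health.get("checks", {}).items():
        print(f"check {name}: {check['summary']['message']}")
    return level

level = alert_level(status_json)
```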

Monitoring ongoing cluster events

It is also possible to display additional real-time monitoring information about the ongoing events happening in the Ceph cluster.

Consider this command:

    
$ ceph -w
 cluster b370a29d-9287-4ca3-ab57-3d824f65e339
  health HEALTH_OK
  monmap e1: 3 mons at {overcloud-controller-0=192.168.150.13:6789/0,overcloud-controller-1=192.168.150.14:6789/0,overcloud-controller-2=192.168.150.15:6789/0}
         election epoch 152, quorum 0,1,2
  osdmap e543: 6 osds: 6 up, 6 in
   pgmap v1736: 256 pgs, 4 pools, 0 bytes data, 0 objects
         17 GB used, 183 GB / 200 GB avail
              256 active+clean

2024-09-08 15:45:01.345821 mon.0 [INF] pgmap v41339: 256 pgs: 256 active+clean; 17130 MB data, 17 GB used, 183 GB / 200 GB avail
2024-09-08 15:45:05.718640 mon.0 [INF] pgmap v41340: 256 pgs: 1 active+clean+scrubbing+deep, 255 active+clean; 17130 MB data, 17 GB used, 183 GB / 200 GB avail
2024-09-08 15:45:21.655871 osd.0 [INF] 17.71 deep-scrub ok
2024-09-08 15:45:47.880608 osd.1 [INF] 1.0 scrub ok
2024-09-08 15:45:48.865375 osd.1 [INF] 1.3 scrub ok
2024-09-08 15:45:50.866479 osd.1 [INF] 1.4 scrub ok
[..........................]

The command provides information about ongoing cluster activities, such as:

  • Data rebalancing across the cluster
  • Replica recovery across the cluster
  • Scrubbing activity
  • OSDs starting and stopping
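Log lines in the format above are easy to parse for automated scrub tracking. A small sketch (the regex assumes the `ceph -w` line format shown above and is illustrative only):

```python
import re

# Sample lines in the cluster-log format shown above.
log_lines = [
    "2024-09-08 15:45:21.655871 osd.0 [INF] 17.71 deep-scrub ok",
    "2024-09-08 15:45:47.880608 osd.1 [INF] 1.0 scrub ok",
]

# Capture timestamp, reporting daemon, log level, PG ID, scrub type, and result.
pattern = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<daemon>\S+) \[(?P<level>\w+)\] "
    r"(?P<pg>\S+) (?P<event>deep-scrub|scrub) (?P<result>\w+)$"
)

events = [m.groupdict() for line in log_lines if (m := pattern.match(line))]
for e in events:
    print(f"{e['daemon']} reported {e['event']} on pg {e['pg']}: {e['result']}")
```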

Monitoring OSD utilization

The usage statistics for OSDs can be displayed as follows:

    
$ ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA    OMAP    META      AVAIL    %USE   VAR   PGS  STATUS
 0  hdd    0.00980   1.00000  10 GiB  1.0 GiB  28 MiB  20 KiB  1024 MiB  9.0 GiB  10.28  1.00   41  up
 1  hdd    0.00980   1.00000  10 GiB  1.0 GiB  29 MiB  40 KiB  1024 MiB  9.0 GiB  10.29  1.00   58  up
 2  hdd    0.00980   1.00000  10 GiB  1.0 GiB  28 MiB  20 KiB  1024 MiB  9.0 GiB  10.28  1.00   30  up
[......................]

An OSD daemon can be in one of four states, based on the combination of these two flags:

  • down or up, indicating whether the daemon is running and communicating with the MONs.
  • out or in, indicating whether the OSD is participating in cluster data placement.
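Beyond up/down and in/out, the %USE column of `ceph osd df` is worth watching: Ceph warns when an OSD crosses the nearfull ratio (0.85 by default). A minimal sketch of that check over rows modeled on the output above (osd.2’s utilization is a hypothetical value added to show the check firing):

```python
# Rows modeled on the `ceph osd df` columns above: (osd_id, use_pct, status).
# osd.2's utilization is hypothetical, included for illustration only.
osd_rows = [
    (0, 10.28, "up"),
    (1, 10.29, "up"),
    (2, 85.40, "up"),
]

NEARFULL_PCT = 85.0  # Ceph's default nearfull ratio is 0.85

nearfull = [osd_id for osd_id, use_pct, status in osd_rows
            if status == "up" and use_pct >= NEARFULL_PCT]
print(f"nearfull OSDs: {nearfull}")
```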

Monitoring OSD performance

To observe performance metrics for OSDs, use the following command.

    
$ ceph osd perf
osd  fs_commit_latency(ms)  fs_apply_latency(ms)
  0                      8                     9
  1                      5                     4
  2                      6                     7
  3                     12                    13
  4                      9                     8
  5                      3                     5
  6                      7                     9
  7                      1                     2
  8                      2                     4

The major fields to watch are the following; lower values are better for both:

  • fs_commit_latency (ms): This metric measures the latency experienced when committing data to the on-disk journal. Lower values indicate better performance, as it means write operations are being journaled quickly.
  • fs_apply_latency (ms): This metric measures the latency experienced when applying the committed data to the actual data file. Lower values are desirable, signifying faster completion of write operations to the underlying storage.
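A monitoring script can flag OSDs whose latencies stand out. A sketch over the sample values above (the 10 ms threshold is an illustrative choice, not a Ceph default):

```python
# (commit_latency_ms, apply_latency_ms) per OSD, from the `ceph osd perf` sample above.
perf = {0: (8, 9), 1: (5, 4), 2: (6, 7), 3: (12, 13), 4: (9, 8),
        5: (3, 5), 6: (7, 9), 7: (1, 2), 8: (2, 4)}

THRESHOLD_MS = 10  # illustrative alert threshold

slow_osds = sorted(osd for osd, (commit, apply) in perf.items()
                   if commit > THRESHOLD_MS or apply > THRESHOLD_MS)
print(f"OSDs exceeding {THRESHOLD_MS} ms: {slow_osds}")
```

A persistently slow OSD often points at a failing drive; comparing an OSD’s latencies against the cluster median is a more robust check than a fixed threshold.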

Monitoring placement groups

Every placement group in a Ceph cluster has a status string assigned to it that indicates its health state. The cluster is healthy when all placement groups are in the “active+clean” state.

    
$ ceph pg dump
pg 0.0 (0.0): active+clean
pg 0.1 (0.1): active+clean
pg 0.2 (0.2): active+clean
pg 0.3 (0.3): active+clean
[..............]

A PG status of “scrubbing” or “deep scrubbing” can also occur in a healthy cluster and does not indicate a problem. Placement group scrubbing is a background process that verifies data consistency by comparing an object’s size and other metadata with its replicas on other OSDs and reporting inconsistencies.

Placement groups transition into degraded or peering states after a failure. If a placement group remains in one of these states for a long time, the MON marks the placement group as stuck. A stuck PG might be in one or more of the following states:

  • An inactive PG might be having a peering problem.
  • An unclean PG might be having problems recovering after a failure.
  • A stale PG has no OSD reporting, which might indicate that all OSDs are down and out.
  • An undersized PG has insufficient OSDs to store the configured number of replicas.
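Because a PG’s status string concatenates sub-states with “+”, a simple set check can separate routine states from ones needing attention. A minimal sketch (the healthy set below is a simplification; real deployments track more states):

```python
# Sample PG states, modeled on `ceph pg dump` output; "+" joins concurrent sub-states.
pg_states = {
    "0.0": "active+clean",
    "0.1": "active+clean+scrubbing",
    "1.2": "active+undersized+degraded",
    "2.3": "stale+peering",
}

# Sub-states considered routine; "deep" covers the active+clean+scrubbing+deep case.
HEALTHY = {"active", "clean", "scrubbing", "deep"}

def needs_attention(state: str) -> bool:
    """A PG is fine only if every sub-state is routine; scrubbing is normal background work."""
    return not set(state.split("+")) <= HEALTHY

problem_pgs = sorted(pg for pg, state in pg_states.items() if needs_attention(state))
print(f"PGs needing attention: {problem_pgs}")
```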


Conclusion

For organizations leveraging OpenStack with Ceph, ensuring robust data protection and disaster recovery is paramount. Trilio provides purpose-built, OpenStack-native data protection solutions that seamlessly integrate with your Ceph storage, simplifying backup, recovery, and migration of your OpenStack workloads.

Trilio’s approach to backing up Ceph-based Cinder volumes is highly optimized for storage, network, and resource performance. Instead of relying on traditional full-volume snapshots or temporary mounts, Trilio directly leverages Ceph’s native capabilities to significantly reduce storage consumption and bandwidth usage. It achieves this by:

  • Avoiding Temporary Volumes or Mounts: Trilio doesn’t create temporary volumes or mount Ceph snapshots to compute instances or containers during the backup process, eliminating resource overhead.
  • Leveraging Ceph Snapshots and rbd diff: For incremental backups, Trilio uses the rbd diff command between new and previous Ceph snapshots. This efficiently identifies and backs up only the changed data blocks, drastically reducing data transfer and storage requirements.
  • Optimized Resource Efficiency: The DataMover service runs as containers directly on the physical Compute hosts, bypassing the need for separate backup VMs. These containers operate at a lower privilege, ensuring no interference with your virtual workloads.

More about how Trilio enhances data protection for OpenStack and Ceph environments can be found here: https://trilio.io/resources/openstack-ceph/
