Can Your Business Tolerate Data Loss in OpenStack?

Outages will occur, files do get corrupted, and accidental data loss does happen

As more enterprises move OpenStack projects from evaluation into production, they face the risk of data loss, yet many have not begun planning for backup and recovery. An ideal cloud application uses ephemeral storage and compute: it creates workloads on the fly from a persistent (object) store, performs its computation, and saves the results back to the persistent store. In that situation, you are fine without a backup and recovery solution.

But the majority of enterprise applications were not written for the cloud, so backup and recovery is an important consideration for businesses that need to recover applications from data loss and data corruption.

So should I take a piecemeal approach to data loss or invest time in finding a comprehensive solution?

Piecemeal Backup Strategy

As Sebastian Han discusses in his blog, OpenStack offers bits and pieces of APIs for implementing a backup solution, depending on which release you are using. However, these APIs alone are not sufficient to build your own backup solution. Each OpenStack deployment is unique because OpenStack offers multiple ways to implement a cloud. Users have a choice of hypervisors, storage subsystems, network vendors, projects like Ironic, and OpenStack distributions, all of which influence how a backup solution should be implemented.

The first option is block-level (Cinder) backups. In the Kilo release, Cinder gained incremental backup support, which is a significant improvement, but these backups are disruptive: the application has to be taken offline while the backup is taken.

Nova, on the other hand, offers instance snapshots, but these do not capture attached Cinder volumes. Nova snapshots are uploaded to Glance, whereas Cinder backups are uploaded to Swift, so the two live in different stores. Further, Nova snapshots are not incremental, which limits the efficiency of your backup solution. The other option is file-level backups. This approach provides file-level granularity but requires running an agent in each VM instance. Further, for an application workload running on multiple virtual machines, file-level backups become overly complex to manage.
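
To make the efficiency point about incremental backups concrete, here is a minimal sketch in plain Python. It is not how Cinder or Nova implement anything; the chunk size, the helper names (`full_backup`, `incremental_backup`, `restore`), and the in-memory "volume" are all assumptions made for illustration. It shows that a full backup stores every chunk each time, while an incremental backup stores only the chunks whose hash changed since the previous backup.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow (assumption)


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a volume image into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hashes(data: bytes) -> list[str]:
    """Return one SHA-256 digest per chunk."""
    return [hashlib.sha256(c).hexdigest() for c in split_chunks(data)]


def full_backup(data: bytes) -> dict[int, bytes]:
    """A full backup stores every chunk, keyed by chunk index."""
    return dict(enumerate(split_chunks(data)))


def incremental_backup(data: bytes, previous_hashes: list[str]) -> dict[int, bytes]:
    """Store only chunks that are new or differ from the previous backup."""
    changed = {}
    for index, chunk in enumerate(split_chunks(data)):
        digest = hashlib.sha256(chunk).hexdigest()
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            changed[index] = chunk
    return changed


def restore(full: dict[int, bytes], increments: list[dict[int, bytes]]) -> bytes:
    """Apply increments, oldest first, on top of the full backup.

    Assumes the volume never shrinks between backups.
    """
    chunks = dict(full)
    for inc in increments:
        chunks.update(inc)
    return b"".join(chunks[i] for i in sorted(chunks))


# Hypothetical 16-byte "volume" on two consecutive days; one chunk changes.
day0 = b"AAAABBBBCCCCDDDD"
day1 = b"AAAAXXXXCCCCDDDD"

base = full_backup(day0)
inc = incremental_backup(day1, chunk_hashes(day0))
restored = restore(base, [inc])
```

In this toy run the incremental backup holds one chunk out of four, and restoring the full backup plus the increment reproduces the second day's data. A snapshot mechanism without incremental support would store all four chunks again.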

Comprehensive Backup Strategy

The founders of Trilio proposed the Raksha project, which calls for non-disruptive, application-aware, tenant-driven, policy-based backups for cloud workloads consisting of one or more virtual machines, with integration into the Horizon dashboard.

Here the user has the flexibility to back up Nova instances and/or Cinder volumes, and can choose to store incremental and full backups on Swift, Ceph, or NFS.

The following are additional aspects that you should consider when researching how to prevent data loss:

  • Do I have critical workloads that are using OpenStack Ironic to provision bare metal, and do I need backups for these workloads? The best practice for a number of NoSQL databases is to run these scale-out databases on bare metal instances.
  • In most instances, customers are using OpenStack Heat and/or DevOps tools to configure and manage complex cloud workload deployments. Does the backup and recovery solution need to be tied to these orchestration tools? When additional VM instances are added to an application workload over time, the application topology changes. It is important for the backup policy to recognize this change so that the next scheduled snapshot creates a consistent backup of the new topology.
  • Would I be using backups to bring up a staging or test environment with not only configuration but also production data, to accelerate application release cycles?
  • Can I restore a bare metal snapshot to a VM and vice versa?
  • Is there a place for backups in the disaster recovery strategy, where you can use backups to restore to a remote site?
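
The point above about changing application topology can be sketched in a few lines of Python. This is an illustration only, not Raksha's or Trilio's design: the `Workload` and `BackupPolicy` classes, their fields, and the instance IDs are all hypothetical. The idea it shows is that a policy bound to a workload (rather than to a fixed list of VMs) resolves membership at snapshot time, so instances added later are picked up by the next scheduled snapshot.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workload:
    """Hypothetical workload: a named group of VM instance IDs that can change."""
    name: str
    instance_ids: set[str] = field(default_factory=set)


@dataclass
class BackupPolicy:
    """Hypothetical policy bound to a workload, not to a fixed list of VMs."""
    workload: Workload
    target: str = "swift"  # the post names Swift, Ceph, or NFS as possible targets
    history: list[frozenset[str]] = field(default_factory=list)

    def take_snapshot(self) -> frozenset[str]:
        """Resolve workload membership now and record which instances were covered."""
        members = frozenset(self.workload.instance_ids)
        self.history.append(members)
        return members

    def topology_changed(self) -> bool:
        """True if the last two snapshots covered different sets of instances."""
        return len(self.history) >= 2 and self.history[-1] != self.history[-2]


# A two-node workload grows to three nodes between scheduled snapshots.
db = Workload("scale-out-db", {"vm-1", "vm-2"})
policy = BackupPolicy(db)

first = policy.take_snapshot()
db.instance_ids.add("vm-3")  # e.g. added by Heat or another orchestration tool
second = policy.take_snapshot()
```

Here the second snapshot covers `vm-3` without anyone editing the policy, and `topology_changed()` reports the difference, which a real solution could use to trigger a fresh full backup of the new member.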

Let us know your thoughts as you plan your backup & recovery strategy. If you happen to be at the OpenStack Summit in Vancouver, we can discuss the Trilio solution in person. Please reach out to us at [email protected].