How Scripting OpenStack Backups Cripples Your Recovery Efforts

Organizations in the throes of an OpenStack deployment often come to the realization that there are no viable data protection options for OpenStack. As production dates approach and SLA requirements loom, many teams resort to scripting OpenStack backups as a stopgap solution. This approach is deeply flawed and makes workload recovery nearly impossible. Here’s why.

An (Extremely Brief) History of OpenStack & Data Protection

Nearly a decade ago, when OpenStack was first released, the prevailing thought was that the platform would be used for largely stateless workloads. No need to back those up, right?

But as the technology grew — and organizations adopted it as the backbone for their private and public clouds — the use cases expanded, too. OpenStack clouds quickly grew to house a mix of new and legacy applications with both stateless and stateful workloads.

The community rushed to address this. Murali Balcha and some colleagues created the Raksha project, which focused on providing non-disruptive and scalable data protection for OpenStack. It was the start of something promising, but it needed to grow and adapt to accommodate enterprise cloud requirements. (Trilio grew from this project to form the only native data protection solution for the OpenStack platform.)

Other projects like Freezer and Karbor popped up with similar attempts to address the issue, but each fails to meet basic enterprise requirements for protection. Freezer uses point-in-time snapshots but can only back up file systems, and Karbor is complex, offers no support model, and requires command-line work since it lacks a self-service GUI. Neither solution provides the robust recovery and migration capabilities that most modern businesses demand from their OpenStack cloud.

The Nature of OpenStack Workloads

OpenStack workloads are groupings of VMs organized into tenant environments. These workloads leverage distributed resources across OpenStack but share policies and rules to facilitate easier management. Capturing the full workload therefore means capturing the application data and its metadata together.

Unfortunately, existing data protection solutions capture only the data, which significantly lengthens recovery time (and puts the RTO at risk) in the event of data loss, since IT teams have to manually piece together VMs, applications, and tenant environments.

Why Scripting OpenStack Backups Only Makes Recovery Harder

To address the workload capture issue, some teams have implemented scripting, cron jobs, and array snapshots to capture all the data needed to piece together a workload. For OpenStack, that means individual scripts must regularly back up each of these components:

  • Neutron – networking information
  • Nova – VM information
  • Cinder – volume configurations
  • Glance – images

Even more troublesome, these scripts must run against EACH VM, and still require full backups semi-regularly in order to maintain the integrity of the data.

And scripting OpenStack backups is no easy feat: consider a small workload with 2 VMs. Here is an example of the steps one would need to take to accomplish a backup of a full workload using existing OpenStack APIs:

  • Pause VM 1 and VM 2
  • Detach Storage Volume 1 and Storage Volume 2 from respective VMs
  • Snapshot VM 1 and VM 2 and store on Glance
  • Call Cinder backup APIs to back up Storage Volume 1 and Storage Volume 2 to Swift
  • Keep track of these copies in an Excel sheet
  • Attach Storage Volume 1 and Volume 2 back to VM 1 and VM 2
  • Resume VM 1 and VM 2
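
The steps above can be sketched as an ordered plan. This is a minimal illustration only: it does not call any OpenStack API, and the `VM` class, the `plan_workload_backup` function, and the action names are hypothetical stand-ins for the real calls a script would have to make, check, and retry.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """A VM and the names of the storage volumes attached to it."""
    name: str
    volumes: list = field(default_factory=list)


def plan_workload_backup(vms):
    """Return the ordered (action, target) steps for one workload backup.

    Nothing here talks to OpenStack; each tuple stands in for an API call
    (hypothetical action names) that a real script would have to perform.
    """
    attached = [(vm.name, vol) for vm in vms for vol in vm.volumes]
    steps = []
    steps += [("pause_vm", vm.name) for vm in vms]
    steps += [("detach_volume", f"{vol} from {name}") for name, vol in attached]
    steps += [("snapshot_vm_to_glance", vm.name) for vm in vms]
    steps += [("backup_volume_to_swift", vol) for _, vol in attached]
    # One tracking entry per copy: each VM snapshot and each volume backup.
    steps += [("record_copy", vm.name) for vm in vms]
    steps += [("record_copy", vol) for _, vol in attached]
    steps += [("attach_volume", f"{vol} to {name}") for name, vol in attached]
    steps += [("resume_vm", vm.name) for vm in vms]
    return steps


workload = [VM("VM 1", ["Storage Volume 1"]), VM("VM 2", ["Storage Volume 2"])]
plan = plan_workload_backup(workload)
for number, (action, target) in enumerate(plan, start=1):
    print(f"{number:2d}. {action}: {target}")
```

Even for this two-VM workload the plan contains 16 discrete operations, and every one of them is a point where a script can fail partway and leave VMs paused or volumes detached.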

These manual processes consume significant time and resources.

While manual efforts may successfully back up the data, it will be nearly impossible to recover it when needed: someone would need to Frankenstein the workload back together using stored data about the OpenStack distribution, VM flavors, networks, security settings, storage volumes, and more. File-level recovery isn’t even an option.
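
To make the recovery gap concrete, here is a minimal sketch, assuming each backup is tracked as a simple dictionary. The field names and the `missing_for_recovery` helper are hypothetical; they only mirror the items listed above and are not part of any OpenStack or Trilio API.

```python
# Hypothetical field names that mirror the metadata listed above.
REQUIRED_FOR_RECOVERY = (
    "openstack_distribution",
    "vm_flavors",
    "networks",
    "security_settings",
    "storage_volumes",
)


def missing_for_recovery(manifest):
    """Return the required fields that are absent or empty in a manifest."""
    return [key for key in REQUIRED_FOR_RECOVERY if not manifest.get(key)]


# A data-only backup record: the volumes were copied, nothing else was noted.
data_only = {"storage_volumes": ["Storage Volume 1", "Storage Volume 2"]}
print(missing_for_recovery(data_only))
```

Every field the check reports is something a person would have to reconstruct by hand before the workload could run again.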

The unsurprising result of this mess is often a set of complex management processes that lead to missed SLAs and put compliance at risk.

TrilioVault Eliminates the Need for Scripting OpenStack Backups

When disaster strikes, time is of the essence — so data recovery must be simple and fast. TrilioVault makes it easy for administrators and tenants to restore a point-in-time backup to its original location or to a new one.

With Trilio, an administrator can test and restore an application during an outage rather than pulling in a team of administrators to orchestrate the recovery process. Each tenant has the flexibility to restore their entire tenant environment, or only the VMs and workloads they need.

Only TrilioVault captures complete cloud workloads using the native Cinder snapshot API:

  • Applications
  • Operating system
  • Network configuration
  • Networks & subnets
  • Storage volumes
  • Security groups & users
  • VMs (single & multiple)
  • Metadata & data

TrilioVault’s agentless data protection software captures changed blocks in an incremental-forever fashion, is never disruptive, and empowers your users with self-service control.
