OpenStack Backup and Recovery Requirements
Editor’s Note: This article was originally published in May 2014 and has been updated for accuracy and comprehensiveness.
You might not think much about OpenStack backup requirements when it comes to the cloud. After all, you build your applications on ephemeral storage and compute solutions. These resources are not expected to persist across reboots and power cycles, and your application is built to tolerate these failures anyway. You create your workloads on the fly from a persistent store (such as an object store), perform your computation, and save the results back to the persistent store. Life is beautiful.
Unfortunately, only a small subset of applications is written from the ground up to fit the cloud paradigm, and you need an army of smart people to build your applications for the cloud. However, the cloud paradigm is here to stay. Its elasticity, scalability, and self-service aspects appeal to many IT managers who are actively looking to host their traditional IT applications on open source cloud platforms such as OpenStack. These applications need to persist across failures, and OpenStack backup and recovery is an important strategy for their business continuity.
The OpenStack Market Has Been Underserved
Just as with any service for the cloud, a backup and recovery service must enable tenants to define data protection policies for their workloads. Likewise, IT managers are looking for a scalable solution that can grow with their cloud. However, traditional solutions are built to manage tens, perhaps hundreds, of applications. These solutions are centrally administered, and the backup administrator usually has intimate knowledge of the workloads they manage. Such solutions are not a natural fit for the cloud, so there is a need for a solution built from the ground up that shares the same attributes as your cloud.
Until recently, OpenStack backup hasn’t gotten much attention. But OpenStack has been gaining popularity as the cloud of choice for IT managers who like to build their own cloud on-premises. It does support a few APIs in Nova and Cinder to back up VMs and storage to Swift, but they fall short of providing a comprehensive OpenStack data protection solution.
Consider a simple workload of two VMs, each with one attached storage volume. In order to perform a regular backup of this workload using existing OpenStack APIs, one has to perform the following steps:
- Pause VM1 and VM2
- Detach Storage Volume1 and Storage Volume2 from respective VMs
- Snapshot VM1 and VM2 and store the snapshots in Glance
- Call Cinder Backup APIs to back up Storage Volume1 and Storage Volume2 to Swift
- Keep track of these copies’ URIs in an Excel sheet
- Attach Volume1 and Volume2 back to VM1 and VM2
- Resume VM1 and VM2
- Repeat above steps, as needed
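The steps above can be sketched as a small orchestration routine. This is only an illustration of the ordering, not a working OpenStack client: every method on `RecordingClient` is a hypothetical stand-in for the corresponding Nova, Glance, or Cinder call, the returned URIs are made up, and the `catalog` list takes the place of the spreadsheet.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingClient:
    """Fake client (hypothetical): records calls instead of talking to a cloud."""

    calls: list = field(default_factory=list)

    def pause_vm(self, vm):
        self.calls.append(("pause", vm))

    def detach_volume(self, vm, volume):
        self.calls.append(("detach", vm, volume))

    def snapshot_vm(self, vm):
        self.calls.append(("snapshot", vm))
        return "glance://" + vm + "-snapshot"  # made-up URI format

    def backup_volume(self, volume):
        self.calls.append(("backup", volume))
        return "swift://" + volume + "-backup"  # made-up URI format

    def attach_volume(self, vm, volume):
        self.calls.append(("attach", vm, volume))

    def resume_vm(self, vm):
        self.calls.append(("resume", vm))


def backup_workload(client, attachments, catalog):
    """Run the manual procedure for a {vm: volume} mapping.

    The finally block reattaches volumes and resumes VMs even if a
    snapshot or backup call fails partway through.
    """
    paused, detached = [], []
    try:
        for vm in attachments:
            client.pause_vm(vm)
            paused.append(vm)
        for vm, volume in attachments.items():
            client.detach_volume(vm, volume)
            detached.append((vm, volume))
        for vm in attachments:
            uri = client.snapshot_vm(vm)
            catalog.append({"kind": "vm", "source": vm, "uri": uri})
        for volume in attachments.values():
            uri = client.backup_volume(volume)
            catalog.append({"kind": "volume", "source": volume, "uri": uri})
    finally:
        for vm, volume in detached:
            client.attach_volume(vm, volume)
        for vm in paused:
            client.resume_vm(vm)
```

Calling `backup_workload(RecordingClient(), {"VM1": "Volume1", "VM2": "Volume2"}, [])` walks through the list in order. Even in this toy form, the workload stays paused for the entire duration of the snapshots and volume backups.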
Creating OpenStack Backup Requirements
This is clearly not an effective OpenStack backup and recovery solution. Neither are legacy data protection solutions: they were built for the old world of bare metal servers or purpose-optimized virtualized environments, and they cannot keep up with the demands of OpenStack.
What you need is to be able to go back to a specific point in time and quickly and reliably restore your entire workload to the desired state, whenever you need to. A native cloud backup solution that has been built specifically for OpenStack clouds will not only enhance performance, but also reduce time spent on management activities and make data easier to back up and restore.
When evaluating solutions, you should include the following traits in your OpenStack backup requirements:
Agentless
Agent-based solutions are clunky and significantly complicate maintenance, requiring re-installation with every added or changed resource. Look for agentless solutions that do not require additional servers or resources to provide data protection.
Tenant-Driven Backup and Recovery
Just like any other service in the cloud, a backup and recovery service must present easy-to-consume policies that tenants can choose and apply to their workloads as needed. Rather than limiting these activities to backup experts (and watching the help desk tickets pile up in the process), OpenStack tenants should be able to log into Horizon and administer their own backup policies at their discretion.
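As a rough illustration of what an easy-to-consume policy could look like, the sketch below models a policy as a backup interval plus a retention count. The `BackupPolicy` class and both helper functions are hypothetical; they are not taken from any OpenStack API or backup product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical tenant-selectable policy."""

    name: str
    interval: timedelta      # how often a backup should run
    retention_count: int     # how many of the newest backups to keep


def is_backup_due(policy: BackupPolicy, last_backup: datetime, now: datetime) -> bool:
    """A backup is due once a full interval has passed since the last one."""
    return now - last_backup >= policy.interval


def backups_to_prune(policy: BackupPolicy, backup_times: List[datetime]) -> List[datetime]:
    """Return the backups that fall outside the retention count, newest first."""
    newest_first = sorted(backup_times, reverse=True)
    return newest_first[policy.retention_count:]
```

A tenant would pick a named policy such as "daily, keep 7" and attach it to a workload, and the service would handle scheduling and pruning from there.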
Self-Service
OpenStack provides all services to tenants as self-service, and administrators usually have no access to a tenant space. Look for a solution that extends the self-service functionality to include backup administration, giving users full control over the data protection of their tenant space without the need for a backup administrator in the middle. This takes the burden off the backup administrators, giving them back time that they can use to improve the data protection service as a whole.
Scalable
OpenStack is designed around a scale-out architecture. To make more resources available, you simply add a new server to the OpenStack environment rather than scaling up a single server. A backup solution should use the same scale-out architecture, adding service components to match the growth of the OpenStack environment it protects. This allows OpenStack to grow as needed, without having to consider data protection limits while planning your deployment. It also allows you to add resources as you need them, rather than anticipating them upfront.
Non-Disruptive Backups
Backup processes must not disrupt running workloads. Legacy data protection solutions often require significant downtime in order to perform updates or take snapshots of the OpenStack storage volumes. Try to avoid this where you can. The backup process must be non-intrusive for running workloads with respect to availability and performance.
Instant Restore
Cloud workloads can be huge, and the recovery of a workload from the backup must be as quick as possible. Waiting for the entire dataset to be copied from backup media to production will severely impact the recovery SLA of the service. Instead, look for a solution that provides instant restore capabilities.
Backup/Recovery of Single and Multi-VM Workloads
Clouds can host thousands upon thousands of workloads, each spanning multiple VMs and using a few dozen variations of network and storage configurations. But legacy backup solutions are still VM-centric: they’ll only capture a single VM or data volume. Look for a solution that captures both VM data/metadata and critical system information like OS, networks, applications, and more. You’ll want your backup and recovery capabilities to support this level of granularity.
Validate Backups
This is another feature that a cloud backup solution can implement using on-demand cloud resources. Backup processes must provide a means for tenants to quickly replay a workload from backup media, allowing them to periodically validate the efficacy of the backup.
Efficient Data Transfers of Backup Images
Taking incremental backups and deduplicating at the source significantly reduce the amount of data each backup has to transfer. Look for these key capabilities in your cloud backup and recovery solution.
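To make the idea concrete, here is a minimal sketch of source-side deduplication, assuming fixed-size chunks identified by their SHA-256 digest. The function names and the in-memory `store` dictionary are illustrative only; a real product would use its own chunking scheme and a remote backup target. Only chunks the store has not seen are counted as transferred, which is also what makes the second backup of a mostly unchanged volume incremental.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # tiny on purpose, for illustration; real chunks are far larger


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut the data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def incremental_backup(data: bytes, store: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Send only chunks the store lacks.

    Returns the manifest (ordered chunk digests) and the bytes transferred.
    """
    manifest = []
    sent = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:  # hash at the source, skip known chunks
            store[digest] = chunk
            sent += len(chunk)
        manifest.append(digest)
    return manifest, sent


def restore(manifest: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild the original data from its manifest."""
    return b"".join(store[digest] for digest in manifest)
```

Backing up `b"AAAABBBBCCCC"` and then `b"AAAAXXXXCCCC"` into the same store transfers all three chunks the first time and only the changed middle chunk the second time, while both versions remain restorable from their manifests. Fixed-size chunking only catches in-place changes; data shifted by an insertion would need a content-defined chunking scheme.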
Disaster Recovery
A backup service must include a disaster recovery element, too. Cloud resources are highly available and periodically replicated to multiple geographical or virtual locations. Replicating backup media to multiple locations improves the ability to restore a workload, even in the event of an outage at one geolocation. This will be critical, should an incident occur within your cloud.
To learn more about how TrilioVault fits these key OpenStack backup requirements, visit our product page.
Just a stupid question, why you need backup for HA cluster? Distributed storage is replicated, all configuration usually should be automated. There’s basically nothing to backup, cause everything is replicated.
Hi Holms,
A few people asked the same question, so I thought I would post a blog on this. https://www.triliodata.com/scale-databases-backup/
Your comments are welcome.
Regards,
Murali Balcha