Improving Recovery Time Objectives with Savvy Backup Strategies

Planning for disaster scenarios that could compromise your company’s digital operations is critical to protecting business continuity and managing risk. RTO and RPO are benchmarks against which your company can measure its disaster readiness.

Recovery point objectives (RPO) establish the volume of data that an organization can afford to lose, usually expressed as the span of time between the last usable backup and the incident. Recovery time objectives (RTO), in turn, establish the maximum time that an organization can wait for its data recovery process to complete.

Also referred to as “maximum tolerable outage” or “maximum allowable outage,” your RTO is the maximum allowable amount of time that your systems can be down, after an incident, before normal operations must resume.

Although these two parameters are defined and agreed on by all stakeholders in an organization, savvy IT leaders seek ways to reduce their RTOs and RPOs through innovative processes and technology. Lowering these values brings about higher productivity, less downtime, reduced cost and lower risk of loss of consumer trust and credibility.

While your organization’s RPO is largely dependent on your backup frequency and completeness, there are many varied factors that impact the time it takes to get your environment back to a working state. Let’s take a look at actionable tips for lowering recovery time objectives (RTOs).

Offsite Disaster Recovery & Replication

As mentioned, organizations can immediately shrink their RPOs by increasing backup frequency. More frequent backups mean that the most recent snapshot of your critical data is never far behind the live system, which lowers RPO and leaves less lost work to recreate after a recovery.
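
The relationship between backup frequency and RPO can be sketched in a few lines of Python. This is a simplified illustration, not a sizing tool: it assumes the worst case is a failure just before the next backup runs, it ignores how long each backup takes to complete, and the four-hour target is a hypothetical value.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Worst case: the failure strikes just before the next backup would run."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True when the worst-case loss window fits inside the agreed RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


# Hypothetical target agreed with stakeholders.
rpo_target = timedelta(hours=4)

for interval in (timedelta(hours=24), timedelta(hours=6), timedelta(hours=1)):
    verdict = "meets" if meets_rpo(interval, rpo_target) else "misses"
    print(f"Backup every {interval}: {verdict} the {rpo_target} RPO")
```

Under these assumptions, only the hourly schedule satisfies a four-hour RPO; the same check applies to replication intervals on a secondary site.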

However, your backup strategy can also improve your RTOs. Keep an off-site secondary copy of live data sets that you can instantly switch to in the event of a disaster. With the copy stored on an off-site server, your RTO shrinks to the time it takes to fail over to that server. Since replication frequency determines your RPO, you can lower RPOs on this off-site server by replicating more often.

Using “Changed Block Recovery” Solutions

You can lower your RTOs by using incremental-forever backup solutions with “changed block recovery.” Incremental backups are much smaller than full backups because they only back up the blocks of data that have changed since the last backup, or the blocks of data needed to restore a workload to a given point in time. In the event of a disaster, these changed blocks can be automatically reassembled to form a synthetic full backup image. Solutions like Trilio also capture valuable metadata (including operating system, applications, configurations, and more), saving you the time and headache of piecing together workload snapshots from different sources. This reduces your total backup and restore time and, by extension, your RTOs.

When you use such solutions for a physical or virtual backup, changed block tracking continuously monitors further modifications to data blocks, so each incremental backup only has to copy the blocks that were actually modified.
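
To make the idea concrete, here is a minimal Python sketch of how changed blocks might be captured and then overlaid on a base image to produce a synthetic full. It is an illustration of the general technique only, not how Trilio or any other product implements it; the block map, function names, and sample data are all hypothetical, and the sketch does not handle deleted blocks.

```python
from typing import Dict, List

# Hypothetical model of a disk: block index -> block contents.
Blocks = Dict[int, bytes]


def changed_blocks(previous: Blocks, current: Blocks) -> Blocks:
    """Return only the blocks whose contents differ from the previous state."""
    return {i: data for i, data in current.items() if previous.get(i) != data}


def synthetic_full(base: Blocks, incrementals: List[Blocks]) -> Blocks:
    """Overlay incrementals, oldest first, on top of the base full backup."""
    image = dict(base)
    for incremental in incrementals:
        image.update(incremental)
    return image


# Three daily states of the same (tiny) workload.
day0 = {0: b"boot", 1: b"data-v1", 2: b"logs-v1"}
day1 = {0: b"boot", 1: b"data-v2", 2: b"logs-v1"}
day2 = {0: b"boot", 1: b"data-v2", 2: b"logs-v2"}

inc1 = changed_blocks(day0, day1)  # only block 1 changed
inc2 = changed_blocks(day1, day2)  # only block 2 changed

restored = synthetic_full(day0, [inc1, inc2])
assert restored == day2
print(f"Incremental sizes: {len(inc1)} and {len(inc2)} block(s); full image: {len(restored)} blocks")
```

Each incremental carries a single block, yet the reassembled image matches the latest state of the workload, which is why the restore does not require replaying a chain of full backups.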

Location is Key

You can also lower recovery times by ensuring that your recovery media is in the same location or platform as your recovery/failover servers. Although cloud backups come with a lot of benefits, keeping the only copy of your backup data in the cloud and trying to execute a recovery on your on-premises servers will result in several challenges.

Downloading all your data may take days or even weeks. You should either keep a local copy of backup data on-premises (preferably offline) or recover your applications and workloads in the cloud.
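
A back-of-the-envelope estimate shows why. The Python sketch below divides the backup size by the usable link speed; the 50 TB data set, the link speeds, and the 80% efficiency factor are hypothetical values chosen for illustration, and real transfers also depend on latency, throttling, and egress limits.

```python
def transfer_time_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to pull data_tb terabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable for the restore (protocol overhead, contention).
    """
    total_bits = data_tb * 1e12 * 8          # decimal terabytes -> bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_second / 3600


# Hypothetical scenario: 50 TB of backup data held only in the cloud.
for mbps in (100, 1_000, 10_000):
    hours = transfer_time_hours(50, mbps)
    print(f"{mbps:>6} Mbps link: about {hours:,.0f} hours ({hours / 24:,.1f} days)")
```

With these assumptions, a 1 Gbps link needs the better part of a week just to move the data, before any restore work begins.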

Synchronous Mirroring

To achieve zero or near-zero RTO, organizations should leverage synchronous mirroring. This approach works by synchronously writing I/O from primary storage media to another mirrored system. The first system waits for acknowledgment from the second system before writing the next I/O set. The secondary copy is stored in an active state that enables immediate recovery — in essence, high availability in a dual-node clustered server.
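
The write path described above can be modeled with a short Python sketch. The classes and names here are hypothetical stand-ins for real storage systems, and the in-memory dictionaries only illustrate the ordering: a write is not considered complete until the mirror acknowledges it.

```python
from typing import Dict


class StorageNode:
    """A toy block store standing in for one side of the mirror."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}
        self.online = True

    def write(self, index: int, data: bytes) -> bool:
        """Persist a block and return True as the acknowledgment."""
        if not self.online:
            return False
        self.blocks[index] = data
        return True


class SynchronousMirror:
    """Completes a write only after the secondary has acknowledged it."""

    def __init__(self, primary: StorageNode, secondary: StorageNode) -> None:
        self.primary = primary
        self.secondary = secondary

    def write(self, index: int, data: bytes) -> None:
        self.primary.write(index, data)
        # Block until the mirror confirms; no further I/O is issued before this.
        if not self.secondary.write(index, data):
            raise IOError(f"{self.secondary.name} did not acknowledge block {index}")


primary = StorageNode("primary")
secondary = StorageNode("secondary")
mirror = SynchronousMirror(primary, secondary)

for i, payload in enumerate([b"orders", b"invoices", b"audit-log"]):
    mirror.write(i, payload)

# After a primary failure, the secondary already holds every acknowledged write.
assert secondary.blocks == primary.blocks
```

The trade-off is that every write pays the round trip to the mirror, which is one reason this approach tends to be reserved for workloads that cannot tolerate downtime.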

Finding the Right Storage Backup

Your RPO and RTO metrics should influence your choice of backup infrastructure and data redundancy strategies. The lower your RPO and RTO, the more complex and expensive your backup infrastructure will be.

If your RTO is zero, it means that your business cannot afford any downtime. Leveraging a redundant IT infrastructure with off-site storage of replicated data or a high availability cluster for seamless failover might be your only choice.

In-Place Recovery

One of the best ways to improve RTO is by leveraging backup solutions (like Trilio) that can execute “recovery in place.” Also known as “boot from backup” or “instant recovery,” backup solutions with such functionality allow you to run data stores and servers directly from the backup.

That’s preferable to waiting for the failover to complete before you can access your systems and data. Although this is a relatively new functionality, it is the fastest way to achieve business continuity in the event of a disaster.

Rewards Exceed the Effort

Reducing recovery time objectives can have tangible benefits for your organization. Although extra measures must be taken in order to achieve this, the rewards can far exceed the effort. Leveraging a robust data protection strategy and best-in-class data backup solution to protect internal and external cloud environments is a must for any organization dependent on availability and resiliency.

To learn more about incremental backups and in-place recovery with Trilio, get in contact with a team member today.