RTO and RPO

It can be a struggle to determine and quantify an acceptable level of risk for your organization, and disaster recovery is no exception. RTO, which stands for Recovery Time Objective, and RPO, which stands for Recovery Point Objective, help reduce the ambiguity at an operational level, giving you a pragmatic framework you can use to plan and execute.

However, it’s common to focus on the “hows” of recovery: the technologies you’ll use, their features, and the extent to which they can recover entire systems in one click. As a result, you spend less time discussing the organizational ramifications—how much data to retain and how quickly it can be restored. By focusing on RTO and RPO, you can ensure that these important aspects don’t go overlooked.

But what is the meaning of RTO and RPO anyway? What’s the difference between them and how can they help you recover fast when you need it most? Let’s take a look.

How Do You Define Recovery Time and Recovery Point Objective?

RTO and RPO can help you define your business requirements, which will differ between applications, and measure how well your data protection solutions can satisfy them. In addition to RTO and RPO, there are some other helpful terms to know. We’ll break each one down below.

RPO Definition

RPO describes how much data your business can afford to lose, measured in time (e.g., one hour's worth of data). If a production system is impaired by data loss or data corruption, you can recover by reverting to a backup. RPO defines how far back you are willing to go, accepting the loss of all data written after the latest recovery point. That makes it an important part of your disaster recovery plan.

This, in turn, determines the granularity, or frequency, of your point-in-time copies. If your tolerance for data loss is low, you will need to back up more frequently, and you will often need to dedicate more storage to house these backups.

RPO comes directly from business requirements. As different applications within your organization carry different business value, RPO is fundamentally an application-specific attribute.

In determining RPO, you should consider the risk of faulty backups. A single faulty point in time doubles the gap between the two adjacent usable points in time, and with it the worst-case data loss. By regularly testing your backups, you can ensure that they are recoverable when needed. Even misconfigurations and lapsed licensing can wreak havoc on efforts to return to full production.
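To make the effect of a faulty backup concrete, here is a minimal sketch of the arithmetic. The function names and the hourly schedule are illustrative assumptions, not part of any particular backup product.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         consecutive_faulty: int = 0) -> timedelta:
    """Longest stretch of data you could lose if a failure strikes
    just before the next backup would have run.

    Each consecutive unusable backup adds one more interval to the gap
    between usable recovery points.
    """
    if consecutive_faulty < 0:
        raise ValueError("consecutive_faulty must be non-negative")
    return backup_interval * (consecutive_faulty + 1)


def meets_rpo(backup_interval: timedelta, rpo: timedelta,
              consecutive_faulty: int = 0) -> bool:
    """True if the worst-case loss stays within the RPO."""
    return worst_case_data_loss(backup_interval, consecutive_faulty) <= rpo


hourly = timedelta(hours=1)          # assumed backup schedule
rpo = timedelta(hours=1)             # assumed objective

print(worst_case_data_loss(hourly))                  # 1:00:00
print(worst_case_data_loss(hourly, 1))               # 2:00:00
print(meets_rpo(hourly, rpo, consecutive_faulty=1))  # False
```

If you want the RPO to hold even when one backup turns out to be unusable, the schedule in this sketch would need to run at half the RPO or less.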

RTO Definition

So what is the meaning of RTO? RTO is about how much time your organization can afford to lose after a disaster strikes until you’re back in business. Generally, this covers the entire time it takes to become operational again. Depending on your disaster and protection scenario, there are multiple factors to consider, many of which are often overlooked.

  • Disaster declaration: Who is authorized to declare a disaster and commence recovery? What are the measures they must take before the red button is pushed?
  • System setup: If the production site is damaged, how much time does it take to set up an operational system at a secondary site?
  • Recovery execution: How long will it take to get the right people to execute recovery?
  • Backup access: How much time would it take to gain access to the backup data? Is it online or does it require physical travel? If it is stored on a remote site, how do you connect if your primary site is down?
  • Transfer: How long does it take to transfer the data? If the data is stored at the same site, transferring a 100 GB dataset over a modern 10GbE network takes about 1.5 minutes, and nearly 15 minutes over a 1GbE network.
  • System restart: Take into account the time it takes to restart servers, launch applications, and load the data into production.
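The transfer estimates in the list above follow from simple arithmetic, sketched below. The `efficiency` parameter is an assumption added here to account for protocol and storage overhead; 90% is a placeholder, not a measured value, so substitute the throughput you actually observe.

```python
def transfer_seconds(size_gb: float, link_gbps: float,
                     efficiency: float = 1.0) -> float:
    """Estimate the time to move `size_gb` gigabytes over a link rated
    at `link_gbps` gigabits per second.

    `efficiency` is the fraction of the rated speed you actually get
    (an assumed value, to be replaced with your own measurements).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    size_gigabits = size_gb * 8          # 1 byte = 8 bits
    return size_gigabits / (link_gbps * efficiency)


for link in (10, 1):
    ideal = transfer_seconds(100, link)
    derated = transfer_seconds(100, link, efficiency=0.9)
    print(f"{link}GbE: {ideal / 60:.1f} min at line rate, "
          f"{derated / 60:.1f} min at 90% efficiency")
# 10GbE: 1.3 min at line rate, 1.5 min at 90% efficiency
# 1GbE: 13.3 min at line rate, 14.8 min at 90% efficiency
```

Scaling the dataset size shows quickly whether a given link can fit inside your RTO at all: a 10 TB restore over the same 1GbE link would take roughly a day under these assumptions.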

It’s important to analyze how each of these activities affects your recovery process so you can establish a realistic RTO. If you don’t plan these processes in advance, you’ll end up organizing and defining an action plan in real time. And that means your actual recovery time won’t meet the designated objective.

What is the Difference Between RPO and RTO?

Before we dive any deeper, it’s important to spell out the differences between these two terms. RPO focuses on data loss, while RTO focuses on application downtime and how long it takes to become fully operational after an outage. While they’re related and measured similarly, they focus on two important, but different, aspects of disaster recovery.

Now, let’s take a look at some other terms that come into play for both RPO and RTO.

Technical RTO (TRTO)

You can zoom in to correctly identify your technical RTO. This refers to the amount of time consumed within the boundaries of your data protection solution. Steps in this phase may include:

  • Spinning up a new set of VMs hosting the application.
  • Configuring the VMs correctly and establishing communication.
  • Transferring the data from the backup medium to the production storage system.
  • Launching the applications and loading the recovered data.

Relaxing TRTO only makes sense if it translates into cost savings, for example, by auto-tiering backup data from SSDs to spinning disks.

When referring to RTO, keep in mind the difference between the overall recovery process and the technical recovery phase. Have a crisp definition of your TRTO and make it clear which RTO you are referring to.
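One way to keep the two figures apart is to list every recovery phase and flag the ones that fall inside the data protection solution. The sketch below does that; the phase names echo the steps described earlier, but every duration is a made-up placeholder you would replace with numbers from your own recovery drills.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Phase:
    name: str
    duration: timedelta
    technical: bool   # True if it happens inside the data protection solution


# Placeholder durations, for illustration only.
phases = [
    Phase("declare disaster", timedelta(minutes=30), technical=False),
    Phase("assemble recovery team", timedelta(minutes=45), technical=False),
    Phase("spin up VMs", timedelta(minutes=10), technical=True),
    Phase("configure VMs and networking", timedelta(minutes=15), technical=True),
    Phase("transfer data from backup", timedelta(minutes=20), technical=True),
    Phase("launch applications", timedelta(minutes=10), technical=True),
]


def total_duration(items, technical_only=False):
    """Sum phase durations, optionally counting only technical phases."""
    return sum(
        (p.duration for p in items if p.technical or not technical_only),
        timedelta(),
    )


trto = total_duration(phases, technical_only=True)
overall_rto = total_duration(phases)

print(f"Technical RTO: {trto}")         # Technical RTO: 0:55:00
print(f"Overall RTO:   {overall_rto}")  # Overall RTO:   2:10:00
```

In this made-up example, more than half of the overall recovery time is spent before any technical step begins, which is exactly the part that tends to be overlooked.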

Retention Period

The retention period is the length of time your business requires data copies to be kept before they may (or must) be discarded. Like RPO and RTO, retention periods are application-specific.

In addition, the business value of data often decreases with time: the older the data gets, the less valuable it becomes. Recovery requirements for older data may therefore be relaxed accordingly. These business requirements can be captured using time tiers, as defined below.

Service-Level Agreement (SLA) Tiers

Given the changing requirements over time, retention periods, TRTO, RTO, and RPO are specified using time tiers. These may be referred to as SLA tiers (SLA is, unfortunately, an overused term). For example, an organization may require the following tiers for their MySQL application:

Timeframe | TRTO | RPO | Impact
<24 hours | 5 minutes | 1 hour | For the most recent 24 hours, your business application will be operational within minutes, using hourly point-in-time backups.
1-30 days | 1 hour | 1 hour | For the rest of the month, you want to be able to complete technical recovery within an hour, to the nearest hour.
30+ days | 24 hours | 24 hours | For anything older than a month, you will be able to restore a point in time from a particular day from your archiving system within 24 hours.
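If you want to reason about these tiers programmatically, the table can be captured as data. This is a minimal sketch: the `SlaTier` class and `tier_for_age` function are hypothetical names, and treating each upper bound as exclusive (so a copy exactly 30 days old falls into the last tier) is an assumption, since the table does not say which tier owns the boundary.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SlaTier:
    max_age: Optional[timedelta]  # exclusive upper bound; None means open-ended
    trto: timedelta
    rpo: timedelta


# The MySQL example tiers, ordered from newest to oldest data.
MYSQL_TIERS = [
    SlaTier(timedelta(hours=24), trto=timedelta(minutes=5), rpo=timedelta(hours=1)),
    SlaTier(timedelta(days=30), trto=timedelta(hours=1), rpo=timedelta(hours=1)),
    SlaTier(None, trto=timedelta(hours=24), rpo=timedelta(hours=24)),
]


def tier_for_age(age: timedelta, tiers=MYSQL_TIERS) -> SlaTier:
    """Return the first tier whose age range covers `age`."""
    if age < timedelta(0):
        raise ValueError("age must be non-negative")
    for tier in tiers:
        if tier.max_age is None or age < tier.max_age:
            return tier
    raise ValueError("no tier covers this age")


tier = tier_for_age(timedelta(days=10))
print(tier.trto, tier.rpo)   # 1:00:00 1:00:00
```

Keeping the tiers in one ordered list makes it easy to review them with application owners and to adjust a single boundary without touching the lookup logic.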

Consider specific use cases to help you define these requirements. The first tier in the above example addresses a common data damage scenario, such as when a VM is accidentally deleted, a file or folder has been overwritten, or a database has been corrupted. Unlike in a disaster scenario, all systems remain operational.

Your business can sustain some data loss but requires minimal downtime. For this scenario to be practical, there must be no operational overhead. This means that end users (tenants in a cloud environment) must be able to execute the recovery on their own, without administrative assistance.

Less demanding SLAs allow for cost reduction through the use of slower, lower-cost storage media or facilities. Your organization must become comfortable with the trade-offs of losing more data or having longer downtime after a disaster strikes.

How Do RPO and RTO Aid in Data Recovery?

RPO and RTO are important components of your disaster recovery plan because they provide your organization with benchmarks around acceptable amounts of data loss and downtime.

In a perfect world, the second your application goes down, you’d be able to immediately recover all of your data and be operational again. However, that’s not reality. So you need to decide what those acceptable levels are without impacting your business objectives. They will differ based on your applications, their importance to your business, and also on your own company resources.

That’s what RPO and RTO can help you do. RPO will determine the amount of acceptable data loss, while RTO determines the amount of acceptable downtime. Both goals, measured in terms of time, can help your organization determine backup frequency, type, and tooling.
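Because both objectives are measured in time, you can score a recovery drill (or a real incident) against them with a few timestamps. The following is a rough sketch; the function name and the sample timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def evaluate_recovery(last_good_backup: datetime, outage_start: datetime,
                      service_restored: datetime,
                      rpo: timedelta, rto: timedelta) -> dict:
    """Compare achieved data loss and downtime against the objectives."""
    data_loss = outage_start - last_good_backup   # data written since last usable copy
    downtime = service_restored - outage_start    # time until back in business
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Invented drill timestamps.
result = evaluate_recovery(
    last_good_backup=datetime(2024, 1, 15, 9, 0),
    outage_start=datetime(2024, 1, 15, 9, 40),
    service_restored=datetime(2024, 1, 15, 11, 10),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=1),
)
print(result["data_loss"], result["rpo_met"])   # 0:40:00 True
print(result["downtime"], result["rto_met"])    # 1:30:00 False
```

In this example the backup schedule did its job, but the recovery itself ran 30 minutes over the objective, which points to the recovery process rather than the backup frequency as the thing to fix.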

The Impact of RTO and RPO

One of the (many) reasons Trilio supports tenant-driven recovery workflows is that they allow organizations to trim their RTO while still mandating SLA-driven RPOs at a management level. This balance allows administrators to define a protection schedule and policy for each workload, while giving tenants control to manage and restore point-in-time backups without requiring intervention.

Defining RTO and RPO helps to strike a balance between disaster preparation and cost efficiency, while promising critical data availability that’s needed to run your business. Data loss may occur even when the infrastructure is uninterrupted, and preparedness is yet another tool at our disposal to limit and mitigate the potential negative impacts of unexpected data loss.