RTO and RPO are two of the most important concepts in business continuity and IT disaster recovery. Today’s post will explain what they are, why they matter, and how to use them, illustrating their use with straightforward examples.
The concepts of recovery time objective (RTO) and recovery point objective (RPO) are critical in developing a solid business continuity management (BCM) and IT disaster recovery (IT/DR) program. Let’s define them:
Recovery time objective (RTO)
Relates to business processes and their supporting applications. The maximum length of time that a business process and its associated applications can be unavailable following a disruption before the impact on the organization becomes unacceptable.
Example: If a company’s RTO for its online customer sales process is four hours, it means that the organization must recover and resume its online storefront within four hours of a disruption.
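The arithmetic behind an RTO is simple: the recovery deadline is the moment of disruption plus the RTO. The following is a minimal Python sketch, assuming the time of disruption is known. The helper names `recovery_deadline` and `met_rto` and the dates are hypothetical.

```python
from datetime import datetime, timedelta


def recovery_deadline(disrupted_at: datetime, rto: timedelta) -> datetime:
    """Latest moment by which the process must be running again (hypothetical helper)."""
    return disrupted_at + rto


def met_rto(disrupted_at: datetime, restored_at: datetime, rto: timedelta) -> bool:
    """True if the process was restored no later than the RTO allows."""
    return restored_at <= recovery_deadline(disrupted_at, rto)


# The online storefront example: a four-hour RTO, with an illustrative outage at 09:00.
outage = datetime(2024, 3, 1, 9, 0)
rto = timedelta(hours=4)
print(recovery_deadline(outage, rto))                      # 2024-03-01 13:00:00
print(met_rto(outage, datetime(2024, 3, 1, 12, 30), rto))  # True
print(met_rto(outage, datetime(2024, 3, 1, 14, 0), rto))   # False
```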
Recovery point objective (RPO)
Relates to technical processes. The RPO for a given process is the amount of its data, measured in time, that can be lost and then recreated manually once the process has been restored after an outage.
Example: If a company has an RPO of one hour for its customer database, it means that after a disruption, the organization can only afford to lose up to one hour’s worth of data, and the recovery process must restore the data to a state that is no more than one hour old from the time of the disruption.
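One way to check a recovery against an RPO is to measure the gap between the moment of disruption and the restore point (the last good copy of the data). Everything written in that gap is lost. The following is a minimal Python sketch. The helper names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(disrupted_at: datetime, restore_point: datetime) -> timedelta:
    """Span of data lost: everything written after the restore point."""
    return disrupted_at - restore_point


def meets_rpo(disrupted_at: datetime, restore_point: datetime, rpo: timedelta) -> bool:
    """True if restoring to this point loses no more data than the RPO allows."""
    return data_loss_window(disrupted_at, restore_point) <= rpo


# Customer database with a one-hour RPO, disrupted at 14:20 (illustrative times).
outage = datetime(2024, 3, 1, 14, 20)
rpo = timedelta(hours=1)
print(meets_rpo(outage, datetime(2024, 3, 1, 14, 0), rpo))  # True: 20 minutes lost
print(meets_rpo(outage, datetime(2024, 3, 1, 13, 0), rpo))  # False: 80 minutes lost
```

As a rough rule, if backups run every N hours, the worst case is losing nearly N hours of data. A one-hour RPO therefore implies backups or replication at least hourly. This simplification ignores how long each backup takes to complete.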
Both RTO and RPO are essential components of an organization’s business continuity and disaster recovery planning. They help determine the necessary strategies, resources, and technologies required to ensure the continuity of critical business functions and minimize the impact of disruptions.
The organization determines the RTOs of its key business processes and their supporting applications through an analysis of its needs. It determines the RPOs of its key technical processes through an analysis of its capabilities. (See below for a more in-depth discussion of how RTOs and RPOs are determined.)
Any gap between the RTO and the RPOs relating to an essential business process must be addressed by business continuity plans and strategies.
A process can have a short RTO and a long RPO, or vice versa. Alternatively, both the RTO and RPO can be short or long.
The following examples illustrate these possibilities:
Accounting: Long RTO, short RPO. In most organizations, general ledger (GL) accounting is a business process with a fairly long RTO, typically several days. This is because, if the accounting process is disrupted, it is usually a matter of quite a few days before the outage has a serious impact. However, the RPO of the technical and data side of the accounting function is very short. It might be four hours, but it could be as short as zero. This is because it’s virtually impossible to recreate accounting data after the fact.
Public-facing website: Short RTO, long RPO. This is your typical Company.com website providing basic information to the public. These sites typically have a short RTO because if the site goes dark it can immediately attract negative attention and undermine the company’s reputation. However, the RPO for the site is generally fairly long (e.g., 24 hours or more) because the information on such sites tends to be relatively static and any updates that are lost can be recreated fairly easily.
Storefront website: Short RTO, short RPO. This is the company site that takes orders, tracks stock, and so on. This function has a short RTO because when such a site goes down, a meaningful impact on the company’s revenues and reputation can begin almost immediately. The function has a short RPO because the information in the system changes quickly and there’s no way to recreate it if lost.
Policy and standards oversight: Long RTO, long RPO. This process is important over the long term, but an outage of a few days is unlikely to have a serious impact on the organization. Hence the long RTO. And while policies and standards do change from time to time, the rate of updating is generally slow, and losses of up to 24 hours of data could most likely be recreated with little difficulty. This means the technical and data processes pertaining to this area will have a long RPO.
The RTO for a given business process and its supporting applications is arrived at through an analysis of the company’s overall operations and prioritization by staff. The question to ask in determining an RTO is, how long can the process be down before the impact on the company becomes unacceptable?
The RPO for a given application is determined by identifying how much data from the application the staff could manually recreate. As mentioned previously, this is measured in terms of time (e.g., up to two hours’ worth, up to eight hours’ worth, and so on).
Manually recovering the data means recreating it by various methods such as reproducing it from memory, locating it in other applications or in hard copy, or contacting customers and asking them to resubmit their orders.
Knowing the RTOs and RPOs for the processes and technologies used across your organization helps you understand how to protect both your business processes and your technology. Knowing these metrics also helps ensure that your strategies, implementation, and plans are neither overly aggressive (wasting resources) nor inadequate (providing insufficient protection).
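To make "overly aggressive or inadequate" concrete, you can compare what current IT strategies can actually deliver against the objectives. The following is a minimal Python sketch, assuming you know the current recovery time and worst-case data loss for a process. The class names, the `assess` helper, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Objectives:
    rto: timedelta  # maximum tolerable downtime, set by the business
    rpo: timedelta  # maximum tolerable data loss, measured in time


@dataclass
class Capability:
    recovery_time: timedelta  # how long the current IT strategy takes to restore service
    data_loss: timedelta      # worst-case data loss of the current backup/replication setup


def verdict(actual: timedelta, target: timedelta) -> str:
    """Describe whether an achieved value fits within its objective."""
    if actual > target:
        return f"inadequate: exceeds objective by {actual - target}"
    return f"meets objective with {target - actual} to spare"


def assess(obj: Objectives, cap: Capability) -> dict[str, str]:
    """Compare current capability against the objectives, one verdict per metric."""
    return {
        "RTO": verdict(cap.recovery_time, obj.rto),
        "RPO": verdict(cap.data_loss, obj.rpo),
    }


# Hypothetical storefront: 4-hour RTO and 1-hour RPO, but IT currently needs
# a full day to rebuild the site while replication loses at most 15 minutes.
result = assess(
    Objectives(rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    Capability(recovery_time=timedelta(hours=24), data_loss=timedelta(minutes=15)),
)
print(result["RTO"])  # inadequate: exceeds objective by 20:00:00
print(result["RPO"])  # meets objective with 0:45:00 to spare
```

A very large margin may signal that a strategy is more aggressive than the objective requires. Deciding where that line falls is a judgment call, so the sketch only reports the margin.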
Every organization must devise its own scale of RTO and RPO categories. It is best to limit the number of categories to around five or six. More can be a maintenance nightmare.
The following is a scale of RTOs that we have seen work well for many organizations:
RTO 0 | Immediate/high availability |
RTO 1 | < 8 hours |
RTO 2 | < 24 hours |
RTO 3 | < 72 hours |
RTO 4 | < 5 days |
RTO 5 | > 5 days |
And here is a scale of RPOs that many organizations have used successfully:
RPO 0 | Zero data loss |
RPO 1 | < 4 hours of data loss |
RPO 2 | < 12 hours |
RPO 3 | < 24 hours |
RPO 4 | > 24 hours |
Once a company devises its categories, each of its key business processes is analyzed and placed into an RTO category and an RPO category. These designations guide the subsequent development of the company’s recovery plans and strategies.
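Placing a process into a category amounts to a lookup against the scale's bounds. The following is a minimal Python sketch using the example RTO and RPO scales above. The tables do not say where a value of exactly 5 days (RTO) or exactly 24 hours (RPO) falls, so the sketch assumes it lands in the last category. The per-process values are hypothetical.

```python
from datetime import timedelta
from typing import List, Tuple

# Exclusive upper bounds from the example scales; anything beyond the last
# bound falls into the final, open-ended category.
RTO_SCALE: List[Tuple[timedelta, str]] = [
    (timedelta(hours=8), "RTO 1"),
    (timedelta(hours=24), "RTO 2"),
    (timedelta(hours=72), "RTO 3"),
    (timedelta(days=5), "RTO 4"),
]
RPO_SCALE: List[Tuple[timedelta, str]] = [
    (timedelta(hours=4), "RPO 1"),
    (timedelta(hours=12), "RPO 2"),
    (timedelta(hours=24), "RPO 3"),
]


def categorize(value: timedelta, zero_label: str,
               scale: List[Tuple[timedelta, str]], fallback: str) -> str:
    """Return the first category whose upper bound exceeds the value."""
    if value == timedelta(0):
        return zero_label  # immediate recovery / zero data loss
    for limit, label in scale:
        if value < limit:
            return label
    return fallback  # beyond the last bound (boundary value assumed to land here)


def rto_category(rto: timedelta) -> str:
    return categorize(rto, "RTO 0", RTO_SCALE, "RTO 5")


def rpo_category(rpo: timedelta) -> str:
    return categorize(rpo, "RPO 0", RPO_SCALE, "RPO 4")


# Hypothetical (RTO, RPO) values for three of the example processes.
processes = {
    "Storefront website": (timedelta(hours=2), timedelta(minutes=15)),
    "General ledger accounting": (timedelta(days=4), timedelta(0)),
    "Policy and standards oversight": (timedelta(days=7), timedelta(hours=36)),
}
for name, (rto, rpo) in processes.items():
    print(f"{name}: {rto_category(rto)}, {rpo_category(rpo)}")
# Storefront website: RTO 1, RPO 1
# General ledger accounting: RTO 4, RPO 0
# Policy and standards oversight: RTO 5, RPO 4
```

Because the bounds are strict ("< 4 hours"), an RPO of exactly four hours lands in RPO 2 rather than RPO 1. Boundary cases like this are worth settling explicitly when an organization defines its own scale.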
How does a company go about determining the RTO and RPO categories for its processes and applications?
The BCM office should develop proposed categories for RTOs and RPOs based on the organization’s known risks and needs. In doing this, the IT team can be a good place to start. The BCM team should make note of the times IT uses for its current protection and recovery strategies. Using those values, the BCM office can make adjustments based on discussions with management to understand the general times departments would need to be recovered.
After the categories are defined, the organization should perform a business impact analysis (BIA). Making the best choices depends on factoring in information and insights commonly held across many different levels within the organization.
The final decisions regarding RTOs and RPOs should emerge after the BIA. Once defined, those proposals should be submitted to upper management for review.
Throughout this process, the BCM office has the job of educating others, facilitating the discussion, seeking consensus, and obtaining the necessary approvals.
Every organization should review its RTOs and RPOs on a regular basis. This is because organizations and the environment change. A company that has outgrown its recovery plan has no recovery plan. It is critical that RTOs and RPOs be kept up to date.
RTOs indicate how soon after a disruption a given business process and its supporting applications must be restored to prevent an unacceptable impact to the organization. RPOs are a metric of how much data from a given technical process, as measured in time, can be manually recovered in the event of an outage.
RTOs and RPOs for key processes and technologies are typically determined through a collaborative process led by the BCM team and calling on the judgment and expertise of people from across the organization. Once determined, the two types of objectives become cornerstones of the organization’s business continuity and IT/DR program.