BCM Basics: Modern IT/DR Strategies - MHA Consulting

Written by Richard Long | Sep 18, 2024 7:53:57 PM

This post is part of BCM Basics, a series of occasional, entry-level blogs on some of the key concepts in business continuity management.

Organizations today have more options than ever when crafting IT disaster recovery solutions to protect their critical data and applications. However, the basics of choosing an IT/DR strategy are the same as they were 30 years ago: focus on your needs first and the technology second.

Defining IT/DR Strategy

Before we get into the details, it might be worth reminding everyone that IT/DR is not about day-to-day availability. In fact, your IT/DR strategies and solutions should only rarely be used (unless they include your day-to-day resiliency and availability needs). If you are using your IT/DR solution even once a year, then you have an availability problem you need to solve. Most IT professionals will legitimately need to use their IT/DR solution perhaps once in their career.

The Three Main IT/DR Strategies

In today’s world, there are three main IT/DR strategies. They are:

- High availability (Active-Active). The environments you are protecting are always on in multiple locations. If one side fails, the other side takes over automatically or very quickly (within seconds to minutes). This is the fastest and most expensive strategy.

- Warm standby (Active-Passive). The environments are replicated and take minutes to hours to come up. This strategy is moderately fast and moderately expensive.

- Backup and restore. The environments (servers and data) are restored from a backup. This takes hours to multiple days due to the time needed to rebuild, restore, and test. This is the slowest and least expensive strategy.

(Note: There’s an exception to the statement above that the warm standby solution is faster than backup and restore. In some cases a warm standby solution at a physical data center can take roughly the same amount of time as a restore from backup in a cloud solution.)
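
For readers who think in code, the three strategies above can be captured in a small lookup table. This is only a sketch: the names `Strategy`, `Profile`, `PROFILES`, and `fastest_first` are made up for illustration, the recovery windows are copied from the descriptions above, and the ranks are rules of thumb (the note above describes a case where they do not hold).

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    HIGH_AVAILABILITY = "high availability (active-active)"
    WARM_STANDBY = "warm standby (active-passive)"
    BACKUP_AND_RESTORE = "backup and restore"


@dataclass(frozen=True)
class Profile:
    typical_recovery: str  # wording taken from the strategy descriptions
    speed_rank: int        # 1 = fastest to recover
    cost_rank: int         # 1 = least expensive


# Hypothetical summary table; ranks are relative, not measured values.
PROFILES = {
    Strategy.HIGH_AVAILABILITY: Profile("seconds to minutes", speed_rank=1, cost_rank=3),
    Strategy.WARM_STANDBY: Profile("minutes to hours", speed_rank=2, cost_rank=2),
    Strategy.BACKUP_AND_RESTORE: Profile("hours to multiple days", speed_rank=3, cost_rank=1),
}


def fastest_first():
    """Return the strategies ordered from fastest to slowest recovery."""
    return sorted(PROFILES, key=lambda s: PROFILES[s].speed_rank)
```

The table makes the tradeoff easy to see: in general, the faster the recovery, the higher the cost.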

The Three Data Location Solutions

Modern IT/DR can also be broken down into three main options in terms of where the recovery or HA solution exists:

- Alternate physical data center. The organization relies on an alternate physical data center (located away from its primary DC) for its IT/DR needs. This alternate DC might be managed by the organization itself or through colocation.

- Public cloud solution. The organization plans to recover in the cloud and depends on a public cloud solution such as Amazon Web Services, Microsoft Azure, or Google Cloud Platform.

- A hybrid solution. This solution entrusts the organization’s IT/DR needs to a combination of a physical data center and a public cloud. This is typically the best choice for most organizations in today’s world (since most utilize the cloud but have some level of on-prem solutions, file servers, and small servers).

If you use a private cloud managed in your data center, this is still considered an on-premises solution, and the DR can either be via an alternate physical location or public cloud.

Each of the three IT/DR strategies (high availability, warm standby, and backup and restore) can be implemented at any of the three location options (alternate DC, public cloud, hybrid).
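
Put another way, three strategies times three locations gives nine possible combinations. The short sketch below simply enumerates them; the names `STRATEGIES`, `LOCATIONS`, and `all_combinations` are illustrative, not part of any tool.

```python
from itertools import product

STRATEGIES = ("high availability", "warm standby", "backup and restore")
LOCATIONS = ("alternate data center", "public cloud", "hybrid")


def all_combinations():
    """Pair every strategy with every location option."""
    return list(product(STRATEGIES, LOCATIONS))


for strategy, location in all_combinations():
    print(f"{strategy} @ {location}")
```

Not every combination will make sense for every organization, but listing them all is a useful starting point before narrowing down.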

How to Choose an IT/DR Strategy

Today’s abundance of choices is a big change from 30 years ago, when there were only two options: an alternate data center or recovery center in which the system was always on and replication was continuous, or tape backups, from which everything had to be rebuilt from scratch.

One thing that hasn’t changed is the type of analysis an organization should make in choosing its strategy.

Organizations should choose their DR strategy based on their unique needs and capabilities. Specifically, they should look at certain key information about their business processes that has (ideally) already been identified through their business impact analysis. This includes the Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) of their most time-sensitive business processes. It also includes the organization’s internal skill sets and resource capabilities. The organization should choose a DR strategy that matches these needs, neither exceeding them nor falling short.

A strategy that exceeds the organization’s needs consumes resources that could be better used elsewhere. A strategy that falls short means that, if there’s an outage, the organization could be down longer than its RTO allows or lose more data than management previously deemed acceptable.
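
The idea of matching the strategy to the need can be sketched as a simple rule: from the options that meet both the RTO and the RPO, pick the least expensive one. The example below is a minimal illustration, not a sizing tool. The `Option` class, the `choose_strategy` function, and every number in `OPTIONS` are placeholder assumptions; real figures should come from your own business impact analysis and recovery tests.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Option:
    name: str
    recovery_time: timedelta  # assumed worst-case time to recover
    data_loss: timedelta      # assumed worst-case window of lost data
    cost_rank: int            # 1 = least expensive


# Placeholder figures for illustration only.
OPTIONS = [
    Option("backup and restore", timedelta(hours=72), timedelta(hours=24), 1),
    Option("warm standby", timedelta(hours=8), timedelta(hours=1), 2),
    Option("high availability", timedelta(minutes=15), timedelta(minutes=1), 3),
]


def choose_strategy(rto: timedelta, rpo: timedelta) -> Optional[Option]:
    """Return the least expensive option that meets both objectives, or None."""
    adequate = [
        o for o in OPTIONS
        if o.recovery_time <= rto and o.data_loss <= rpo
    ]
    if not adequate:
        return None  # nothing on the list meets the stated objectives
    return min(adequate, key=lambda o: o.cost_rank)
```

With these placeholder numbers, a process with a 12-hour RTO and a 4-hour RPO would land on warm standby: backup and restore falls short, and high availability exceeds the need at a higher cost. In practice, skill sets and available resources also factor into the decision, which this sketch does not model.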

Two Common Pitfalls

The richness and variety of the IT/DR options available give companies unprecedented flexibility in crafting solutions that fit their circumstances. However, they also bring pitfalls.

One of the biggest of these is that companies sometimes get excited about a particular technological solution (whether it’s from Cohesity, VMware, Veeam, SRM, Zerto, or anyone else) and decide they want to use it even if it’s not the best choice for their organization. A better way is to first look at the needs of the organization and what is currently in use, and let that guide the choice of strategy and solutions.

The other big pitfall companies fall into when choosing an IT/DR strategy and solutions is analysis paralysis. The best way to avoid this is, again, to look at the organization’s needs. Once these are clear in your mind, you can sift through possible options more confidently. Those that do not meet your key criteria should be crossed off the list. Finally, you can do worse than heed the saying, “Don’t let the perfect be the enemy of the good.”

Two Secondary DR Strategies

There are a couple of secondary DR strategies that are worth knowing about.

One is what might be called a replacement service strategy. This is pertinent where the organization relies on a Software as a Service (SaaS) solution such as Salesforce or Microsoft 365. For such tools, the organization might find it worthwhile to have an alternate provider of the same service lined up, in the event the primary service experiences an outage.

Another noteworthy secondary strategy is Disaster Recovery as a Service (DRaaS), where a vendor manages DR for you. They might do this in a colo space or in the cloud.

The Key to Creating an Effective IT/DR Strategy

Today, organizations have more choices than ever when choosing and implementing strategies and solutions to protect their critical data and applications. They can choose high availability, warm standby, or backup and restore, and they can site their recovery in an alternate physical data center, the public cloud, or a hybrid solution.

This abundance of IT/DR choices can be confusing and distracting. It also gives organizations more flexibility than ever to craft a strategy and solutions that meet their needs while also accommodating any constraints they might have in terms of resources, people, and time. The key to success in choosing and implementing an IT/DR strategy is focusing on the organization’s needs first and the technology second.