MHA Consulting Blog | Roadmap to Resiliency

Protecting Your Business: MHA's Best Practices for IT Recovery and Resiliency

Written by Richard Long | Jan 14, 2025 9:08:08 PM

In today’s world, achieving true operational resilience is impossible in the absence of effective IT incident management. In this week’s post we’ll look at some of the key best practices every organization should implement to protect their IT operations and improve their ability to respond to crises.

The Role of IT Incident Management in Ensuring Resilience

Functioning IT systems are the lifeblood of modern organizations, but these systems face a formidable array of threats, including cyberattacks, hardware failures, and human error. Disruptions to IT networks and applications commonly impose significant costs on affected organizations, harming their operations, reputation, bottom line, and legal and regulatory standing.

IT incident management is the activity of responding to and recovering from disruptions to these systems, and no organization can be operationally resilient until it becomes proficient at this task.

IT Incident Management vs. Crisis Management

In reading about how to recover from IT disruptions, you might encounter the term “IT crisis management”; however, this is not a standard term. Properly speaking, crisis management refers to the managing of emergencies at the strategic level and involves senior corporate leaders and the representatives of multiple departments.

Within the IT department, the activity of managing issues and recovering from outages is called IT incident management. Such incidents range from everyday problems, such as one person being unable to log on, to so-called major incidents (e.g., an attacker has encrypted a large quantity of data and critical systems are inoperable across the organization). IT incidents are categorized by severity and can be escalated or de-escalated as the situation evolves.

Our focus in this post is not the management of the event itself (crisis management) but the more important matters of IT recovery and resiliency.

Critical Best Practices for IT Recovery and Resiliency

Let’s take a look at some of the key best practices for IT recovery and resiliency, especially for major incidents (i.e., those that would be considered a crisis from a business perspective).

General lists of best practices for IT incident management are widely available. This list is limited to a few key practices that, in my experience and that of other MHA Consulting experts, amount to critical success factors for effective incident management and technical recovery.

Organizations that follow these practices almost always enjoy high resiliency and swift recoveries. Companies that neglect them frequently suffer from delayed recovery and outsized impacts. 

Here are MHA’s three key best practices for IT incident management:

  • Do the preliminary work needed to make recovery possible.

The most crucial step in IT incident management is ensuring that the groundwork for recovery is in place before a disaster strikes. This means making sure your technology infrastructure is robust and your recovery capabilities are well-defined. This process begins by understanding your specific business needs, as outlined in Business Impact Analyses (BIAs), and translating them into clear recovery objectives. For example, defining your Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) for each critical system allows you to prioritize resources and efforts. This proactive approach is akin to checking that your car has a spare tire before you hit the road—without it, you’re stuck. Having a solid recovery strategy in place before an incident occurs significantly increases your chances of achieving a swift and effective recovery.
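As a rough illustration of how RTOs and RPOs can drive prioritization, here is a minimal Python sketch. The system names, hour values, and the `backup_interval_hours` field are hypothetical assumptions made for this example; real figures would come from your own BIAs.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjective:
    """Recovery targets for one system (all values below are hypothetical)."""
    system: str
    rto_hours: float              # Recovery Time Objective: maximum tolerable downtime
    rpo_hours: float              # Recovery Point Objective: maximum tolerable data loss
    backup_interval_hours: float  # assumed field: how often the system is backed up


def prioritize(objectives):
    """Order systems so that the shortest RTO comes first."""
    return sorted(objectives, key=lambda o: o.rto_hours)


def rpo_gaps(objectives):
    """Return systems whose backup interval is too long to meet the stated RPO."""
    return [o.system for o in objectives if o.backup_interval_hours > o.rpo_hours]


objectives = [
    RecoveryObjective("payroll", rto_hours=24, rpo_hours=12, backup_interval_hours=24),
    RecoveryObjective("order-entry", rto_hours=4, rpo_hours=1, backup_interval_hours=0.5),
    RecoveryObjective("email", rto_hours=8, rpo_hours=4, backup_interval_hours=4),
]

priority_order = [o.system for o in prioritize(objectives)]
gaps = rpo_gaps(objectives)
print("Recover in this priority:", priority_order)
print("Backups too infrequent for RPO:", gaps)
```

In this made-up data set, the payroll system is backed up every 24 hours against a 12-hour RPO, so it is flagged as a gap to close before an incident rather than during one.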

  • Determine the order of recovery ahead of time.

A common pitfall many organizations face during IT incidents is the failure to plan the recovery sequence in advance. Without a defined order of recovery, it’s easy to restore systems in a way that disrupts dependencies, leading to delays and malfunctioning services. It’s essential to understand which systems rely on others and to prioritize restoring foundational assets first. For example, databases should be restored before web servers because many applications depend on data from the database to function correctly. By mapping out the recovery order and understanding these interdependencies beforehand, you can avoid confusion during the critical moments of an incident, ensuring faster recovery. Having a clear, prioritized recovery plan in place helps you get started immediately and keep the recovery process on track, even if adjustments are needed along the way.
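One way to work out a recovery sequence from known dependencies is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The dependency map is a hypothetical example, not a recommended architecture, and the `recovery_order` helper is a name introduced here for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each system lists what must be running before it.
depends_on = {
    "network": set(),
    "storage": {"network"},
    "database": {"storage", "network"},
    "web_server": {"database", "network"},
}


def recovery_order(dependencies):
    """Return systems in an order where every dependency precedes its dependents."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"circular dependency, no valid order: {err.args[1]}") from err


order = recovery_order(depends_on)
print(order)
```

With this map the database always comes before the web server, matching the example above. A circular dependency raises an error, which is itself useful to discover during planning rather than in the middle of an outage.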

  • In completing new IT projects, make sure the recovery strategies for the new software are fully implemented.

As organizations roll out new IT projects and software, it's crucial to integrate recovery strategies into the development lifecycle. Many businesses recognize the need for recovery planning, but fall short in execution, typically due to cost concerns. This gap between planning and implementation can have serious consequences when a disaster strikes. To ensure resilience, recovery strategies must not only be developed but also fully executed during the planning and deployment phases. By addressing this implementation gap, businesses can ensure that new applications and systems are equipped for rapid recovery, preventing gaps in the overall IT continuity plan and aligning the recovery strategy with organizational goals from day one.
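To make the implementation gap visible, some teams add a simple readiness check before a new system goes live. The following Python sketch shows the idea under stated assumptions: the checklist item names are hypothetical and are not an MHA standard, so substitute whatever your own recovery strategy requires.

```python
# Hypothetical checklist; the item names are illustrative assumptions only.
REQUIRED_RECOVERY_ITEMS = (
    "rto_rpo_defined",
    "backups_configured",
    "restore_tested",
    "recovery_order_documented",
)


def missing_recovery_items(project_status):
    """List required recovery items that are absent or not marked complete."""
    return [
        item for item in REQUIRED_RECOVERY_ITEMS
        if not project_status.get(item, False)
    ]


# Example project: the strategy was planned, but parts were never executed.
new_app = {
    "rto_rpo_defined": True,
    "backups_configured": True,
    "restore_tested": False,
}

missing = missing_recovery_items(new_app)
ready_for_go_live = not missing
print("Missing items:", missing)
print("Ready for go-live:", ready_for_go_live)
```

Treating an unlisted item as incomplete is a deliberate choice in this sketch: a recovery step that nobody recorded is assumed not to have been done.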

By prioritizing these best practices, organizations can ensure IT resilience, minimize recovery delays, and protect operations from disruptions, safeguarding their continuity and success.

MHA Consulting: Your Trusted Partner in IT Incident Management

At MHA Consulting, we bring unparalleled expertise to help organizations enhance their IT incident management capabilities. With decades of experience working across industries and company sizes—from Fortune 100 corporations to small businesses—we have the knowledge and skills to guide you toward greater resilience and faster recoveries.

Our team specializes in conducting Current State Assessments and IT Disaster Recovery (IT/DR) Current State Assessments, identifying gaps and opportunities for improvement in your incident management and recovery strategies.

We also provide hands-on support in implementing the three critical best practices highlighted in this post:

  • Establishing a solid recovery foundation to ensure readiness for any incident.
  • Devising a clear order of recovery to minimize delays and dependency issues.
  • Developing and implementing recovery strategies for new IT projects to close potential gaps.

As a trusted industry leader, we are committed to helping your organization achieve IT resilience and operational continuity. For more information, contact MHA Consulting today to learn how our expertise can enhance your IT incident management, strengthen your recovery capabilities, and safeguard your critical operations against disruptions.

 

 
