Fifteen years ago the prototypical IT disaster recovery (IT/DR) exercise was preceded by months of meticulous preparations and took place over an extended period of time in the equivalent of a hermetically sealed bubble. Boy, have things changed.
The other day I came across a blog I wrote in 2009 that set forth a timeline of recommended steps organizations should take as preparation for conducting an IT/DR test, with the earliest coming a full three months before the exercise.
My suggestions included “Start formal planning” and “Validate schedule of testing with your alternate site” at 90 days out; “Review exercise with IT management” and “Validate the alternate site will be fully operational and configured” at 60 days; and “Finalize all system and application recovery plans” and “Finalize offsite storage needs and schedule for transport to alternate site” at 30 days.
Reading that now was like looking at a message in a bottle written a hundred years ago.
The world of IT/DR testing has changed dramatically since those days of elaborate preparations, quarantined testing environments, and a focus on restoring lost data centers (DCs).
Back then the most common testing scenario was total or partial loss of a DC, a concern that has faded significantly for most organizations in our current era of greater mitigations, improved fire controls, and cloud computing.
Also striking in those days was the amount of effort we used to put into “acing the test,” even though the artificially perfect environment we created to ensure we did so would not exist in the event of a real disruption.
These tests might have boosted the egos of the people involved (due to our rigorous preparations, it was almost a foregone conclusion we would pass with flying colors). However, they were not very good at revealing gaps in the organization’s real-world recovery capability.
Another interesting aspect of testing in those days was our assumption that people would stay focused for months of preparation culminating in a multi-day exercise. Anyone who counted on that now would be in for a rude surprise as the Gen X participants they were depending on lost interest and drifted on to other things.
IT/DR testing is still alive and well; however, these days it has evolved toward what you might call a “quick and dirty” approach. Quick because contemporary exercises place a strong emphasis on brevity in recognition of the new reality of employees’ shortened attention spans. Dirty because modern testing deemphasizes preparation and focuses on making exercises adhere as closely as possible to real-world conditions.
Among the other new aspects of contemporary IT/DR tests is a new respect for the benefits of tabletop exercises. Necessity is the mother of invention, and the necessity of letting go of the traditional multi-day exercise has been driving productive innovations in the design and execution of tabletops. (MHA’s Richard Long has been a pioneer in this area, with his one-hour exercises focusing on a particular app or IT service and requiring participants to think on their feet.) These innovations have unlocked new powers in the tabletop in terms of identifying gaps and training staff.
Other contemporary innovations include a focus on varying levels of testing complexity, the use of multiple strategies, the rise of tiered testing, and the development of methods to test today’s hybrid apps.
Testing is no longer just testing, and, as so often happens, the need to adapt to change has yielded tons of new findings and opportunities. But while IT/DR exercises have evolved a lot in the 15 years since I wrote that post setting out a timetable for organizing one, in one key respect they have not changed. They are still the only way of ensuring that your recovery strategy and plans are truly functional.
Fifteen years ago, as highlighted by that old blog of mine, IT disaster recovery testing was all about meticulous preparations, quarantined environments, and multi-day exercises. However, a recognition of the limitations of that approach, combined with changes in technology and society, has pushed the old methods to the brink of obsolescence.
The past few years have seen a shift toward “quick and dirty” testing methods that emphasize speedy assessments of recovery capabilities and the replication of real-world conditions, and that incorporate promising innovations in the design of tabletop exercises. One thing that remains unchanged is the importance of testing for ensuring the functionality and effectiveness of recovery strategies in today’s dynamic IT landscape.