The Disaster Containment Manager is in charge of making tough decisions, setting the recovery effort objectives, directing staff toward priorities, and keeping the Recovery Team focused. They are also the primary contact with public emergency services at a disaster site. The Disaster Containment Manager’s responsibilities include:
- Declaring that a disaster exists and identifying which outside assistance is required, including the need to activate an off-site data center.
- Coordinating with any emergency services onsite to gain access to the site as soon as possible.
- Making an initial damage assessment and beginning planning for emergency containment.
- Selecting a site for the Emergency Operations Center by determining if the primary site is suitable or if a backup site must be activated.
- Activating the Disaster Recovery Teams, assigning people to either Business Continuity or Business Recovery efforts.
- Personally ensuring that adequate personnel safeguards are in place.
- Assigning staff to maintain a 24-hour schedule for containment and recovery.
- Maintaining the official status of the recovery for executive management.
- Coordinating incoming material with the materials receiving staff.
- Coordinating the use of skilled trades, such as contract labor, electricians, welders, and millwrights, with facility engineering management.
- Assessing personnel strengths and weaknesses in terms of knowledge, skill, and performance to balance labor expertise and staffing.
- Watching for signs of excessive stress and fatigue.
- Identifying “at-risk” employees, such as those deeply affected by traumatic stress.
- Designating a backup person to assume the Disaster Containment Manager’s role while they are resting or not on the disaster site.
If an organization provides its own recovery site, that site requires regular maintenance to keep pace with the requirements of the facility it supports. As the original facility changes and evolves over time, so must the recovery center. The following maintenance items should be performed on an annual or quarterly schedule:
- Executives need to determine whether the BIA has materially changed in the past year.
- IT will need to validate that computer hardware is adequate and meets corporate standards.
- The telecom support team should reevaluate whether the telecom arrangements need to be changed.
- Each department should review desktop requirements with the IT team to ensure that computers at the recovery site still meet the needs of employees.
- The IT team will need to verify that the software in the recovery units is adequately patched with bug fixes and security patches.
Aside from these annual and quarterly maintenance checks, there should also be periodic tours of the facility to make sure everything is well kept and will be fully functional whenever disaster strikes.
An Emergency Operations Center (EOC), sometimes called a “war room,” is a physical place where all communications of the recovery effort are focused. It is the known place where all interested parties can report on the status of a recovery. It provides communication to stakeholders who are most likely external to the recovery process, such as executives, the general public, suppliers, and customers. It also provides administrative support to the recovery effort, such as public relations, safety, purchasing, and site security. Because there is usually no time or opportunity to announce the location of the Emergency Operations Center after disaster strikes, it is crucial for it to be “a known place” ahead of time. It should be a logical place where people would turn for information or assistance. A few options include the facility’s security office, if available, or the data center’s help desk. The Emergency Operations Center has three essential functions:
- Command & Control – This is where you will find the person in charge of the containment and recovery efforts. They will set objectives and priorities and have overall responsibility at the incident.
- Operational Control – Hour-by-hour control is exercised from here by the various functional areas, including security, HR, purchasing, communications, and logistics.
- Recovery Planning – This is separate from emergency containment. It begins at the EOC but quickly transfers to its own office.
The Technical Recovery plan addresses who, what, where, when, why, and how to recover something, whether it is an IT system, data network, or other process. The following list covers what must be known or possessed about a given system or project in order to make trade-off decisions and recover it successfully. Some of this information can also be compiled and kept in separate directories or lists.
- Purpose – set the context for which the system provides value.
- Scope – what the system does and does not support.
- Background – explain the business requirements that help the reader understand why the server/application/process exists.
- Assumptions – a list of things that were assumed when this plan was written (e.g., the technical qualifications required of the person executing the plan).
- Dependencies – anything else that must be in place (e.g., a specific database server or other essential IT servers).
- Tech Support – the names and 24-hour contact numbers of primary and secondary support persons for the specified system.
- System Users – the primary end users for the system. They should be called to verify a system has been successfully recovered.
- Server Requirements – in terms of CPU, RAM, “C: drive” size and type, etc.
- Disk Space Requirements – the total disk storage required for local disks, SAN disks, etc.
- Connectivity Requirements – describes the network configuration (e.g., VLANs, trusts, opened firewall ports).
- Support Software – a list of supporting utilities that may be needed.
- Application Requirements – listed in case a software application must be changed during recovery.
- Database Requirements – the type and version of the database program supporting the system, including the required permissions, databases, and table connections.
- Special Input Data – beyond what’s in the company’s backup media, such as data stored in a different off-site location or external data feed.
- Licensing Requirements – may be relevant since in some cases, loading a system on new hardware may require a license change by the software manufacturer.
- Special Printing Requirements – instructions for setting up printed output for an application to include special forms.
- Service Contracts – the contracts that support the system’s components, including the days and times of coverage and the expiration date. Describe how to contact the vendor or whoever provides support. This information should be available through the command center and the administrative plan.
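One way to keep these dimensions consistent across systems is to hold each plan as a structured record and check it for gaps. The following is only a minimal sketch in Python: the class name, field names, the added `system_name` field, and the `missing_sections` helper are illustrative assumptions rather than part of any standard, and every section is treated as free text or a simple list.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class TechnicalRecoveryPlan:
    """One record per system; fields mirror the list above (names are hypothetical)."""
    system_name: str  # assumed identifier, not one of the listed dimensions
    purpose: str = ""
    scope: str = ""
    background: str = ""
    assumptions: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)
    tech_support: List[str] = field(default_factory=list)   # primary and secondary contacts
    system_users: List[str] = field(default_factory=list)   # called to verify recovery
    server_requirements: str = ""
    disk_space_requirements: str = ""
    connectivity_requirements: str = ""
    support_software: List[str] = field(default_factory=list)
    application_requirements: str = ""
    database_requirements: str = ""
    special_input_data: str = ""
    licensing_requirements: str = ""
    special_printing_requirements: str = ""
    service_contracts: List[str] = field(default_factory=list)


def missing_sections(plan: TechnicalRecoveryPlan) -> List[str]:
    """Return the names of sections that are still empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


if __name__ == "__main__":
    # Example data is made up for illustration.
    draft = TechnicalRecoveryPlan(
        system_name="payroll",
        purpose="Runs the biweekly payroll",
        tech_support=["Primary contact", "Secondary contact"],
    )
    for section in missing_sections(draft):
        print("Still to document:", section)
```

A reviewer could run a check like this against each submitted plan before accepting it, so that an empty Dependencies or Tech Support section is noticed before a crisis rather than during one.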
The criteria for selecting an alternate site for your workers are similar to those for selecting an IT recovery site. The site should be far enough away to avoid damage from the same incident. Several issues must be considered when evaluating alternate site options. The options are:
- Different Company Site – using a different company site can be fairly simple. The site should be close enough for people to drive to. Using a company site means you know it has an active network and telephone connections, security is already in place, and you can pre-position materials for emergency use.
- Contracted Hot Site – a contracted hot site can cause the least grief because you pay someone to take on all the maintenance. The terms of the agreement depend on the vendor and the level of service you hire, but typically it includes test time and a set number of seats in a recovery. These sites should be close to public transportation and already have arrangements for local food and lodging.
- Mobile Recovery Equipment – mobile recovery equipment comes to the disaster site. These are expandable trailers that contain almost everything needed at a disaster site, including their own generator, telephone switch, and satellite uplink for communications. The trailers are pulled to the customer site and activated when a disaster is declared.
- Scramble at the Time of the Incident – some companies feel that the local real estate situation is such that buildings with adequate space and facilities can be found on short notice. There are many problems with this approach. One is that it leaves all planning to occur during the incident, when there is already so much to do. Without a test site, the plan can’t be validated, nor can the team members be adequately trained. It also ignores that more is needed than four walls and a roof. Lastly, it underestimates the time required to settle the real estate details, even if everyone is pressing for an immediate resolution.
A successful Business Continuity Program awareness effort can result in greater company-wide support for the program. It can also reduce people’s reluctance to participate. Awareness is an ongoing process and is best conducted in a “here and there” manner. Keeping the message fresh and relevant helps to maintain everyone’s interest. The following is a list of ways to build awareness:
- Success stories in company newsletters
- Success videos on company TVs
- Posters reminding employees of key points
- Discussions with departments
- Dedicated quarterly newsletter with FAQ or Q&A
- Company wiki or online forum
Business Continuity and Disaster Recovery Planning can generate a lot of employee interest. Harnessing awareness for the benefit of the program can provide valuable support. The easiest way to encourage positive energy is through a steady flow of program information.
Several roles must be fulfilled when the disaster first occurs in order to optimize the organization’s response. These roles include:
- First Point of Contact – usually the person who runs the facility’s day-to-day maintenance. This person can look at a situation and decide how to contain it until it can be repaired.
- Facility Manager – coordinates damage assessment, salvage, and restoration activities.
- Executive Team
  - CEO – whoever makes the top business decisions
  - CIO – makes the IT strategic decisions and advises top executives on the IT recovery process
  - Disaster Recovery Manager – advises executives on the Disaster Recovery Plan execution and process
- Executive Staff
  - Corporate Communications Manager – coordinates with the news media to ensure accurate reporting
  - Human Resources Manager – coordinates communication and notification to all employees
  - Legal Team – coordinates with the insurance company to meet assessment needs while speeding the company’s recovery
  - Purchasing Manager – quickly contacts suppliers and orders needed support
  - Sales Manager – contacts customers to assure them of the timely delivery of their orders or to assist them in finding an alternative source of goods and services
A Business Continuity Program generates a lot of documents. Recovery plans, BIAs, Risk Assessments, and test results are a few of the many things that must be kept handy. Many people also contribute to and maintain these documents. Everything must be stored in a central place so it can be found when needed. There are a few options for this:
- Establish a file share with sub-directories to separate the technical plans from the public areas. This option is inexpensive and access permissions are controlled by the Program Manager.
- Use a document management product like SharePoint. This can track who has which document checked out for updates.
- Purchase a purpose-built product like Strohl’s Living Disaster Recovery Planning System (LDRPS) which can be used to build an automated DR plan.
The challenge is to control access to plans so that the Program Manager can ensure the quality and accuracy of anything accepted for storage. Whatever tool an organization decides to go with, be sure to set aside a submissions area to receive proposed plans for review. To make these documents useful in a crisis, they must be available at the recovery site.
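For the file-share option, the layout can be as simple as three top-level folders. The sketch below is one possible arrangement, not a prescribed one: the folder names and the `create_plan_share` helper are hypothetical, the demo writes to a temporary directory rather than a real share, and access permissions (which the Program Manager would set on the real share) are deliberately left out.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical folder names. The text only requires that technical plans,
# public material, and a submissions area for review be kept apart.
SUBDIRS = ("public", "technical_plans", "submissions")


def create_plan_share(root: Path) -> List[Path]:
    """Create the sub-directories under root and return their paths."""
    created = []
    for name in SUBDIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    return created


if __name__ == "__main__":
    # Demo against a throwaway directory; a real share path is site-specific.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_plan_share(Path(tmp) / "bc_program"):
            print(folder.name)
```

Proposed plans would land in the submissions folder and be moved into the technical plans folder only after review. The same structure would also need to be copied to the recovery site so that the documents are available in a crisis.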