My initial response in my new job was probably typical. We upgraded the configuration specified in the contract, priced it out, and gained approval for a new hot-site contract and network failover capability. However, questions that had bothered me for years about disaster recovery (DR) started nagging at me again as I thought about the broader implications of our strategy.
I wondered what would really happen with my company in a disaster scenario, and I knew that just having systems and data backed up and being able to fail over to an alternate processing site was not going to be nearly enough. We would face many other serious issues in a real catastrophe. For example:
- Where would business unit staff go to actually perform their work if our corporate offices were unavailable? They would need adequate space, workstations, support materials, records, equipment, etc.
- How much time would the various operating departments waste before figuring out what their top priority functions should be, and who should perform them? And what if critical people were unavailable to do their jobs? Who would replace them?
- Did people have adequate ways to contact each other in an emergency, and were contact lists up to date and complete? Was there a planned calling sequence so that senior executives and key business partners would be contacted immediately?
- What about the company’s regional offices and distribution centers around the country? I had no idea what plans, if any, they had in place if they were affected by a disaster.
- Although our IT strategy was to restore our entire production environment, I wasn’t sure that was necessary, or if we could save money by only restoring certain parts of the infrastructure.
- I also didn’t know if the 48- to 72-hour recovery window we had planned was really adequate for certain applications.
Thus began my true education in not only disaster recovery, but in business continuity planning (BCP), a more comprehensive approach to addressing all of these questions in a logical and effective sequence. It led me, years after I left this CIO job, to specialize in BCP and return to this company as a consultant to complete what I had started.
The Catch 22
CIOs are often put in a very difficult, sometimes “Catch 22” situation regarding DR. It is fundamental to the IT management discipline to address backup and recovery, and this often has to be done quickly and prior to addressing business continuity as a whole. The CIO must make a quick assessment of business needs, often in a “vacuum” without adequate business input. Then, when basic IT DR is in place, the business just considers it an IT responsibility, and it’s very difficult to get the organization to revisit business continuity in a comprehensive manner.
The difficulty stems partly from a natural reluctance to address issues that may be important but are not urgent or an immediate threat to the organization. Disaster recovery isn’t urgent, of course, until there is a disaster, but it is critical to make plans and prepare before a disaster happens. Just as with any systems initiative, however, DR should be driven by business requirements. Strategies and recovery time objectives (RTOs) have no basis other than what is required by the organization as a whole. And what is required by the organization must be based on risks and potential business impacts, as well as the high-level strategy that senior management wants to employ.
My nagging questions never really disappeared during my five-year tenure as CIO in this particular company. Despite being a vice president and a member of the senior executive team, I could never get the proper attention focused on business continuity planning. A number of senior managers saw its importance and were willing to make some investment; however, another “gotcha” stood in the way. Without clear sponsorship from the very top (in this case from the international owners of the company), the necessary collaboration among all the important functional areas was not possible. And, without considering all of the interdependencies in a large company, some critical functions often end up missing from a plan.
A Happy Ending – Doing Things Right
More than 10 years later, I returned to this company as a BCP consultant to help them perform business continuity planning the right way. The IT DR plan had been in place for some time and had undergone many revisions, but much of the other necessary planning had not been addressed. Fortunately, an executive mandate by the CEO (driven by external auditor pressure and sanctioned by new owners of the company) got things going. I have been able to accomplish what was so difficult years ago when working from a strictly IT perspective, and the process was very straightforward:
1. It was necessary to assess what risks the organization faced, both internal and external, and estimate business impacts. Impacts turned out to include both significant potential dollar losses and intangible impacts that could eventually result in dollar losses. These provided a gauge of what the organization should spend on DR and where it should focus its efforts.
2. Based on the assessments of risks, key people from all departments that could be significantly impacted by a disaster were selected to attend a meeting where we presented the “big picture” of business continuity planning. As they listened, they came to realize that they would be responsible for recovering their business units in case of a disaster, and that they would have to think about what functions were critical for them. Department representatives were then tasked to start developing their own plans.
3. At the same time, an emergency response team was selected to work on plans for emergency and business recovery centers. They were also to re-examine building evacuation plans and other life safety measures such as CERT training and stocking disaster supplies. This was to be done in conjunction with developing company policies to address:
- Emergency pay
- Emergency leave
- Employee assistance programs such as emergency transportation, emergency housing, and employee financial assistance
- Emergency purchasing
4. Emergency response teams were to be designated for all of the outlying locations and tasked with developing plans and policies unique to their locations. Communication strategies and plans were also to be set for each location, designating media spokespersons, updating communication lists, and determining the various ways that employees and business partners would communicate in a disaster.
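The risk and impact assessment in step 1 can be illustrated with a small calculation. The sketch below is not the assessment used at this company; it assumes a common simplification in which each risk is weighted by its estimated yearly probability times its estimated dollar impact. The risk names, probabilities, and dollar figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float  # assumed chance of occurring in a given year
    impact_dollars: float      # assumed dollar loss if the event occurs

    @property
    def expected_annual_loss(self) -> float:
        # Simple weighting: probability times impact.
        return self.annual_probability * self.impact_dollars


def rank_risks(risks):
    """Return risks sorted from largest to smallest expected annual loss."""
    return sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)


# Hypothetical inputs; a real assessment would come from the business units.
RISKS = [
    Risk("Extended power outage", 0.10, 400_000),
    Risk("Regional earthquake", 0.02, 5_000_000),
    Risk("Data center fire", 0.01, 8_000_000),
]

if __name__ == "__main__":
    for risk in rank_risks(RISKS):
        print(f"{risk.name}: ${risk.expected_annual_loss:,.0f} per year")
```

A ranking like this only gauges tangible losses. The intangible impacts mentioned in step 1 would still need to be judged separately by the people who own each function.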
All of these plans are now coming together, and the drafts truly look like a comprehensive business continuity plan covering the entire organization.
Here is what I hope CIOs can learn from my experience and apply to their organizations:
Lesson 1: Let department heads and business unit leaders set critical priorities before planning IT DR to support them.
I made the classic mistake that most CIOs make in planning for disaster recovery. I assumed many things about the business: for example, that my nagging questions were being answered elsewhere, and that my estimates of recovery time objectives were appropriate. In a real disaster, business unit and department heads will have the responsibility to restore their operations, and, therefore, they should determine what their priorities and recovery time needs are.
Today, with stronger scrutiny of controls and protection of information assets, IT gets a lot of attention from auditors, boards of directors, etc. regarding disaster recovery. It creates a perfect opportunity for IT to say, “We must know what the critical business priorities are before we set up an IT DR plan.”
This lesson is very much like the lesson system developers quickly learn: Determine and finalize requirements before you start programming systems. It is also a reminder of the classic maxim: Business should drive IT, not vice versa, because IT only exists for the sake of the organization’s goals.
Lesson 2: Don’t take on business continuity planning for the entire organization without an explicit charter and strong executive sponsorship.
The term “business continuity planning” has now come to mean the whole process of recovery for an entire organization. I had fallen into something of a trap by taking on planning for the entire organization without clear executive sponsorship. I tried to do this from the bottom up by going from department to department. This made me realize how much cooperation is required among departments in an organization, and how much time and how many resources must be committed to the task. I didn’t have much chance of getting people to make this commitment when BCP was looked at as an IT initiative.
Lesson 3: Make sure executive sponsorship includes executive involvement and buy-in with recovery strategies and priorities.
It turns out we in IT had priorities wrong in terms of what we thought was important, and 48-72 hours was not adequate recovery time for some functions. This became evident afterward when business managers focused on recovery priorities and threw some curves about what was most important. It caused a trickle-down effect, changing a number of strategies for recovery, and causing several departments to change their plans.
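The mismatch described in Lesson 3 can be made concrete by comparing the recovery time each business function says it needs against the window IT has planned. The following is a minimal sketch under stated assumptions: the function names and hour values are hypothetical, and only the 72-hour figure (the upper end of the planned 48- to 72-hour window) comes from the account above.

```python
# Upper end of the 48- to 72-hour recovery window IT had planned.
IT_RECOVERY_WINDOW_HOURS = 72

# Hypothetical recovery time objectives (in hours) stated by business managers.
BUSINESS_RTOS_HOURS = {
    "Order processing": 24,
    "Payroll": 96,
    "Customer service": 48,
    "Financial reporting": 168,
}


def find_rto_gaps(business_rtos, it_window_hours):
    """List functions that need recovery sooner than IT's planned window.

    Returns (function name, shortfall in hours) pairs, largest shortfall first.
    """
    gaps = {
        name: it_window_hours - rto
        for name, rto in business_rtos.items()
        if rto < it_window_hours
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, shortfall in find_rto_gaps(BUSINESS_RTOS_HOURS, IT_RECOVERY_WINDOW_HOURS):
        print(f"{name}: IT plan is {shortfall} hours too slow")
```

With these invented inputs, order processing and customer service would be flagged, which is the kind of finding that forces recovery strategies to change once business managers set the priorities.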
Lesson 4: Push for the organization to plan in a comprehensive manner so major functions critical in a disaster are not ignored.
Reports from real disasters show there are many other things people think about before they consider recovering business operations. For example, “Am I safe and is my family safe?” “Where am I supposed to go and what am I supposed to do?” “When and how will I get paid?”
Unless these concerns are addressed in a comprehensive plan, an organization will never begin recovering operations, or worrying about IT. A comprehensive plan needs to address the risks an organization faces and the business impacts that could result from them. Then it needs to cover:
- Emergency response and life and safety protection
- Situation and damage assessment
- Asset salvage and recovery
- Alternate facilities for emergency operations and business recovery
- Individual business unit and department plans
- Strategies for internal and external communication
- Strategies for plan testing and maintenance
- Supporting policies and procedures
IT is certainly an important part of planning, but it’s only a part of the organization. This company’s challenge now will be to keep plans current through regular testing and maintenance. I wish it could have happened 10 years ago, but, fortunately, the organization didn’t suffer any major disasters during that time. Now the company, the current CIO, and I can all sleep better knowing the organization is prepared.
Michael Anzis, CBCP, CCP, is a director in the technology risk management services practice of RSM McGladrey, Inc. He is a former CIO for large, multinational companies, former Big 6 senior consultant, and holds a master’s degree in business information systems from UCLA and a bachelor’s degree from U.C. Berkeley.