Disaster recovery planning can be expensive. Even the most basic data center recovery plan requires money for personnel (full or part time), alternate sites, network recovery and offsite storage. Costs rise further because the traditional role of disaster recovery planning has evolved into corporate contingency planning: plans are now written to recover not only the data center but all vital business functions. And if the recovery plan includes more exotic strategies such as electronic vaulting, the costs can skyrocket.
However, for any CEO or CFO who thinks contingency planning is a waste of money, two incidents clearly demonstrate the need for a well thought out recovery plan: the August 13, 1990 Wall Street blackout and the April 13, 1992 downtown Chicago flood. In the Wall Street outage 28 firms relocated to hotsites; in the Chicago flood the number was even higher, at 33 firms. The Chicago Board of Trade, one of the world’s largest financial exchanges, shut down completely on the first day of the flood, and the volume of uncleared trades affected financial markets worldwide. The most important fact for any executive to remember about both the New York and Chicago disasters is that the cost figure most frequently heard is “billions” of dollars. It will probably prove impossible to refine that estimate, however, because corporations are reluctant to discuss their losses.
How does a corporation determine how much money and how many resources to devote to contingency planning? The answer is the business impact analysis, which should serve three purposes: 1) identify the potential risks, 2) estimate the effects of a disaster on the organization, and 3) determine the requirements for a recovery strategy. The impact analysis should quantify the effects of a disaster as much as possible. Hard dollar figures, with an emphasis on estimates of lost revenue and productivity, will make a more lasting impression on management than a hazy and subjective analysis. Management will sit up and listen, for example, if they are told that a salesman lost a $100,000 order because he could not obtain price quotes while the central mainframe was down.
The impact analysis should be performed by auditors in conjunction with the contingency planning coordinator. Involving auditors draws in senior management, who will then pay more attention to the conclusions. If management intuitively senses that a major disaster would severely affect the company, funding for the impact analysis should be obtainable. Auditors can be internal or external; many hotsite vendors and accounting firms can perform a business impact analysis. If funding is not forthcoming, however, the contingency planning coordinator may have to perform the analysis himself. This is a shortsighted management view, but a coordinator who is stuck with it will have to press on and convince management of the necessity of a good contingency plan.
The impact analysis should identify the key computer systems and business functions that are vital to the organization and specify how quickly each must be brought back online so that business is not severely interrupted. This means the audit team will need to talk to people in many different departments. Operations, sales and marketing, accounting and human resources are just a few examples of departments that rely heavily on computers and whose work may affect mission-critical functions. Departments specific to each company will also need to be identified. Senior management must also be interviewed to determine which departments can most severely affect overall business strategy. In this age of “total quality management”, functions that can affect customer confidence in the company must be identified as well. Once all of this input is gathered, the systems and business functions can be ranked in the order in which they should be brought back online.
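The ranking step can be sketched in a few lines. This is a minimal illustration, not a prescribed method: it assumes each interviewed function has been assigned a maximum tolerable downtime and an estimated hourly dollar impact, and it orders recovery by the shortest tolerable downtime first, breaking ties by the larger hourly impact. The class, field names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    """One system or business function identified in the interviews (hypothetical fields)."""
    name: str
    max_downtime_hours: float  # how soon it must be back online
    hourly_impact: float       # estimated lost revenue and productivity per hour, in dollars


def recovery_order(functions):
    """Rank functions for recovery: shortest tolerable downtime first,
    breaking ties by the larger hourly impact."""
    return sorted(functions, key=lambda f: (f.max_downtime_hours, -f.hourly_impact))


# Illustrative figures only; they are not taken from any real analysis.
interviews = [
    BusinessFunction("Human resources", 72, 500),
    BusinessFunction("Sales order entry", 4, 25_000),
    BusinessFunction("Accounting", 48, 2_000),
    BusinessFunction("Operations", 4, 40_000),
]

for rank, f in enumerate(recovery_order(interviews), start=1):
    print(rank, f.name)
```

With these made-up inputs, Operations and Sales order entry share the four-hour deadline, so the larger hourly impact puts Operations first.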
The impact analysis should also assess the likelihood of different types of disaster. Recent events show that Florida is vulnerable to devastating hurricanes and that California can be brought to its knees by earthquakes. But try to find a corporation in the Chicago Loop that had a contingency plan for a flood. In general, a company should prepare for the worst: the total loss of the building housing the data center or a vital business function. With this in mind, the recovery coordinator can plan to recover all functions and, in the event of a less serious disaster, use only the applicable parts of the plan. For example, many companies in Chicago were able to remove backup tapes and office files from the buildings they were forced to evacuate. They could use these materials at their alternate sites and did not have to rely on offsite storage facilities.
Generally, the business impact analysis will produce a graph like the one in Figure 1. The graph shows management that the impact of an interruption increases exponentially with time. Conversely, the cost of recovery decreases as the time allowed for recovery increases. A balance must be struck between the cost of a potential disaster and the cost of recovery.
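The balance described above can be illustrated numerically. The sketch below assumes two made-up curves with the shapes the text describes: an impact that grows exponentially with outage time, and a recovery cost that falls as the allowed recovery time grows. It then picks the candidate recovery-time target with the lowest combined cost. The functions `impact_cost` and `recovery_cost` and every constant in them are hypothetical; a real analysis would substitute its own estimates.

```python
import math


# Hypothetical cost curves; the shapes follow the description in the text,
# but every constant here is made up for illustration.
def impact_cost(hours: float) -> float:
    """Cumulative business impact of an outage lasting `hours` (assumed)."""
    return 10_000 * (math.exp(0.1 * hours) - 1)


def recovery_cost(hours: float) -> float:
    """Cost of a strategy that restores service within `hours` (assumed)."""
    return 2_000_000 / (1 + hours)


def total_cost(hours: float) -> float:
    """Combined cost of an outage of this length plus the matching recovery strategy."""
    return impact_cost(hours) + recovery_cost(hours)


def balance_point(candidates):
    """Return the recovery-time target with the lowest combined cost."""
    return min(candidates, key=total_cost)


targets = [1, 4, 8, 12, 24, 48, 72]
best = balance_point(targets)
print(best)  # 12 with these made-up curves
```

With these particular constants, very short targets are dominated by recovery cost and long targets by business impact, so the minimum falls in between, at 12 hours.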
In practical terms, this means that if a 48-hour data center outage would cost the firm $50,000 in lost revenue and productivity, it does not make sense to spend $100,000 per year on disaster recovery. On the other hand, if a data center loss would cost $5,000,000 in the first ten hours, a contingency planner is justified in requesting a $1,000,000 budget for an intra-day data center recovery using an electronic vaulting strategy. The same type of analysis must be used to assess the impact of losing other business functions. Business function recovery normally requires an alternate office site, rerouted phone lines, extra forms and office supplies, and PCs on standby. The alternate site could be another nearby office or a business recovery facility provided by a hotsite vendor.
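The arithmetic behind the two examples can be made explicit by expressing each budget in hours of outage. The sketch below uses the figures from the text but assumes, as a simplification, that the impact accrues evenly over the outage, which the exponential curve described earlier suggests is not strictly true. The helper names are hypothetical.

```python
def outage_hours_equivalent(annual_budget, outage_impact, outage_hours):
    """How many hours of outage the annual budget is worth, assuming the
    impact accrues evenly over the outage (a simplification)."""
    return annual_budget * outage_hours / outage_impact


def budget_justified(annual_budget, outage_impact):
    """Rule of thumb from the text: do not spend more on recovery than the outage would cost."""
    return annual_budget <= outage_impact


# Figures from the two examples in the text.
small = outage_hours_equivalent(100_000, 50_000, 48)       # 96.0 hours
large = outage_hours_equivalent(1_000_000, 5_000_000, 10)  # 2.0 hours

print(small, budget_justified(100_000, 50_000))        # 96.0 False
print(large, budget_justified(1_000_000, 5_000_000))   # 2.0 True
```

Seen this way, the $100,000 budget equals two full 48-hour outages of loss, while the $1,000,000 budget equals roughly the first two hours of the larger outage.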
The business impact analysis can also save a company money. Not every computer application or business function is mission critical. Sooner or later a company will want to restore all lost functions, but because the impact analysis ranks them in order of importance, a corporation may find that quite a few functions fall into the “later” category. This means that not all of the equipment at the primary site has to be duplicated at the alternate site. A company’s primary site may have 600 Gbytes of DASD and an IBM 3090 600S processor, while the impact analysis shows that to survive an emergency the company needs only 300 Gbytes of DASD and an IBM 3090 400S processor. That represents considerable savings on a hotsite contract. The same logic applies to the recovery of critical business functions. The primary office may have 50 phones and 30 PCs, but in an emergency the staff may be able to get by with only 10 phones and 10 PCs, again a significant saving.
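Sizing the alternate site from the ranking amounts to totaling the resources of only the functions placed in the “now” category. In the sketch below, the workloads and their individual DASD and PC figures are hypothetical, chosen only so that the totals match the 600/300 Gbyte and 30/10 PC example in the text; the `Workload` class and `site_requirements` helper are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """A system or business function with its resource needs (hypothetical fields)."""
    name: str
    critical: bool  # ranked "now" rather than "later" by the impact analysis
    dasd_gb: int    # disk storage the workload needs, in Gbytes
    pcs: int        # PCs its staff need


def site_requirements(workloads, critical_only=False):
    """Total DASD and PCs needed to run the given workloads."""
    chosen = [w for w in workloads if w.critical or not critical_only]
    return sum(w.dasd_gb for w in chosen), sum(w.pcs for w in chosen)


# Hypothetical split chosen so the totals match the example in the text.
workloads = [
    Workload("Order processing", True, 180, 6),
    Workload("Billing", True, 120, 4),
    Workload("Management reporting", False, 200, 12),
    Workload("Application development", False, 100, 8),
]

print(site_requirements(workloads))                      # (600, 30) at the primary site
print(site_requirements(workloads, critical_only=True))  # (300, 10) at the alternate site
```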
The business impact analysis is an important tool in contingency planning. It should identify exposures and recovery options and present them in business, not technical, terms. With senior management examining every project with an eye on the bottom line, the analysis is also the means to justify the contingency plan. Used intelligently, it will be the key to selecting the best and most cost-effective recovery strategy.
John Watkins is a Senior Disaster Recovery Analyst responsible for the intra-day recovery of Sea-Land Service’s data center.
This article was adapted from Vol. 6 #3.