Risk Analysis--the foundation for a good Disaster Recovery plan! It is also a phase that means different things to different people.
For some, it means an analysis of the hazards or exposures that a computer room is subject to and the impact of such hazards on the company. Others break it down into two separate phases: the hazard analysis which they consider to be the risk analysis, followed by a business impact analysis.
Neither approach is wrong, as both achieve the same result--analyzing what can happen and what the impact will be. With two phases, there can be some duplication of effort. With a single phase, hazards and impact can be considered at the same time.
The risk management and insurance industry may have contributed to the two-phase approach.
For example, R. L. Mehr and R. A. Hedge’s book Risk Management Concepts and Applications provides the following two definitions for Risk Analysis:
- An analysis of an organization’s present situation with respect to risk management is usually called risk analysis. A more accurate title would be hazard analysis.
- To solve the problems of achieving objectives, it is necessary to identify the present loss exposures. In risk management this procedure is called risk analysis or more accurately, hazard analysis.
A different definition from a business standpoint, which also implies financial impact, is:
- Risk Analysis--a methodology for examining possible future outcomes before approving an investment proposal, a new product or a future corporate strategy (Today’s Executive, Price Waterhouse, August 1985).
In view of the link between computer operations and the company’s operations, a risk analysis should consider not only the hazards, but also the business aspects. A simple definition would then be:
- Risk Analysis is defining what can happen to the data center, what will be affected by an outage in the data center, and what will be the impact on the company’s operations.
The analysis should be of sufficient depth to answer these questions, but need not involve probabilities calculated to three decimal places or financial impact calculated to the last dollar and cent. The information developed should be in enough depth, and in the format, that company management needs for decision-making purposes.
A risk analysis should be an ongoing project. It should be:
- Reviewed and updated periodically.
- Included in the development cycle of any new computer applications or operations.
- Considered when making major changes to existing applications.
CONDUCTING THE ANALYSIS
How should a risk analysis for disaster recovery purposes be conducted? A suggested sequence is:
- Determine areas at risk
- Analyze loss potentials
- Determine vital functions/applications
- Determine emergency operating methods
- Determine minimum resources
- Determine recovery time requirements
- Quantify impact
- Prepare report
DETERMINE AREAS AT RISK
Most people think in terms of the computer room and the area immediately around it, but do not consider the support services, utilities and contract services needed to maintain computer operations. The recent fire at the Hinsdale Central Office in Chicago highlighted the problems that can result from incidents at locations outside a company’s control.
The study should consider the effect of an incident involving areas such as the computer area and the building housing it, onsite and offsite utilities, offsite contract computer facilities and network systems.
ANALYZE LOSS POTENTIALS
Having determined the areas at risk, the loss potentials for these areas should be considered and loss scenarios developed. All types of loss that can occur should be taken into account, including physical damage, natural and environmental hazards, loss of utilities and crime/security incidents.
Consideration should also be given to incidents occurring not only in the computer room, but also in the building housing it and the area around that building. This enables those incidents that may prevent access to the building or the computer room to be taken into account.
From these loss scenarios the potential downtime for each one can be assessed. Downtime is based upon the potential severity of the loss, and will be used when considering recovery times and quantifying the impact of the loss.
This leads to the other factor normally considered when analyzing loss potentials--probability or likelihood. Probability can be assessed either quantitatively or qualitatively.
The quantitative approach uses statistical techniques to develop occurrence rates. There are several techniques, some of which are complex and involve major data collection and calculation. The qualitative approach expresses the probability in a descriptive manner, using such values as low, medium or high. It is based on the assumption that precise information on hazard or loss data may not be available, and that a descriptive rather than a numeric approach therefore has to be used in assessing the probability. Qualitative analysis may also involve a scale or range of values.
To quote Auerbach’s Data Security Management: “Quantifying risk presents some major difficulties. The most significant problem is that precise estimates of loss or threat occurrence are nearly impossible to derive without reliable empirical data. This limitation is especially prevalent in cases of fraud, human error and sabotage. As a result, many attempts to calculate computer risk have failed because the numbers are not credible.”
To avoid these problems, qualitative methods are used to develop an understanding of risk that emphasizes descriptions rather than calculations. This approach focuses management attention without forcing endless discussion of why a result was a particular value. The actual method chosen should be one that is acceptable to management in their decision making process, and capable of being supported by adequate data.
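As an illustration of the qualitative approach, the sketch below ranks a few loss scenarios by a descriptive probability level and by assessed downtime. It is a minimal example only: the scenario names, the ratings, the downtime figures and the ordering rule are all hypothetical assumptions, and any real study would use whatever scale management finds acceptable.

```python
from dataclasses import dataclass

# Descriptive probability levels mapped to an ordinal rank (assumed scale).
PROBABILITY_RANK = {"low": 1, "medium": 2, "high": 3}


@dataclass
class LossScenario:
    name: str
    probability: str      # "low", "medium" or "high"
    downtime_days: float  # assessed potential downtime for the scenario


def rank_scenarios(scenarios):
    """Order scenarios by probability level, then by downtime, highest first."""
    return sorted(
        scenarios,
        key=lambda s: (PROBABILITY_RANK[s.probability], s.downtime_days),
        reverse=True,
    )


# Hypothetical scenarios for illustration only.
scenarios = [
    LossScenario("Computer room fire", "low", 90),
    LossScenario("Utility power outage", "high", 1),
    LossScenario("Building access denied", "medium", 5),
]

for s in rank_scenarios(scenarios):
    print(f"{s.name}: probability {s.probability}, downtime {s.downtime_days} days")
```

Sorting on the descriptive level first keeps the discussion on the ratings themselves rather than on a computed score, which is the point of the qualitative method. A different ordering, such as downtime first, would be equally valid if it suits the decision being made.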
DETERMINE VITAL FUNCTIONS/APPLICATIONS
Vital functions/applications are usually those whose loss would have a financial impact, either through a decrease in revenue or an increase in expenses. There can, however, be functions/applications which are considered vital for other reasons, such as the need to meet regulatory requirements or to maintain the company’s credibility with the public.
DETERMINE EMERGENCY OPERATING METHODS
This segment is intended to determine how the vital functions/applications can be continued in an emergency. What has to continue online? What can be switched from online to batch? Can anything be handled manually? Answers to these questions, in the form of a list of emergency operating needs, will provide the starting point for the next segment of the analysis.
DETERMINE MINIMUM RESOURCES
The list of emergency operating needs enables an estimate to be made of the minimum resources that will be needed to maintain the emergency operations. The resource requirements include computer and office equipment, building space, software, data and staff. A comparison of these minimum resource requirements with any existing back-up or alternate operating capabilities will indicate what additional resources are needed. This is an initial overall review of the resource requirements. It will be refined and become more detailed in subsequent stages of the plan development process.
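The comparison of minimum requirements against existing back-up capabilities can be pictured as a simple shortfall calculation. The sketch below assumes each resource can be expressed as a single quantity; the resource names and figures are hypothetical.

```python
def additional_resources(minimum_required, existing_backup):
    """Return the shortfall per resource: required minus available, when positive."""
    gaps = {}
    for resource, needed in minimum_required.items():
        available = existing_backup.get(resource, 0)
        if needed > available:
            gaps[resource] = needed - available
    return gaps


# Hypothetical figures for illustration only.
minimum_required = {"terminals": 20, "office space (sq ft)": 2000, "operators": 6}
existing_backup = {"terminals": 8, "office space (sq ft)": 2500}

print(additional_resources(minimum_required, existing_backup))
# → {'terminals': 12, 'operators': 6}
```

Resources that are already covered, such as office space in this example, drop out of the result, leaving only the additional items to be carried forward into the later planning stages.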
DETERMINE RECOVERY TIME REQUIREMENTS
Knowing the emergency operating requirements and the minimum resources enables the recovery time to be estimated. Two aspects are involved: the time to implement any available emergency operating procedures, and the time to repair and reinstate the original site. The initial recovery period will take into account any existing emergency operating procedures or disaster recovery plan. The long term recovery period will be based on the downtime assessment made when analyzing the loss potentials.
QUANTIFY IMPACT
Recovery time determines the financial impact. Obviously, the shorter the time, the lower the impact. This is true for both initial and long term recovery periods. The initial recovery period to implement emergency procedures may result in either a major impact because the company’s revenue producing operations cannot function, or a low impact because there is a carryover of the revenue producing capability.
The long term impact will be affected by how far the emergency operations can maintain the company’s revenue producing capabilities.
In quantifying the impact one must consider factors such as the loss of revenue, extra expense for emergency recovery, potential continued loss of market and the costs of full recovery.
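A rough sketch of how these factors might be combined is shown below. It is an assumption-laden illustration, not a prescribed formula: it takes the worst case in which no revenue is produced during the initial recovery period, treats the share of revenue maintained by emergency operations as a single fraction, and leaves out continued loss of market, which is harder to express as one figure. All names and amounts are hypothetical.

```python
def quantify_impact(daily_revenue, initial_days, long_term_days,
                    revenue_maintained, extra_expense, full_recovery_cost):
    """Estimate the financial impact across the two recovery periods.

    revenue_maintained is the fraction (0 to 1) of normal revenue that
    emergency operations keep flowing during the long term recovery period.
    """
    # Assumption: no revenue is produced until emergency procedures are running.
    initial_loss = daily_revenue * initial_days
    # Only the revenue not maintained by emergency operations is lost.
    long_term_loss = daily_revenue * long_term_days * (1 - revenue_maintained)
    return initial_loss + long_term_loss + extra_expense + full_recovery_cost


# Hypothetical figures for illustration only.
total = quantify_impact(
    daily_revenue=50_000,
    initial_days=3,
    long_term_days=60,
    revenue_maintained=0.75,
    extra_expense=200_000,
    full_recovery_cost=500_000,
)
print(f"Estimated impact: ${total:,.0f}")
# → Estimated impact: $1,600,000
```

Rerunning the estimate with a shorter initial period or a higher maintained fraction shows how much of the impact the emergency operating methods can avoid, which is the comparison management is likely to want.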
A risk analysis that answers the questions “What will happen?”, “What will be affected?” and “What will be the impact?” provides the information needed as a foundation for considering the alternative operating procedures that are available or have to be developed, and for developing the plan document. A plan developed without this information cannot be considered sound, as decisions made during its development may have been based on erroneous or inadequate information.
Finally, the risk analysis also provides the information needed to obtain management’s commitment to Disaster Recovery Planning. Without that commitment, a plan may never be developed. If it is developed, it will be a long, drawn-out project with a low priority.
(This article is based on Mr. Musson’s presentation to the Delaware Valley Disaster Recovery Information Exchange Group in April, 1988.)
This article is adapted from Vol. 1 No. 4, p. 10. Printed in Fall 1998.