One area of great importance to disaster recovery planners that has received very little attention is the question of how much to spend on the disaster recovery planning (DRP) effort. Through application of a “worst case” risk analysis process, corporate officers can be effectively “sold” on the need for effective DRP, but what guidelines can be used to determine the amount of corporate resources that should be devoted to recoverability?

An excellent example of how to make such a determination is available in the insurance industry. Actuarial science is a field specifically directed at determining a reasonable price (including profit) for risk-specific coverage. Actuaries use event frequency statistics to determine insurance rates. While it is unreasonable to expect DR planners to become proficient actuaries, we can certainly use actuarial methods in deciding how much money should be spent on DRP.

An extension of simple “worst case” risk analysis methods can yield estimates of a company’s probable annual loss due to specific risk factors. Once an accurate picture of probable annual loss is developed, that loss figure can be utilized as a budgetary guidance tool.

Let’s look at a simplified example. Suppose X corporation has an IBM-based host DP facility that supports the management of its manufacturing operations. An impact analysis has demonstrated that, in the absence of an effective DRP, X corporation stands to lose $100 M if this data center is destroyed by fire. The DR Coordinator obtains information from the company’s insurance carrier indicating that a facility configured like X corporation’s data center can expect to experience a total loss to fire once in 300 years. Assistance in determining event frequency can be obtained from insurance carriers and governmental agencies.

We now know how often the event (fire) is likely to occur, and what its impact will be. From this information we can estimate a level of probable annualized loss due to a totally destructive fire by using the formula:

Annual Loss Exposure (ALE) = impact x frequency

We can express the frequency of once every 300 years as the ratio 1/300. So our formula yields:

ALE = $100,000,000 x 1/300 or
ALE = $100,000,000/300 = $333,333

This calculation tells us that X corporation has an annual loss exposure of $333,333 due to a totally destructive fire at the data center in question. To determine the total ALE for the operation, calculate the ALE for each of the other risk factors we wish to consider (earthquake, tornado, employee sabotage, etc.) and add them together. Once the total ALE figure is determined, it can reasonably be used to guide budgetary decision making. It is simply bad business to spend more on DRP than you are “losing” on an annual basis.
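The calculation above, extended to several risk factors, can be sketched in a few lines of Python. This is only an illustration: the fire figures come from the example, while the earthquake and sabotage rows are hypothetical values invented to show how the per-factor ALEs are totaled.

```python
def annual_loss_exposure(impact_dollars, years_between_events):
    """ALE = impact x frequency, with frequency expressed as 1 / years."""
    return impact_dollars / years_between_events

# (impact in dollars, expected years between occurrences)
# Only the fire row comes from the article; the others are hypothetical.
risks = {
    "fire": (100_000_000, 300),
    "earthquake (hypothetical)": (100_000_000, 1_000),
    "employee sabotage (hypothetical)": (5_000_000, 50),
}

ale_by_risk = {
    name: annual_loss_exposure(impact, years)
    for name, (impact, years) in risks.items()
}
total_ale = sum(ale_by_risk.values())

for name, ale in ale_by_risk.items():
    print(f"{name}: ${ale:,.0f}")
print(f"total ALE: ${total_ale:,.0f}")
```

With these assumed inputs the fire row reproduces the $333,333 figure, and the total is the sum of the three rows. A spreadsheet with the same three columns would serve equally well.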

Any competent risk analysis will include calculation of ALE. In fact, it is a cornerstone of the risk analysis method recommended by the National Bureau of Standards (NBS) in FIPS Publication 65. The NBS methodology presents a simplified approach to ALE estimation which uses indexed tables. This method was originally developed by Robert H. Courtney Jr. of IBM, who gave NBS permission to adapt the method to its needs. While this indexed table method does not yield ALE estimates quite as accurate as individual calculation, it is a viable way to obtain ALE figures that can be used as a broad budget guidance tool. FIPS Publication 65 and other NBS guidelines pertinent to DR and data security can be obtained from the National Technical Information Service:

National Technical Information Service
5285 Port Royal Road, Springfield, VA 22161
NTIS information: (703) 487-4600

Regardless of the calculation method used, as the number of event types under consideration increases, so does the volume of calculations to be performed. A PC-based spreadsheet package can be an invaluable aid in these calculations. In addition, there are an increasing number of risk analysis consultants available to assist you. In any event, it pays to be an informed consumer when buying such services, and it certainly pays to have a financial yardstick available when analyzing the cost of your DRP alternatives.


Andrew M. Munro is a Disaster Recovery Planner with MCI Communications.

This article adapted from Vol. 2 No. 2, p. 45.

It is good news that many organizations are jumping on the disaster recovery bandwagon. Information security and disaster recovery practitioners have clearly scored some impressive successes. Management has become more aware of the need and has begun to allocate funds for security measures that we all knew to be important but found more difficult to sell in the past.

Disaster recovery is clearly an important means of containing loss when a disaster occurs. The key phrase here is “containing loss.” In any disaster, there will be substantial losses, no matter how carefully conceived and implemented the disaster recovery plan and disaster preparedness are.

Despite the increased comfort level we can enjoy with a carefully conceived and implemented contingency plan, something is missing. The barriers to loss are still incomplete. Contingency plans are effective weapons against unmitigated loss from a disaster, but they do absolutely nothing to prevent the disaster from happening. There are also many lesser threats that never become the kind of disaster a typical disaster recovery plan addresses. Misuse/abuse, fraud, theft of data, and data sabotage are only a few of the threats that fall into this “non-disastrous” yet potentially very costly category.

It is unquestionably worthwhile to have a tried and trusted disaster recovery plan in place. We get a warm, fuzzy feeling of security when we conduct successful disaster recovery plan tests and disaster scenarios. We are thus better prepared to cope with the real thing when it happens. But everyone hopes never to have to deal with a real disaster, and that warm, fuzzy feeling obscures the reality of potential losses that will still be incurred. Management is often particularly vulnerable to a false sense of security, especially when it has just spent tens to hundreds of thousands of dollars on disaster recovery planning--with ongoing costs of the same magnitude to keep the plan viable.

A real disaster will be costly in terms of denial-of-use (however well it is limited by the disaster recovery plan), disruption, destruction and human impact, no matter how well prepared we are. Therefore, it is clear that more should be done.

The missing link should be set in place to form a unified barrier to risk.

The missing link is Integrated Risk Management. Viewed from the information security perspective, it includes all organizational and functional activities and controls that serve to assure the availability, integrity and confidentiality of information. Risk management is a familiar term in the insurance industry, but that definition is inadequate for the purposes of the information security practitioner and his interest in “managing” risk.

For information security purposes, risk management is the multifaceted process that includes the following:

Identifying risks

  • What can happen (threat occurrence)
  • How bad will it be if it happens (consequences)
  • How often will it happen (frequency)
  • How certain the answers are to these questions (uncertainty)

Identifying vulnerabilities that increase risk exposure by allowing threats to occur with greater frequency, greater consequences, or both

Identifying cost-effective safeguards that serve to mitigate or eliminate vulnerabilities and reduce associated risk

This risk reduction is best achieved by first executing a credible risk assessment. The risk assessment supports risk avoidance/acceptance decision-making, i.e. risk management, by identifying probable loss exposures associated with the threats for which there are vulnerabilities at the target site. The complete risk assessment will also include recommendations for safeguards that cost-effectively reduce these loss exposures. The emerging concept of risk management may thus be represented as an organizational integration or coordination of classic risk management (insurance), physical security, data security and disaster recovery that enables a coherent orchestration of these often unconnected activities and their common goal of managing risk.
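One way to read “cost-effective” here is that a safeguard earns its place when the reduction in probable annual loss exceeds what the safeguard costs per year. The sketch below assumes that reading; it is not a method prescribed by the article, and the `Safeguard` class, the candidate names and every dollar figure are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Safeguard:
    name: str
    annual_cost: float  # yearly cost of owning and operating the safeguard
    loss_before: float  # probable annual loss without the safeguard
    loss_after: float   # probable annual loss with the safeguard in place

    def net_annual_benefit(self) -> float:
        """Loss exposure removed per year, minus the yearly cost."""
        return (self.loss_before - self.loss_after) - self.annual_cost

def cost_effective(safeguards):
    """Return the safeguards with a positive net benefit, best first."""
    worthwhile = [s for s in safeguards if s.net_annual_benefit() > 0]
    return sorted(worthwhile, key=Safeguard.net_annual_benefit, reverse=True)

# All figures below are hypothetical, for illustration only.
candidates = [
    Safeguard("sprinkler upgrade", 40_000, 333_333, 100_000),
    Safeguard("second guard shift", 120_000, 100_000, 60_000),
    Safeguard("access logging", 15_000, 100_000, 50_000),
]

for s in cost_effective(candidates):
    print(f"{s.name}: net benefit ${s.net_annual_benefit():,.0f} per year")
```

In this made-up data the second guard shift costs more than the exposure it removes, so it drops out of the recommendations. Management may still choose to adopt it or to accept the risk; the numbers only support that decision.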

To make decisions whether to avoid, minimize or accept risk, management must know what the risks are, what their probable consequences (losses) are, what the vulnerabilities to risk are, and what steps can be taken to cost-effectively avoid or minimize risk. Note that risk acceptance is a legitimate management prerogative.

However, risk acceptance through ignorance of the facts has never been an acceptable excuse to executive management, the board, shareholders or constituents. The worst-case result of uninformed risk acceptance in the past has often been an unplanned and abrupt change in responsibilities. In the future, however, we will almost certainly see the Foreign Corrupt Practices Act of 1977 invoked when risks are accepted through ignorance and some substantial loss is suffered.

There is a trend toward greater government interest in the security of information in both the public and private sectors. This trend, as manifest in BC-177 (Disaster Recovery Requirements from the Comptroller of the Currency for the banking industry), OCC 220 and OCC 229, among other directives and regulations, is driven by a recognition that information processing is often critical to the successful pursuit of American business interests. The Foreign Corrupt Practices Act imposes significant penalties (felony fines and imprisonment) in the prosecution of both responsible management and the company that fail to maintain effective control over resources to the detriment of an organization and its shareholders.

While there are various ways to manage risk, the most effective approach to an Integrated Risk Management program is to establish and maintain a probabilistic risk model of the information processing environment in its broadest context. One of the best and most cost-effective tools for building, analyzing and maintaining a risk model is an automated probable risk assessment system.

Probable risk assessment does not presume to dictate whether management should avoid, minimize or accept risk. It does, however, provide management with reliable decision support information based on a defensible and substantially objective quantification of risk as opposed to a subjective qualitative ranking of risk. Therefore, with an effective Integrated Risk Management program, the information security and disaster recovery practitioner (the “risk manager”) can help management assure that risks (especially avoidable risks that could later result in disasters or other costly experiences) are not accepted through ignorance of the facts.

Yes, the contingency plan may very well “contain” losses arising from risks accepted ignorantly. But what if the disaster could--and should--have been avoided?


Will Ozier is President of Ozier, Perry & Associates.

This article adapted from Vol. 3 No. 1, p. 40.

Post-incident review (PIR) is an evaluation of incident response used to identify and correct weaknesses, as well as determine strengths and promulgate them. PIRs are normally used to support program revision. Despite its importance, PIR is one of the most neglected components of disaster recovery planning.

Imagine you have just survived a natural disaster. After weeks of intense response and recovery efforts, fortunately you are still in business. You’re exhausted and glad it is over. But a critical task awaits. Now, while your memory is fresh, is the time to learn from what happened and use the lessons to enhance your program and plans; don’t assume they will be remembered. All too often, managers fall into the common trap of waiting until later and losing the opportunity. This is the moment to exploit your boss’s fear that this could happen again in order to get the support you need. The organizations best equipped to survive and thrive are those that mature beyond the normal reflex of respond, recover and continue.

Applying hard learned lessons to a total disaster management program just makes sense. Better yet, go beyond disaster management, with its site specific focus, to crisis management and look at the bigger strategic picture. There are several things you should ask yourself:

  • What can be learned from what happened?
  • How do you avoid repeating mistakes?
  • How do you assess what is and is not working?
  • What are the implications of what just happened not only on you, but on your whole corporation or industry?
  • Are program and plan revisions needed?
  • How do these questions get answered?

The best way to answer these and more is to conduct a post-incident review. Here is how the process works.

The post-incident review process begins with determining who will conduct the PIR. An effective review depends heavily on the objectivity of the review team. For that reason, you should select a team of individuals that are not part of your local organization, or, if from your site, were not involved with the response to or management of the incident. (The responders and managers will have an opportunity to provide their input later in the process.) The team should provide expertise in management, human factors, communications, planning and training. The team should include specialists that are technical experts in particular areas of concern for the specific incident. Specialty areas may include disaster response and management, fire, hazardous materials, environmental impacts and regulations or hostage situations. Several members of the team should also have strong interpersonal skills to facilitate capturing information through discussions and interviews with incident managers and responders. The team should have access to an advisory group of managers and senior leadership from within the organization that experienced the incident. These advisors help guide the activities of the team toward the philosophy of the organization. Their direct experience also assists with the assessment of how management responded to the incident and what long term effects have occurred as a result of their actions or the incident itself.

Once the team is assembled, its first step is to determine goals and objectives. What do we want to get out of this effort? A primary objective is to learn from what happened so your disaster management, response and recovery programs can be enhanced. Clearly defining the areas that the team will analyze should enable the team to make specific recommendations for improvement. Key areas of consideration include:

  • Mobilization procedures for personnel and equipment;
  • Implementation plans and procedures;
  • Management and coordination of emergency response;
  • Stakeholder reaction;
  • Internal and external communications;
  • Post-incident perception; and
  • The short and long term consequences of the incident.

Based upon the objectives and areas of consideration, review questions are developed. These questions will, among other things, seek to explore each important aspect of the incident. They should be applied to each available source of information on the incident: plans, procedures, records and participants (through interviews). While the questions are being developed, another part of the team will begin a records review to build a list of incident participants.

The next step is to conduct interviews. During interviews, everyone involved with the actual response, management, or recovery effort should be provided the opportunity to supply input. No one person can see, hear, or know everything that happened. Often it is not practical to interview everyone; however, it is necessary to ensure that an adequate cross section of those involved with the incident is covered. The interview process should gather several important pieces of the puzzle.

The first piece is the basic, “What happened?” This information is used to build a time line of participants’ actions separate from those found in incident records. Another piece is the cause of the incident. Often, participants can provide valuable insight into why the incident occurred and what might be done to prevent it from happening again.

The short and long term consequences of the incident are another piece of the puzzle that can be obtained through the interview process with assistance provided by management. Participants can also impart the reactions and post-incident perceptions of the community and other organizational stakeholders. The participants’ perception of the strengths and weaknesses of the actions of the organization should also be documented.

Concurrent with the interviews, portions of the team will begin to analyze the implementation plans and procedures while other portions continue an in-depth records review. The records and plans review efforts will also develop time lines of what happened and what should have happened.

These documents are further surveyed to reveal strengths, weaknesses, and concerns based upon organizational standards and the disaster recovery and crisis management expertise of the reviewers. These portions of the team should develop checklists from the review questions used by the interviewers. Using a checklist with a comprehensive description of each area of consideration during plans analysis and record reviews helps keep these parts of the PIR objective and complete.

During the review phase, it is important to begin looking at the values and rationale that were applied during the planning process and by managers and responders in reaching decisions concerning response and recovery operations. This is especially important if it appears that deviations from the organization values occurred and if that variance had a direct effect on the response and recovery operations.

After the records review, plans analysis and interviews are completed, the team reconvenes to discuss and analyze their findings and develop a post-incident review report. Time lines developed by each group should be evaluated to identify points of deviation and convergence. Checking areas of divergence closely to determine where the plan was not followed will help identify candidate areas for planning or training enhancements. The individual perceptions of strengths, weaknesses, and concerns will be compared with the impressions and findings of the team’s record review. The team should emerge with a clear picture of what happened, what should have happened, and what should happen next. The picture is then assembled into a report of the post-incident review. A PIR report does not have to follow any special format and should only be as detailed as necessary to be a useful tool for crisis, disaster, and emergency planners and managers. The report should include recommendations for program enhancement or other modifications. It should address the following items:

  • A consolidated event time line;
  • Incident cause and recommendations for future correction or prevention;
  • Mobilization process, including notification of personnel and activation of facilities (this is particularly important in reviewing the time required to respond to an incident involving hazardous materials that could pose a threat to the surrounding community);
  • Prevention, mitigation and response equipment performance and procedures;
  • Implementation and performance of disaster response and crisis management plans and procedures including strengths, weaknesses, and concerns;
  • Management and coordination of disaster response and crisis management actions of those involved in responding to the incident;
  • Community and other stakeholder reactions, especially any actions initiated by community emergency managers to protect its citizens;
  • Post-incident perception of organization performance, as revealed during interviews, in press reports, by changes in stock price, by investor reactions, etc.;
  • Company, corporation, or industry consequences, especially if alternative technologies are available;
  • Key “lessons learned” listed separately, to facilitate the implementation of enhancements that may be required.
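The comparison of what happened against what should have happened can be sketched as a simple time line check. The function below is a hypothetical illustration, not part of any formal PIR method: the step names, times, and the 15-minute tolerance are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta

def find_divergences(planned, actual, tolerance=timedelta(minutes=15)):
    """Compare a planned time line with the observed one.

    Both arguments map a step name to a datetime. Returns a list of
    (step, finding) tuples for steps that were skipped, late, or unplanned.
    """
    findings = []
    for step, planned_time in planned.items():
        if step not in actual:
            findings.append((step, "in plan but never performed"))
        elif actual[step] - planned_time > tolerance:
            delay = actual[step] - planned_time
            minutes = int(delay.total_seconds() // 60)
            findings.append((step, f"late by {minutes} min"))
    for step in actual:
        if step not in planned:
            findings.append((step, "performed but not in plan"))
    return findings

# Hypothetical incident: all steps and times are invented for illustration.
start = datetime(2000, 1, 1, 8, 0)
planned = {
    "notify response team": start + timedelta(minutes=10),
    "activate command center": start + timedelta(minutes=30),
    "notify community officials": start + timedelta(minutes=45),
}
actual = {
    "notify response team": start + timedelta(minutes=12),
    "activate command center": start + timedelta(minutes=95),
    "brief news media": start + timedelta(minutes=120),
}

for step, finding in find_divergences(planned, actual):
    print(f"{step}: {finding}")
```

Each finding is only a pointer for the review team. Whether a divergence marks a training gap, a planning gap, or an improvisation worth keeping is a judgment the team makes from the interviews and records.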

Based on the PIR, the disaster recovery and crisis management programs should be revised to improve future performance. This could lead to revisions in several areas:

  • If the incident had not been previously identified as a potential hazard or vulnerability in the disaster and crisis plans then it should be added, and the hazard and vulnerability analysis should be reviewed;
  • If the report revealed weaknesses or gaps in the organization, the disaster response and/or crisis management structure should be modified;
  • If the policies and procedures did not address issues that became important during the incident, policies and procedures would need to be developed for those areas;
  • If response went poorly due to a lack of training, exercising or planning, these areas should be enhanced or modified and personnel should be familiarized with the changes; and
  • In areas where participants diverged from their existing plans and response or management operations went especially well, the disaster response and/or crisis management plans should be modified to reflect the reality of success.

The post-incident review process clearly provides an opportunity to learn from disasters and crises. Applying lessons learned to your disaster and crisis management program allows you to bring your procedures into focus with reality, and more importantly, it enables you to use the incident as a means of improving your program to better prepare for future situations.

While we never hope for another disaster, if one should occur again, your response, management and recovery operations should be smoother and more successful due to your post-incident review efforts.

By remembering the past, reinforcing strengths and enacting enhancements, we will heed the warnings and not be condemned to repeat history.


Mark Morgan is a Senior Associate with the Corporate Response Group, Inc. in Washington D.C.

In today's competitive environment, a business must achieve continual improvement just to stay even in the market place. Any interruption in one's presence in the market place is devastating. It is, therefore, incumbent upon management to respond immediately to any catastrophic event which interrupts the business and restore its operation as quickly as possible.

Subsequent to a catastrophe, many executives become distracted by the challenge of getting the building and equipment repairs completed rather than continuing their business function. This distraction may be challenging, but it is deadly. Businesses, large or small, begin dying the moment a catastrophe occurs. Restoration of the business must be treated as an emergency of the highest priority. After a serious catastrophe at a BASF Corporation facility, director of insurance Karl Heinz Jaeger stated, “Business interruption losses can be a major threat to a company and in the worst cases could lead to bankruptcy for even the biggest of companies.”

Focus on the Customer

Customers, be they retail, wholesale, or service-oriented, must continue their supply from some source. Even if the damaged business can maintain a continued supply by virtue of partial operations, the customers feel it necessary to look for secondary sources of supply in case their now-damaged primary source of supply fails. If supply is interrupted, these customers must go elsewhere immediately, and their orders may be difficult to regain.

Beware of Hidden Costs

In addition to the strong potential for loss of business, there are other hidden, and often uninsurable costs which combine to create a devastating effect on the business. These hidden costs begin accumulating immediately after the disaster occurs. Some of these costs include:

  • Vastly increased unemployment compensation premiums resulting from the layoffs in the work force.
  • Substantial increases in advertising and special promotions expenditures necessary to rebuild the volume of business.
  • Often underestimated and significant cost of training new employees or eliminating the “rust” from old employees who have been idle for a period of time.
  • Increased production mistakes inherent in a restart with new or rusty former employees.
  • Overall lowered level of efficiency in the operation which adds significantly to the cost of production.

These hidden costs may sound innocuous; however, they are deadly in 71% of catastrophes which produce a “temporary” facility closure.

Even when the damaged business regains its pre-catastrophe volume, generally there will be a significantly reduced profit. In a worst case scenario, after a catastrophe there will be a net loss where that same volume during the pre-catastrophe period would have resulted in a reasonable profit. This is due to the combined effect of the hidden losses which accounting systems are generally not set up to track. Consequently, the business person is often unaware of the problems which are causing cash flow difficulty.

These circumstances contribute to statistics cited by BASF/Wyandotte which show that 43% of businesses closed by a catastrophe never reopen. Twenty-eight percent of those that do reopen experience financial failure within three to five years. Those that never reopen simply do not have the financial resources to weather the period of time they are closed due to the catastrophe.

These numbers include those which are well insured because many of the hidden costs are not insurable expenses. Those that are insurable are often under-insured due to underestimating the maximum foreseeable loss. Clearly, immediate action must be taken if a business is to have any chance of recovery.
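Taken together, the two percentages cited above imply a combined failure rate. The short calculation below is a sketch of that arithmetic, assuming (as the wording suggests) that the 28% applies only to the businesses that do reopen.

```python
never_reopen = 0.43          # share of closed businesses that never reopen
fail_after_reopening = 0.28  # share of reopened businesses that fail later

reopen = 1 - never_reopen
fail_later = reopen * fail_after_reopening
total_lost = never_reopen + fail_later

print(f"reopen: {reopen:.0%}")
print(f"reopen but fail within three to five years: {fail_later:.0%}")
print(f"lost in total: {total_lost:.0%}")
```

Under that assumption, roughly 59% of businesses closed by a catastrophe are gone within five years, and only about 41% survive.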

Act Immediately

After a catastrophe, the insured should immediately concentrate on the health and continuation of the business. Sales staff should contact customers, thank them for their past loyalty, and assure them an aggressive effort is being taken to restore the business and, therefore, the supply. Appropriate management staff should have immediate and frequent communications with the employees so they are available when the business reopens. Accounting staff should follow through on collections, billings, payables, and vendor communications. Furthermore, management should focus on locating additional inventory, preparing reopening advertising, and developing new promotions to restore the business.

The restoration of a facility should be left to professionals capable of doing so at a high rate of speed, while working closely with the insurance provider. It should be obvious by now that the fastest restoration of the facility and equipment is crucial for a business unable to relocate.

Utilizing a team approach, with the insured focusing on the continuation of the business, a reputable high-speed specialist restoring the building and equipment, and rapid funding of the restoration by the insurer, the facility should be back in operation in the least amount of time. Anything which slows the process can be devastating for the business.

Conclusion

Other alternatives that take additional time will, with rare exception, prove to be devastating to the business regardless of advantages they may appear to have.


Nelson Bean is president of The Evans American Corporation, Houston, Texas.

Effective contingency planning and disaster recovery coordination require expertise in all aspects of disaster management, including avoidance and recovery. It is too late to plan an effective response after a disaster has struck and significant downtime has been incurred. The resulting outage from such a disaster can have serious effects on the viability of a firm's operations, profitability, quality of service, and convenience. In fact, these consequences may be more severe because of the lost time that results from inadequate planning.

After such an event, it is typical for senior management to become concerned with all aspects of the occurrence, including the measures taken to limit losses. Their concerns range from the initiating event, and contributing factors, to the response plans, equipment, training, and recovery operations used to counter it. Rather than delegate disaster avoidance to the facilities or building security organizations, it is preferable for a firm's disaster recovery planner(s) to understand fully the risks to operations and the measures that can minimize the probabilities and consequences, and to formulate their disaster recovery plan accordingly.