A Systematic Approach to Continuous Operations

Written by  William J. Douglas, CBCP Thursday, 15 November 2007 21:17
As the information revolution evolves, corporate executives are becoming more concerned with ensuring that their operations are available when their customers want them. Many believe that technology performance will be the key success factor for businesses in the future. This implies that organizations may need to consider production environments that have no single points of failure and that react instantaneously to interruptions in service. In short, successful businesses must target a technology environment that can be relied on 24 hours a day, 7 days a week, 52 weeks a year. But what if that interruption is a catastrophic disaster? How is continuous operations defined in the event of a disaster, and what is the impact on the recovery environment? The appropriate level of continuous operations must be driven by a Business Impact Analysis (BIA) and the resulting decision on dollar risk versus recovery cost.

In the last decade, numerous local and regional disasters have affected business in the United States, and the impact on the business community has been significant. It has been reported that 43% of companies that experience a disaster never reopen, and that an additional 29% close within two years. Of businesses that lose computer support, 75% are no longer able to conduct business functions after only two weeks. This is not an acceptable situation.

Before a business can determine the most appropriate recovery environment, executives must conduct a Business Impact Analysis (BIA) to determine the real hard dollar loss the business would expect to incur should it be unable to operate over a period of time. Customer confidence is critical and must be factored into the analysis, but because it is subjective rather than quantitative, the recovery requirements should not be based solely on it. Regulatory impacts can also be a major driver for many businesses and must be factored in as well. The impact on the bottom line must be the ultimate driver. Using the information gathered in the BIA process, executives can weigh the cost of recovery against the risk of a disaster and its impact on the survival of the business. It is important that the Business Units, not the Technology Support areas, determine the hard dollar loss over time during an extended outage, and that the full spectrum of recovery resources be identified and considered, not just the traditional data center items. The BIA will identify the major business functions performed as well as the cost and impact over time if those functions are not recovered. The Business Units are the ones who are ultimately responsible for, and eventually pay for, the recovery environment.

The cost of the recovery solution will be directly related to how quickly the business must be restored and how much intra-day data needs to be protected. There are three key recovery objectives which must be considered when determining the most appropriate level of continuous operations for any business or system:

  • Recovery Time Objective (RTO) - The amount of time that elapses between the 'event' and the time the operation is restored.
  • Recovery Point Objective (RPO) - The currency of the data at the time of recovery in relation to the 'event' (e.g. as of the last hourly or daily backup).
  • Recovery Communication Objective (RCO) - The amount of time needed to restore voice and data communication following the 'event'.

These objectives are independent of each other and may be different for the various business functions performed by the Business Units. These objectives directly relate to the strategy used to recover the Business Unit's functions. Based on the results of the BIA and specifically the hard dollar loss over time, the recovery objectives can be clearly determined.
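
As a rough illustration of how the hard dollar loss over time can point to a Recovery Time Objective, the sketch below accumulates a per-day loss profile and reports how many days of outage stay within a tolerable total. The function name, the daily loss figures, and the tolerance are all hypothetical; a real BIA would supply its own numbers and might use a finer time scale than whole days.

```python
from itertools import accumulate


def latest_tolerable_day(daily_loss, max_tolerable_loss):
    """Count the whole outage days whose cumulative loss stays within
    max_tolerable_loss. Returns 0 if the first day alone exceeds it."""
    days = 0
    for cumulative in accumulate(daily_loss):
        if cumulative > max_tolerable_loss:
            break
        days += 1
    return days


# Hypothetical BIA output: hard dollar loss for each successive day of outage.
order_entry_daily_loss = [50_000, 75_000, 120_000, 200_000, 350_000]

# Hypothetical tolerance set by the Business Unit.
rto_days = latest_tolerable_day(order_entry_daily_loss, max_tolerable_loss=250_000)
print(rto_days)
```

With these illustrative figures the cumulative loss is 50,000, 125,000, 245,000 and then 445,000, so the function would being restored within three days keeps the loss under the tolerance. The same calculation could be repeated per business function, since the objectives may differ from one function to the next.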

The following issues must be addressed in conjunction with the BIA, as they will have direct impact on the recovery objectives:

  • The current off-site storage and recovery procedures must be documented. Management must understand how often critical data is backed up and when it is stored off-site. Personal computer data, paper files and 'work in progress' on the desk should be considered as well as the data on servers and centralized computers. If a disaster occurs, what is the state of the data that is available off-site, and how can the transactional data since the last backup be recovered? Considering the time to collect the backed-up information and transport it to the recovery location(s), what is the projected time to restore the data and prepare to perform 'catch up processing'? If the business is dependent on electronic access to the data, then the telecommunication connectivity from business offices to the recovery location(s) must be documented. This analysis will help define the Recovery Communication Objective (RCO). The backup and recovery process may be acceptable, but if the end users or customers cannot get to the data, the recovery process has failed.
  • The Business Units must risk rank applications into high, medium, and low priorities. Again, the Business Units understand what technology and other components are critical to their operations. Part of the cost analysis must balance the recovery time with the core business functions. A Recovery Time Objective (RTO) can be established for each priority level, as can the amount of intra-day data that must be protected or recovered (the Recovery Point Objective, or RPO). For example, the internal management information may not be needed immediately following a disaster, whereas the order entry system that generates the majority of the revenue stream may be the critical link.
  • Application developers must review the dependencies and interactions between applications. Once the Business Units have prioritized the applications, application analysts need to make sure that a lower-priority application is not a required input to a higher-priority application. This analysis could totally change the mix of resources needed to complete the recovery process.
  • Technology associates must review the data storage (DASD and TAPE) and the CPU processing capacity requirements for each priority level. Once it is known what has to be recovered, then the technology platform at the recovery system must be sized to meet this requirement. Often it is possible to reduce the cost of the recovery environment by using somewhat smaller systems which will still meet the minimal recovery needs for the business.
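
The dependency review described in the list above can be sketched as a small priority-propagation step: any application that feeds a higher-priority application, directly or through a chain, is raised to that priority. This is only one possible way to model the check. The application names, the `feeds` mapping, and the function name are hypothetical.

```python
# Lower rank number means higher recovery priority.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def effective_priorities(assigned, feeds):
    """assigned: application -> priority chosen by the Business Unit.
    feeds: application -> list of applications that require its output.
    Returns a new mapping in which every application is at least as high
    a priority as any application it feeds, directly or indirectly."""
    effective = dict(assigned)
    changed = True
    while changed:  # repeat until no priority is raised in a full pass
        changed = False
        for app, consumers in feeds.items():
            for consumer in consumers:
                if PRIORITY_RANK[effective[consumer]] < PRIORITY_RANK[effective[app]]:
                    effective[app] = effective[consumer]
                    changed = True
    return effective


# Hypothetical application inventory.
assigned = {
    "order_entry": "high",
    "pricing": "low",
    "rates_feed": "low",
    "mgmt_reports": "low",
}
feeds = {
    "pricing": ["order_entry"],       # order entry needs pricing output
    "rates_feed": ["pricing"],        # pricing needs the rates feed
    "order_entry": ["mgmt_reports"],  # reports consume order data
}

print(effective_priorities(assigned, feeds))
```

In this illustrative inventory, `pricing` and `rates_feed` are both raised to high because order entry depends on them, while `mgmt_reports` stays low. Applications raised this way also need to be included when the recovery platform is sized.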


With the BIA and the recovery objectives determined, Business Executives can address the level of 'continuous operations' they must achieve in the event of a disaster. Continuous Operations translates into 'Availability', and business recovery requirements are directly related to it. Depending on the results of the BIA and the Recovery Objectives, Continuous Operations can be redefined as one of the following availability options:

Continuous Availability - recovery within seconds or minutes of the event. This option implies full redundancy at the recovery location, and the RTO, RPO, and RCO are immediate with minimum loss of data. The recovery location is a 'mirror' of the production location and is periodically switched with production to ensure viability.

High Availability - recovery of the operation within 24 hours of the event. The RTO and RCO are less than 24 hours, but the RPO is at the time of the 'event' with minimum loss of data. This level of availability relies on some level of advanced recovery technology such as electronic vaulting (electronic data backup to remote storage, as opposed to the usual method of trucking tapes off-site), electronic journaling (transmission of data input since the last backup to remote storage), and stand-by systems. The network is a combination of dedicated and on-demand resources, and the recovery environment is tested at least annually. Backup staffing and/or work space is pre-arranged.

Extended Availability - recovery of the operation takes more than 24 hours but less than 3 to 5 days. The RTO and RCO are likewise greater than 24 hours but less than 3 to 5 days, and the RPO is as of the last available backup. The network primarily uses on-demand services. This is a traditional 'Hot Site' recovery strategy which is periodically tested. There will be some loss of data unless manual procedures are in place to protect the information source and facilitate re-entry of information after recovery. Backup staffing and work space may be subscribed for, provided by a 'work at home' solution, or handled using other company locations.

Basic Availability - the recovery plan is documented but may or may not have been tested. The organization may have vendor contracts for a Hot Site Agreement, a Quick Ship Agreement, and/or a Work Area Agreement. The RCO, RPO, and RTO may vary, depending on the nature of the business function, but are assumed to be greater than 3 to 5 days.
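
The four options can be read as bands on the Recovery Time Objective. The sketch below maps an RTO in hours to an option name. It considers the RTO only, although the options above also differ in RPO and RCO. Two cutoffs are assumptions rather than figures from the text: "seconds or minutes" is taken to mean up to one hour, and "3 to 5 days" is taken at its upper bound of 5 days. How exact boundary values such as 24 hours are assigned is also a choice made here.

```python
def availability_option(rto_hours, minutes_cutoff_hours=1.0, extended_limit_days=5):
    """Map a Recovery Time Objective (in hours) to an availability option.

    minutes_cutoff_hours and extended_limit_days are assumed cutoffs,
    not values stated in the article.
    """
    if rto_hours < 0:
        raise ValueError("rto_hours must be non-negative")
    if rto_hours <= minutes_cutoff_hours:
        return "Continuous Availability"
    if rto_hours < 24:
        return "High Availability"
    if rto_hours < extended_limit_days * 24:
        return "Extended Availability"
    return "Basic Availability"


# Hypothetical RTOs for a few business functions, in hours.
for function, rto in [("order entry", 0.1), ("call center", 12),
                      ("payroll", 48), ("archive lookup", 200)]:
    print(function, "->", availability_option(rto))
```

Because the objectives are independent, a fuller version would apply the same banding separately to the RPO and RCO of each business function.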

These availability options apply to all recovery strategies. When discussing business continuity with executives, these options apply to System Recovery (mainframe, midrange, client server, LAN/WAN, etc.), Communications (voice, E-mail, Internet, etc.), Help Desk, Call Centers, Work Area (facilities, desks, chairs, etc.) and a multitude of others (personal computers, printers, fax, people, forms and supplies, etc.).

The cost of recovery solutions rises exponentially as recovery times and points are shortened. Recovery solutions and levels of continuous operations must be matched to the RTO, RPO, and RCO. The level of risk aversion in a company will determine the point at which the level of risk is balanced by the expenditure needed to ensure some level of continuous operations. The key issue is that by using the BIA, the recovery solution can be arrived at systematically, with everyone's eyes open to the level of risk that is being accepted.
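
One simple way to picture the balance between risk and expenditure is to add each option's yearly cost to its expected yearly loss and compare the totals. The sketch below does that with a `risk_aversion` multiplier that weights the loss more heavily for a more risk-averse company. The expected-value model, the function names, and every figure are hypothetical and stand in for what a BIA and vendor quotes would provide.

```python
def annual_exposure(yearly_cost, loss_if_disaster, annual_probability, risk_aversion=1.0):
    """Yearly recovery cost plus the (optionally weighted) expected yearly loss."""
    return yearly_cost + risk_aversion * annual_probability * loss_if_disaster


def lowest_exposure_option(options, annual_probability, risk_aversion=1.0):
    """options: name -> (yearly_cost, loss_if_disaster). Returns the option name
    with the smallest total exposure under this simple model."""
    return min(
        options,
        key=lambda name: annual_exposure(*options[name], annual_probability, risk_aversion),
    )


# Hypothetical figures: faster recovery costs more per year but loses less.
options = {
    "Continuous Availability": (2_000_000, 10_000),
    "High Availability": (600_000, 500_000),
    "Extended Availability": (200_000, 4_000_000),
    "Basic Availability": (50_000, 20_000_000),
}

print(lowest_exposure_option(options, annual_probability=0.05))
print(lowest_exposure_option(options, annual_probability=0.05, risk_aversion=5))
```

With these made-up numbers, a risk-neutral comparison favors Extended Availability, while weighting the loss five times as heavily shifts the choice to High Availability. The point is not the specific answer but that the accepted level of risk is made explicit.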

The systematic nature of the above approach is appealing to executive management and the business areas because information is gathered from those who understand the risks, and the resulting recovery strategies are arrived at in an informed environment. Costs are justified and risks are understood by all involved. Those who believe that disasters cannot happen to them can understand what would happen if one did, and the discussions on solutions can focus on risk versus cost.

The BIA approach and the associated continuous operations strategies are conceptually straightforward and thus easier to sell to management. By using this common terminology and approach for setting recovery objectives and determining levels of availability, we can bridge the gap between what continuous operations means to a technology associate and what it means to a business executive.

In the final analysis, executive management will realize that in a continuous operations environment, business continuity is not a one-time project but the beginning of an ongoing process. As the business develops and expands, the impact on business continuity must be continually reviewed.

William J. Douglas, CBCP, Sr. Vice President, NationsBanc Services, Inc. Mr. Douglas is responsible for the technology component of Business Continuity for NationsBank Corporation. He is a member of the IBM Business Recovery Services Customer Advisory Board and co-chairs the Continuous Operations Subcommittee.
