A disaster may be defined as an occurrence, natural or man-made, which results in great destruction and loss. In almost all of these cases, and many more, companies were left without power, computer rooms flooded, and networks destroyed. Even minor accidents can lead to disastrous consequences. A fire breaking out in an isolated portion of an office building, a broken water main, a bolt of lightning resulting in an electrical surge, and even sabotage can all be considered disasters if they prevent the organization from performing critical business functions for any length of time.
Recent disasters appear to have wreaked greater havoc than the events of the past. What may appear to be a similar disaster, such as an earthquake of similar intensity on the Richter scale, is now likely not only to cause more widespread damage but also to lead to significant and prolonged disruption of business operations. This trend should not be a surprise, given the growing dependence on computers and communications technologies which, in turn, depend absolutely on the uninterrupted availability of all infrastructure elements associated with them. These technologies are used not just for keeping track of organizations' work, but also for doing the work. Most organizations today (for instance, most banks and financial institutions) will come to a grinding halt very quickly if computers and communications technologies are disabled by a disaster. When the data and the processing necessary for vital business transactions are distributed throughout the organization and linked via a communications network, as is the case in a client/server environment, organizations are more vulnerable to disasters than in the mainframe environment, where major system components reside in a centrally managed data center.
The importance of business continuity planning (BCP) is recognized not only by organizations that have survived a disaster, but also, in increasing numbers, by others lucky enough to have escaped a tragedy thus far. They have witnessed competitors being forced to terminate business operations because they could not recover from a disaster quickly enough. They have also seen those who managed to recover but were left so weakened that they were forced to close permanently within a few years.
How long an organization can survive without computers and communications services varies by industry and size. A study of companies in the manufacturing and distribution industries with annual sales in excess of $215 million revealed that a typical firm can lose over $100,000 after four days without use of its computer systems and over $1 million after ten days. Inability to recover within ten days will result in bankruptcy for almost half of these firms [Wong, Monaco, and Sellaro, 1994]. In the financial industry, federal law mandates a written and tested disaster recovery plan, and further mandates that the plan provide the strategy necessary to have all critical applications up and running again within 24 hours. Whether or not laws and regulations require disaster recovery and business continuity plans, organizations must be prepared for proactive crisis management simply because prudent care demands that they safeguard their assets, which include people and information.
What is a client/server environment, also referred to as client/server architecture? This approach to the application of computers and communications technologies has been gaining significant attention in recent years. No organization of any size appears to be immune to this revolutionary approach to the development of information systems. Consider the following definition:
"Client/Server Architecture is an application design approach that results in the decomposition of an information system into a small number of server functions, executing on one or more hardware platforms, that provide commonly used services to a larger number of client functions, executing on one or more different but interconnected platforms, that perform more narrowly defined work in reliance on common services provided by the server functions."
The key components of this definition are shown in Table 1, along with their implications for disaster recovery and business continuity (DRBC) planning for client/server environments.
Table 1 shows that even though the migration to a client/server architecture provides several advantages over a centralized multi-user architecture, the same technological advantages may become significant disadvantages in the event of a disaster if appropriate disaster recovery and business continuity plans are not made. As Table 2 shows, the cost of network downtime can be extremely high. As mission-critical applications migrate to several network nodes in a client/server environment, network downtime can lead not only to erosion of productivity and profits in the short term, but also to loss of image and competitive advantage in the long run.
As pointed out earlier, the distributed nature of computing, enabled through communications technologies, poses significant risks to organizations. Fortunately, a wide variety of solutions continues to be made available in the industry. These solutions may be classified into the "turn-key" and "install-and-go" categories, as described below and shown as a comparison of recovery options and recovery times in Table 3.
The "turn-key" options include an organization's own fully equipped, installed, and tested recovery facility, or a hot site for which an organization has a subscription-type contract with a vendor. The turn-key option is typically expensive. However, the benefits are numerous: equipment compatibility, ability to test the plan, and a shorter recovery time compared to other options. If mission-critical functions need to be recovered in a very short time period, one of the turn-key options may indeed be the strategic choice.
Turn-key options must be considered in cases where the recovery time objective (RTO) is so short that install-and-go options may lead to loss exposure much higher than the cost of a turn-key option. As shown in the line diagram on the top half of Table 3, an organization may wish to consider its own recovery site or a contract with a hot site for recovery and business resumption if the loss of mission-critical functions could lead to a total loss exposure that is higher than the total cost of the selected turn-key option.
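The comparison described above can be sketched as a simple calculation. This is only an illustrative sketch: the class and function names are hypothetical, the dollar figures are invented for the example, and the assumption that losses accrue at a constant daily rate is a simplification (the study cited earlier suggests losses actually grow faster as downtime lengthens).

```python
from dataclasses import dataclass


@dataclass
class TurnKeyOption:
    name: str
    total_cost: float  # hypothetical total cost over the planning period


def loss_exposure(daily_loss: float, recovery_days: float) -> float:
    # Assumption: losses accrue at a constant rate for every day of downtime.
    return daily_loss * recovery_days


def turn_key_justified(option: TurnKeyOption, daily_loss: float,
                       install_and_go_days: float) -> bool:
    # Rule from the text: consider turn-key when the loss exposure under an
    # install-and-go recovery would exceed the turn-key option's total cost.
    return loss_exposure(daily_loss, install_and_go_days) > option.total_cost


# Hypothetical figures, for illustration only.
hot_site = TurnKeyOption(name="hot site subscription", total_cost=400_000)
print(turn_key_justified(hot_site, daily_loss=250_000, install_and_go_days=5))
print(turn_key_justified(hot_site, daily_loss=10_000, install_and_go_days=5))
```

With these invented numbers, the first function would lose $1,250,000 over a five-day install-and-go recovery, which exceeds the hot site's cost, so the check prints True. The second would lose only $50,000, so it prints False.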
Most mainframe-based legacy information systems call for turn-key options for a variety of reasons. First, these systems include mission-critical systems with short recovery time objectives; most organizations call for an RTO of less than 24 hours in today's environment, which is characterized by heavy reliance on computers and communications. Second, most mainframe-based legacy information systems are not amenable to install-and-go options because of difficulties in ensuring equipment availability and compatibility. Third, and perhaps most important, mainframe-based systems may incur significant delays in installation and implementation if an install-and-go option were to be utilized. In other words, even if equipment were available, installing, testing, and implementing information systems may take longer than the specified RTO.
The "install-and-go" options include drop-shipment agreements and cold sites. A drop-shipment agreement involves a prior arrangement with a vendor to deliver requested equipment within a contractually specified time period. It is preferable to select vendors who carry an inventory of equipment (for example, Newcourt Capital - LAN Continuity Services, located near the Dallas-Fort Worth Airport), rather than those who merely promise delivery but do not carry equipment inventory. Drop-shipment programs are typically coupled with a cold-site option. Cold sites provide backup physical space with the required wiring, heating, air-conditioning, raised floors, telephone lines, and other infrastructure needs. If an organization has a drop-shipment agreement, the vendor can then deliver the equipment under contract to the cold-site location. Obviously, the installation may take a considerable amount of time. Furthermore, less-than-effective drop-shipment vendors may add to recovery delays by not carrying critical equipment in inventory.
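Whether an install-and-go arrangement can meet a given RTO depends on how long delivery, installation, and testing take together. The sketch below illustrates that check under the assumption that the three steps happen one after another; the function names and the hour figures are hypothetical.

```python
def install_and_go_recovery_hours(delivery_hours: float,
                                  install_hours: float,
                                  test_hours: float) -> float:
    # Assumption: delivery, installation, and testing run sequentially.
    return delivery_hours + install_hours + test_hours


def meets_rto(recovery_hours: float, rto_hours: float) -> bool:
    # A recovery option is viable only if it finishes within the RTO.
    return recovery_hours <= rto_hours


# Hypothetical drop-shipment to a cold site: 48 h delivery, 24 h install, 12 h test.
estimate = install_and_go_recovery_hours(48, 24, 12)
print(estimate)
print(meets_rto(estimate, rto_hours=24))   # mission-critical function
print(meets_rto(estimate, rto_hours=96))   # function with a four-day RTO
```

With these illustrative numbers the estimate is 84 hours, which fails a 24-hour RTO but fits within a four-day one.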
Client-server installations may well consider install-and-go options for recovery and resumption of those business functions whose criticality, priority, and recovery time objectives permit less stringent recovery options than mission-critical functions require. Some of the major characteristics of client-server recovery include the following (assuming an off-site storage and retrieval program for vital data in all forms and locations, including servers and clients):
- Recovery of mainframe-based legacy information systems as an enterprise-level super server;
- Recovery of one or more servers for each of the critical business functions;
- Recovery and resumption of business functions at the client level.
In most work area recovery situations subsequent to recovery of mainframe-based information systems, the recovery and resumption of servers and clients may call for an RTO of three to five days, contingent upon organizational needs. Nevertheless, a cost effective option for client-server environment recovery and resumption should include evaluation of install-and-go options, such as the combination of subscription to a drop-shipment program and a cold site for work area recovery. Note, however, that for work area recovery and resumption in a client-server environment, turn-key as well as install-and-go options are viable strategies to be considered by a disaster recovery and business continuity planner. The choice among the options must obviously consider the cost elements and the benefits associated with each of the options.
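One way to frame the choice among options is to pick, for each business function, the least expensive option whose recovery time fits within that function's RTO. The following sketch shows this selection. The option names echo those discussed in this article, but the recovery times and costs are invented for illustration, and the selection rule itself is an assumption rather than a prescribed method.

```python
from typing import List, NamedTuple, Optional


class Option(NamedTuple):
    name: str
    recovery_hours: float  # hypothetical time to resume operations
    cost: float            # hypothetical total cost of the option


def cheapest_option_meeting_rto(options: List[Option],
                                rto_hours: float) -> Optional[Option]:
    # Keep only the options that recover within the RTO, then take the cheapest.
    feasible = [o for o in options if o.recovery_hours <= rto_hours]
    return min(feasible, key=lambda o: o.cost) if feasible else None


# Hypothetical figures, for illustration only.
options = [
    Option("own recovery site", recovery_hours=4, cost=900_000),
    Option("hot site", recovery_hours=24, cost=400_000),
    Option("drop-shipment plus cold site", recovery_hours=96, cost=120_000),
]

print(cheapest_option_meeting_rto(options, rto_hours=24))    # mainframe-level RTO
print(cheapest_option_meeting_rto(options, rto_hours=120))   # five-day work area RTO
print(cheapest_option_meeting_rto(options, rto_hours=2))     # no option qualifies
```

Under these assumed figures, a 24-hour RTO selects the hot site, a five-day RTO selects the drop-shipment and cold-site combination, and a two-hour RTO returns None because no option recovers quickly enough.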
Table 4 shows typical cost elements and benefits associated with alternative solutions for client-server environment recovery, resumption, and business continuity.
As organizations' dependence on computers and communications technologies grows at an alarming pace, so should their proactive preparations to mitigate the effects of disasters on these technologies. While several options for recovering and resuming business operations are available today and continue to be developed in the industry, organizations and their disaster recovery/business continuity managers should exercise considerable care and prudence in selecting options which are cost-beneficial to the organization and its mission, objectives, and strategies. To ensure survival in the face of a disaster, however, an organization must have an appropriate portfolio of recovery and resumption options which will ensure that recovery time objectives for various mission-critical, vital, and important business functions are met at appropriate costs. This article has presented general guidelines in this regard, as well as specific guidelines for client/server computing environments, which are fast becoming commonplace in most, if not all, organizations.
Wong, Bo K., Monaco, John A., and Sellaro, C. Louise, "Disaster Recovery Planning: Suggestions to Top Management and Information Systems Managers," Journal of Systems Management, Vol. 45, No. 5, May 1994, p.28 (5).
Raja K. Iyer, CBCP, received his Ph.D. from the University of Minnesota and is currently the ERP/BCP Practice Manager with Sprint Paranet, and an Adjunct Professor of Information Systems in the College of Business Administration at the University of Texas at Arlington. Dr. Iyer currently serves as a member on the Certification Board of DRI International, and served as the Chair of the Education, Testing, and Standards Committee of DRI International.