Disaster Recovery Journal, Spring 2001
Disaster Recovery: No Longer Enough
by Ron LaPedis, CBCP
In the business
world, computer disaster recovery planning is evolving toward business
continuity planning. In recognition of this trend, in 1995, DRI International,
an organization founded in 1988 to provide a base of common knowledge
in continuity planning, replaced the designation for Certified Disaster
Recovery Planner (CDRP) with Certified Business Continuity Planner (CBCP).
What is the difference between disaster recovery and continuity planning?
In theory, a disaster recovery plan is reactive and usually focuses
on the computing environment. Although work is done to harden the computing
infrastructure to prevent a disaster, the plan's main purpose is
to recover from damage to the infrastructure. In contrast, a business
continuity or contingency plan is not only proactive, but it is also
targeted at keeping the business running, and not just recovering the
computers.
Many companies today do not have a working continuity plan. Of those
companies that do develop a continuity plan, many proceed without sufficient
knowledge or input from end users. For example, auditors or managers
often direct someone within the IT department to write a plan to back
up the company's data centers. Frequently, the IT operations staff
backs up everything running on a particular system, or even the entire
data center, so that all information, critical or not, is recovered
at the same time, even if the business function the data supports
either is not critical or can be replaced by manual procedures.
Also, because end users are not involved in developing the continuity
plan, their manual procedures, physical facilities, hard-copy records,
and other special needs are often overlooked. Thus hardware and applications
are being recovered, but not the business processes that use them.
In a business continuity planning program, individual business functions
identify their critical business processes and develop separate (but
coordinated) continuity plans for each of them. The benefits of this
distributed approach are many:
-Business processes that are not critical do not hinder those that are, so that
limited resources can be used effectively.
-Infrastructure that supports noncritical business processes does not
get recovered.
-Multiple critical business processes/applications can be recovered
in parallel.
-Applications that normally run on different systems can be recovered
on the same system, if necessary.
The survival of your business after a disaster depends on having a continuity plan
in place. This article details the procedures for launching a continuity
planning program.
Beginning the Continuity Planning Process
Where do I start?
Suppose you are beginning a corporate continuity planning process for the Absolutely
Best Company (ABC), a manufacturer of top-quality widgets (an imaginary
company and product). There are several steps involved, as detailed
in the following paragraphs.
Gain management commitment
For the continuity plan to be successful, management must be committed
at the highest level. The plan must be part of the strategic business
plan, and the company must budget appropriately and separately for the
continuity planning program.
Identify critical business functions
Assuming that you have management support, the next step is to identify
how the company obtains its revenue in terms of business functions.
For example, ABC first takes orders for widgets and then builds the
widgets to meet those orders. Next, the widgets are installed, and the
customers are billed. Finally, the employees are paid so that the process
will continue. Other revenue comes from service and support, but because
those are distributed functions, an event that has an impact on the
corporate site should not affect them adversely.
Once you have defined the gross critical business functions (not the
infrastructure, such as computer applications, that supports them),
a risk assessment and business impact analysis should be performed for
each of the business functions and then, if appropriate, for the infrastructure
supporting them. Remember also to analyze dependencies because a business
function that appears noncritical could be supporting one that is critical.
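The dependency check described above can be sketched in a few lines of Python. This is only an illustration: the function name, the dependency map, and the ABC function names are hypothetical, and a real analysis would draw its inputs from the Delphi teams rather than a hard-coded dictionary.

```python
from collections import deque


def effectively_critical(critical, depends_on):
    """Return every function that is critical or that a critical function relies on.

    critical:   set of functions the business has ranked as critical
    depends_on: dict mapping a function to the functions it relies on
    """
    result = set(critical)
    queue = deque(critical)
    while queue:
        current = queue.popleft()
        for dependency in depends_on.get(current, ()):
            if dependency not in result:
                result.add(dependency)
                queue.append(dependency)
    return result


# Hypothetical dependency map for ABC (illustrative only).
depends_on = {
    "order entry": ["telecommunications"],
    "manufacturing": ["purchasing", "shipping/receiving"],
    "billing": ["postal services"],
    "purchasing": ["accounting"],
}
critical = {"order entry", "manufacturing"}

# Functions that looked noncritical but support a critical one.
hidden = effectively_critical(critical, depends_on) - critical
print(sorted(hidden))
```

In this made-up example, accounting is pulled in only indirectly (manufacturing relies on purchasing, which relies on accounting), which is exactly the kind of dependency that is easy to miss when functions are ranked in isolation.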
Build business process core teams
You start by building business process core teams consisting of information
technology (IT) operations management, end-user management, applications
support staff for each critical business function, and the records management
department. This team technique is called the Delphi method, and hence
the team is called the Delphi team. Through the Delphi teams, you develop
a clearer view of the infrastructure (for example, processes, records,
IT applications) that the teams believe is critical to performing their
business functions.
Build a corporate team
You should also build a corporate team, consisting of members from the accounting,
auditing, information technology, facilities, human resources, legal,
public relations, investor relations, purchasing, postal services, records
management, risk management, safety, security, shipping/receiving, and
telecommunications departments. In a disaster, these departments will not only
be required to continue their support roles, but they may also have
to implement major infrastructure changes to support the affected areas.
Risk assessment and business impact analysis
The purpose of a risk assessment and business impact analysis is to
answer the following questions:
-What am I trying to protect?
-What am I trying to protect it from?
-What controls are currently in place or needed to prevent or minimize
the effects of potential loss?
-How much am I willing to spend on those controls?
-Is the money I am spending effective?
Thus the risk assessment involves identifying threats, vulnerabilities, risks,
and the business impact of a disruption for each entity. Before you
begin the ranking process, determine what criteria to use. Generally,
the criteria are split between quantitative and qualitative. Quantitative losses
can be expressed as a number, such as an annualized loss exposure (ALE);
qualitative losses, such as damage to the company's reputation, are harder
to express as a number.
To start the risk assessment, rank all of the entities whose loss could
negatively affect business, gain consensus from each Delphi team, and
then merge the results for presentation to and concurrence by upper
management.
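As a rough illustration of a quantitative ranking, the sketch below assumes the common convention that an annualized loss figure is the estimated loss per event multiplied by the estimated number of events per year. The article does not prescribe a formula, so treat this as one possible approach. The class name, field names, and all dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    entity: str             # business function or supporting infrastructure
    description: str
    loss_per_event: float   # estimated cost of a single occurrence
    events_per_year: float  # estimated frequency, e.g. 0.1 = once in ten years

    @property
    def ale(self) -> float:
        # Assumed convention: ALE = loss per event x events per year.
        return self.loss_per_event * self.events_per_year


def rank_by_ale(threats):
    """Sum the ALE per entity and return (entity, total) pairs, largest first."""
    totals = {}
    for threat in threats:
        totals[threat.entity] = totals.get(threat.entity, 0.0) + threat.ale
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical estimates gathered from the Delphi teams.
threats = [
    Threat("payroll", "data center fire", 300_000, 0.02),
    Threat("order entry", "data center fire", 2_000_000, 0.02),
    Threat("order entry", "telecommunications outage", 50_000, 1.0),
]

ranking = rank_by_ale(threats)
for entity, total in ranking:
    print(f"{entity}: {total:,.0f} per year")
```

A ranking like this covers only the quantitative criteria; qualitative criteria still have to be weighed by the teams and by management.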
Recovery time and recovery point objectives
As part of the risk assessment, the Delphi teams estimate how long an entity can
be unavailable, how old the information supplied by the entity can be,
and how much of it can reasonably be lost when it is made available
again. That is, they determine the recovery time objective (RTO) and
recovery point objective (RPO).
Recovery time objective refers to the time from when the event occurs
until the business process (for example, the accounting department,
the accounting application, or manual procedures used by accounting)
must become active again (recovered).
Recovery point objective describes the point in time to which the data
must be recovered; stale (old or obsolete) information no longer
reflects the state of the company. This can also be thought of as the
freshness window. The teams consult with management to confirm their
decisions.
The longer the RTO for a particular entity, the less it will probably
cost to recover it (figure 1). Unfortunately, the losses caused by the
entity being unavailable escalate over the same period (figure 2).
RTO and RPO are not necessarily linked to each other; that is, a short
RTO does not imply a short RPO. For example, a database might need to
be recreated from backup tapes in less than a day (RTO), but while the
online data must be current, it is acceptable for the batch data to
be one week old (RPO).

Figure 1. Cost of recovery.

Figure 2. Loss due to the time an entity is unavailable.
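The two objectives defined above can be checked independently after an exercise or a real event. The following sketch is an illustration only: the function name, parameter names, and timestamps are hypothetical, and it assumes that data age is measured from the moment of the event back to the newest recoverable copy.

```python
from datetime import datetime, timedelta


def check_objectives(event, restored, last_good_copy, rto, rpo):
    """Compare an actual (or exercised) recovery against its objectives.

    event:          when the disruption occurred
    restored:       when the business process was active again
    last_good_copy: timestamp of the newest data available after recovery
    rto, rpo:       objectives expressed as timedelta values
    """
    downtime = restored - event
    data_age = event - last_good_copy
    return {
        "downtime": downtime,
        "data_age": data_age,
        "rto_met": downtime <= rto,
        "rpo_met": data_age <= rpo,
    }


# Hypothetical batch database: rebuilt from tape within a day,
# and data up to one week old is acceptable.
batch = check_objectives(
    event=datetime(2001, 3, 5, 9, 0),
    restored=datetime(2001, 3, 5, 23, 0),
    last_good_copy=datetime(2001, 3, 2, 22, 0),
    rto=timedelta(days=1),
    rpo=timedelta(days=7),
)
print(batch)

# The same recovery measured against an online system that must be current.
online = check_objectives(
    event=datetime(2001, 3, 5, 9, 0),
    restored=datetime(2001, 3, 5, 23, 0),
    last_good_copy=datetime(2001, 3, 2, 22, 0),
    rto=timedelta(days=1),
    rpo=timedelta(hours=1),
)
print(online["rto_met"], online["rpo_met"])
```

The second call shows why the objectives are not linked: the same recovery meets the one-day RTO in both cases but misses the tighter RPO.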
Disaster Tolerance: Closing The Freshness Window
Real-time technologies, such as online data replication or vaulting, can practically
close the freshness window. They are more expensive than routine backups
but could be deemed necessary in some situations, such as Web transactions,
wire transfers, and supply chain applications. Replication or vaulting
duplicates data onto an off-site location as it is manipulated on the
primary system. The semantic difference between these two terms usually
is that vaulted data is batched and/or stored offline and needs to be
moved to the backup hardware, while replicated data is sent in near
real time, possibly directly to the backup hardware, and is ready to
run.
If your company is already geographically dispersed and application
uptime is imperative, you can create application domains at more than
one site and distribute the load. When the load is being shared in this
manner, you do not actually have primary and secondary systems or sites.
What you do have are the beginnings of indestructible, scalable computing.
Alternative plans and controls
Once risks are assessed and recovery windows are determined, the planner
can ask the Delphi teams to begin outlining possible continuity plans
for their business functions, starting with the most critical. The baseline
is the before risk or ALE (no plan or controls in place). For each alternative
plan or control, the Delphi teams need to calculate the after risk or
ALE (plan and controls in place) along with the cost of the plan and
controls.
At some point, the potential loss reduction (savings) will be less than
the expenditures required to develop and implement the continuity plan
(figure 3). Here is where the executive staff comes into play again.
They need to determine which controls and recovery window each plan
should address based on cost versus savings and time to deploy.

Figure 3. Plans I and III would break even (at different levels). Plan
II would generate savings in excess of its implementation cost, and
Plan IV would cost more than it would save.
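The comparison summarized in figure 3 can be worked through as simple arithmetic: savings are the before ALE minus the after ALE, and the net result is savings minus the cost of the plan and controls. The sketch below uses invented numbers chosen to reproduce the four outcomes in the caption; they are not taken from the figure. The function name is hypothetical, and the sketch assumes plan costs are annualized so that they can be compared directly with an annual loss figure.

```python
def evaluate_plan(name, before_ale, after_ale, annual_cost):
    """Classify one alternative plan by comparing its savings with its cost."""
    savings = before_ale - after_ale  # reduction in expected yearly loss
    net = savings - annual_cost       # positive means the plan pays for itself
    if net > 0:
        verdict = "saves more than it costs"
    elif net == 0:
        verdict = "breaks even"
    else:
        verdict = "costs more than it saves"
    return {"plan": name, "savings": savings, "net": net, "verdict": verdict}


BEFORE_ALE = 500_000  # hypothetical baseline: no plan or controls in place

# (plan name, after ALE, annualized cost of plan and controls), all hypothetical
alternatives = [
    ("I", 400_000, 100_000),
    ("II", 200_000, 150_000),
    ("III", 100_000, 400_000),
    ("IV", 50_000, 600_000),
]

results = [evaluate_plan(name, BEFORE_ALE, after, cost)
           for name, after, cost in alternatives]
for r in results:
    print(f"Plan {r['plan']}: savings {r['savings']:,}, net {r['net']:,} ({r['verdict']})")
```

Net savings alone do not settle the choice: as noted above, the executive staff also weighs the recovery window each plan achieves and the time needed to deploy it.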
Documentation and Standards
Required documentation
Before developing a continuity plan identifying activities to be performed
during a disaster scenario, you need to understand how those same activities
work every day. This means that you need access to some basic documentation
for each business process. You also need to gather more specific information
about your company's business functions. For example, find out
whether sufficient application downtime is scheduled to back up the databases
used by the business functions or, if the software allows it, whether the
databases are being backed up online. (If your company runs 24 x 7,
you don't have time to take your system down for backups.) Determine
whether there is an archival process to remove inactive records from hard-copy
files and databases so that they are kept at a manageable size. Also,
identify where critical records are being stored: on site, off site, or
outside the region.
If any of this information is missing, a continuity plan can be started,
but it may not be effective. Without knowing how the business functions
run and what is required for normal processing, no one can guess the
requirements for exception processing. Actually, the bulk of the work
towards developing a continuity plan lies in ensuring that standard
business practices are documented and followed.
Developing standards: A cookbook
The continuity planner either develops or purchases a standard set of
forms and procedures to be used by each function: a continuity plan
cookbook. Without standards, each business function's
plan will look different, making coordination difficult.
Different practitioners divide and name the phases of the continuity
planning program differently. The Disaster Recovery Institute defines
seven phases of a continuity planning program as follows:
-Phase 1: Project initiation
-Phase 2: Functional requirements
-Phase 3: Design and development
-Phase 4: Implementation
-Phase 5: Testing and exercise
-Phase 6: Maintenance and update
-Phase 7: Execution
Your cookbook should be developed, or planning software selected, during phase 3
so that the business function teams can use it during phase 4. At a minimum,
it should contain the following:
-Step-by-step approach for each group to follow in writing the continuity
plan
-Corporate team description, stating which corporate resources will
be available to assist each business function in developing its plan
-Notification process: Although the planner maintains the first-level
response part of the plan, including lists of important phone numbers
at the corporate level, the security department probably should start
the actual notification process in the event of a disaster.
-Plan considerations: Identifies the issues that must be addressed by
business functions as they write their plans
-Responsibility list: A script or checklist, by job function, of what
each person will be required to do during the seven phases of plan execution: evaluation
and declaration, notification, emergency response, interim processing,
salvage, relocation/reentry, and resumption of normal processing
Writing the continuity plan
If employees within each business function write the plan for that function,
you achieve multiple goals:
-Employees know what the day-to-day activities are.
-Numerous functions can be generating plans at the same time.
-You gain buy-in of the business function employees.
A designated continuity planner should be available to answer questions as each group
writes its continuity plan. Not only does each business function need
to generate a plan, but also each department represented on the corporate
team needs to have a plan, in case those departments are also affected
by the disaster.
Exercising the plan
Continuity plans should not be tested; they should be exercised.
Tests are passed or failed, whereas exercises are conducted for practice.
Exercise the plans thoroughly to ensure that they work. During the exercise,
note any problems that occur and encourage feedback from participants.
The purpose of the exercises is to reveal any defective or missing components
in the plans. It is counterproductive to reprimand someone for pointing
out errors or omissions. Some companies spend hundreds of hours testing
and refining specific pieces of their plans until they are satisfied
with the time or accuracy of execution.
It is best to exercise and update the plans at least annually, or when
major changes occur. Update call lists quarterly. It is better to have
no plan at all than to have an out-of-date plan; such a plan lends
a false sense of security and wastes time during an actual disaster.
Conclusion
A continuity plan should not and cannot be written by the IT department
alone, nor should it be written solely for a given computer or data
center. Developing such a plan is a long-term process that requires
substantial human and monetary resources throughout the company. Without
a long-term commitment to continuity planning from the highest executive
levels, efforts to develop such a plan are bound to fail.
The planning process cannot even begin without documentation for everyday
processing already in place, including change control procedures, standard
operating procedures, run books, data flow diagrams, problem isolation
procedures, and a tape backup or rotation schedule. As a by-product,
continuity planning forces more formalized standard documentation across
the entire company, resulting in faster isolation of application bugs,
fewer operational mistakes, reduced support requirements, faster training
of new personnel, and easier maintenance and enhancement of current
applications.
Not only is a continuity plan required by many regulatory agencies,
it could also mean the survival of your company.
Ron LaPedis, CBCP, has worked for Compaq for 20 years in various capacities, most
recently as a product manager for security and business continuity products.
© Copyright 2001 Systems Support Inc. All rights reserved. Reproduction in whole
or in part in any form or medium without the express written permission
of Systems Support Inc. is prohibited.