DISASTER RECOVERY JOURNAL
Fall 2001
Business Implications of Network Downtime and How to Mitigate Its Risk
by Edward Rabinovitch
It is difficult
to overestimate the importance of Information Technology today. If,
a few years ago, this statement was mainly applicable to large enterprises,
now it is certainly true for any business, regardless of its size or
industry. Information is power in the current economic climate. So,
having the right information available whenever it's needed will
help businesses remain competitive in the marketplace. This increased
dependence on up-to-date information creates reliance on IT in general
and networking infrastructure in particular.
Highly available
and reliable infrastructure is required for secure information delivery.
Network performance means availability of information, and thus, business
performance. Reliability and fault tolerance are crucial for maintaining
continuous network operations. Planning for disaster recovery that allows
information availability under any circumstance is one of the necessary
tasks for business continuity in the short term and is a critical success
factor for any business over the much longer term.
Impact of Downtime
Although the severity of the negative effects of downtime is quite obvious,
the business impact becomes more striking when we are trying to quantify
such effects. How do you measure, or rather estimate, downtime? Traditional
metrics mainly focus on transaction loss, which can be quite accurately
measured for transaction-oriented processes by quantifying the amount
of data lost and the scope of rework for data recovery. However, no
less important is taking into account the productivity loss. In today's
business world where most companies are dependent on computer systems
for their operations, unavailable systems and applications create sharp
productivity declines.
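As a rough illustration of the two components discussed above, the following minimal Python sketch adds transaction loss and productivity loss into a single estimate. It is not a method taken from any study cited in this article; every field name and every number in the example is a hypothetical assumption, and harder-to-quantify effects such as lost sales opportunities or damaged reputation are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class OutageEstimate:
    """Hypothetical inputs for a rough downtime cost estimate."""
    hours_down: float             # duration of the outage, in hours
    transactions_per_hour: float  # normal transaction volume
    value_per_transaction: float  # average revenue or rework cost per transaction
    affected_employees: int       # staff who depend on the unavailable system
    loaded_hourly_cost: float     # salary plus overhead per employee-hour
    lost_work_fraction: float     # share of work lost while down, 0.0 to 1.0


def transaction_cost(e: OutageEstimate) -> float:
    """Cost of transactions lost or reworked during the outage."""
    return e.hours_down * e.transactions_per_hour * e.value_per_transaction


def productivity_cost(e: OutageEstimate) -> float:
    """Cost of employee time lost during the outage."""
    return (e.hours_down * e.affected_employees
            * e.loaded_hourly_cost * e.lost_work_fraction)


def total_cost(e: OutageEstimate) -> float:
    """Sum of the two quantifiable components only."""
    return transaction_cost(e) + productivity_cost(e)


# Illustrative numbers only; they do not come from the article.
example = OutageEstimate(
    hours_down=2.0,
    transactions_per_hour=500,
    value_per_transaction=40.0,
    affected_employees=120,
    loaded_hourly_cost=50.0,
    lost_work_fraction=0.5,
)
print(f"Transaction loss:  ${transaction_cost(example):,.2f}")
print(f"Productivity loss: ${productivity_cost(example):,.2f}")
print(f"Total (estimate):  ${total_cost(example):,.2f}")
```

Keeping the two components separate makes it easier to see which one dominates for a given application, which in turn helps decide where recovery spending is best directed.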
Also of growing importance are businesses' customer support operations,
which more and more frequently depend on access to networked applications.
Therefore, unavailability of a customer support application will most
definitely lead to a slowdown in customer service and potentially disgruntled
customers -- the impact of which is quite difficult to quantify. It
is equally difficult to quantify the impact on business partners and
supply chain management.
It is important
to emphasize that total system unavailability is not the only danger.
Downtime of critical components causing slow response time can affect
a company's reputation, as customers will quickly look elsewhere
for their products and services. For example, with the increasing interdependence
between companies in a supply chain, a delay in scheduling may affect
not only direct clients, but also their customers and their customers'
customers. Some of the most important factors, yet among the most difficult to
quantify, are the sales opportunities lost as a result of downtime.
The following table, put together by Contingency Planning Research Inc.
of Livingston, NJ, estimates the cost of downtime for different industry
sectors. (Figure 1 below)

Causes of Downtime
One of the most common causes of downtime is probably change management,
or perhaps more to the point, making modifications without change management.
Lack of proper change management policies or noncompliance with change
management procedures oftentimes creates unwarranted downtime. While
inherent hardware or software defects are often blamed for network and
system failures, in reality, systems more often fail due to misconfiguration
or improper modifications as described above. Nevertheless, hardware
and software will on occasion fail, the timing of which is, in many
cases, unpredictable. Power outages have also seemingly become more
and more frequent, as we all witnessed over the last few months.
As stated
earlier, downtime is not the only cause of application unavailability;
slow response time may also result in poor and often unacceptable service
quality, and can sometimes be perceived as downtime.
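One way to make this point measurable is to count a slow response as unavailable when computing availability from monitoring probes. The sketch below assumes a hypothetical list of probe results (response time in seconds, or `None` for a failed probe) and a hypothetical acceptable-response threshold; neither comes from the article.

```python
from typing import Optional, Sequence


def strict_availability(samples: Sequence[Optional[float]]) -> float:
    """Fraction of probes that got any response at all."""
    if not samples:
        raise ValueError("no probe samples")
    answered = sum(1 for t in samples if t is not None)
    return answered / len(samples)


def effective_availability(samples: Sequence[Optional[float]],
                           threshold_s: float) -> float:
    """Fraction of probes answered within the acceptable response time.

    A probe that failed (None) or exceeded threshold_s counts as down,
    reflecting how users perceive a very slow application.
    """
    if not samples:
        raise ValueError("no probe samples")
    good = sum(1 for t in samples if t is not None and t <= threshold_s)
    return good / len(samples)


# Hypothetical probe results: seconds to respond, None means no response.
probes = [0.2, 0.3, None, 4.5, 0.25, 0.4, 6.0, 0.3]
print("Up/down only:       ", strict_availability(probes))
print("Counting slow as down:", effective_availability(probes, threshold_s=2.0))
```

The gap between the two figures is the portion of "downtime" that a simple up/down check would never report.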
Due to their
location in volatile climates or energy shortage areas, some companies
will be completely unable to predict downtime. Disasters will simply
force them out of business. Others will be able to maintain operations
because of the foresight to set up disaster recovery systems that back
up data and in some cases entire systems to remote locations.
And while proper
planning and procedures can help to prevent, or at least minimize, the
downtime caused by the operational failures described above,
the only proper way to deal with downtime caused by natural or manmade catastrophes
is a sound disaster recovery plan with off-site contingency provisioning.
Planning for Business Continuity
Proper contingency planning for IT starts with identification of mission-critical
applications and related computing systems. During this process it is
very important to define the business impact of downtime. Make sure
to have well-defined and well-tested step-by-step backup and disaster
recovery plans. Such contingency plans should have provisioning for
data recovery and data access, as well as alternate locations and offices
for personnel. When looking for such locations, consider factors such
as ensuring security; routing phone and data access lines; and notifying
customers, postal services, distributors, suppliers, and (most importantly)
employees of the alternate locations. And, as mentioned above, contingency
procedures should not just be identified and planned, but also periodically
tested.
In a recent
survey of its 1,318 members, TechRepublic of Louisville, KY, uncovered
quite a disturbing picture of business continuity readiness (or rather
the lack thereof). The following figure summarizes responses to the
survey: (Figure 2 below)

Most of those surveyed realized the severity of this situation and had
different levels of contingency planning in place. The following chart
summarizes responses on the measures they planned to take in the near future. (Figure 3
below)

Case Study
The activities described in Figures 2 and 3 are critical, but proper
business continuity planning requires a comprehensive disaster recovery
strategy focusing on each and every aspect of high-availability and
contingency planning. One way to better ensure that contingency procedures
are secure is to outsource to an experienced service provider with high-availability
infrastructure, policies and procedures. Qualified vendors offer technical
expertise and a physically removed backup center. HomeSource Capital
Mortgage Company took advantage of such a vendor. HomeSource, a mortgage
banker, is located in the heart of the hurricane belt in Jupiter, Florida.
One hundred percent of its business relies on next-generation technologies
using both online and offline tools, making it vulnerable should it
experience network downtime. Even a few hours of outage of HomeSource's
mission-critical applications could be devastating to its customers,
and potentially disastrous for its business.
HomeSource made the decision
to house its IT systems in a remote location to avoid downtime from
a major storm or other natural disaster. The mortgage banker selected
managed hosting and IT outsourcing services provider Cervalis to manage
its critical applications and keep its e-business safe. Cervalis'
IDC, designed with extreme high-availability in mind (see Figure 4 below),
is located in Dutchess County, New York - a healthy distance from the
frequent rages of Mother Nature. For HomeSource, Cervalis is a safe
haven situated away from the hurricane hot spot of the Florida coastline.
Managed hosting providers with N+1 network redundancy and an advanced
degree of virtual and physical security offer similar shelter from the
hazards so many e-businesses face right now. Power outages, tornadoes,
forest fires, floods and hurricanes have jeopardized businesses all
over the country this year. But IT services that are managed and protected
from the elements by outsourced Internet Data Centers provide reliable
connectivity and availability to customers - so their businesses are
free to operate at full capacity.
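To see why N+1 redundancy matters, consider a simplified model in which N identical components are needed to carry the load, N+1 are installed, and each fails independently of the others. The sketch below computes the resulting availability under those assumptions. The per-component availability of 0.99 is a hypothetical figure, and real components often share failure modes (a common power feed, for instance), so the independence assumption makes the result optimistic.

```python
from math import comb


def k_of_n_availability(component_availability: float,
                        needed: int, installed: int) -> float:
    """Probability that at least `needed` of `installed` components are up.

    Assumes identical components that fail independently (a simplification).
    """
    a = component_availability
    if not 0.0 <= a <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if not 0 < needed <= installed:
        raise ValueError("require 0 < needed <= installed")
    return sum(
        comb(installed, up) * a**up * (1 - a) ** (installed - up)
        for up in range(needed, installed + 1)
    )


# Hypothetical example: two links are needed to carry the traffic.
per_link = 0.99
print("N   (2 of 2):", k_of_n_availability(per_link, needed=2, installed=2))
print("N+1 (2 of 3):", k_of_n_availability(per_link, needed=2, installed=3))
```

Under these assumptions a single spare moves the design from tolerating no failures to tolerating any one failure, which is where most of the availability gain comes from.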

Security Implications
Downtime is not always caused by system malfunction or by a manmade or natural
disaster. Breaches in security and deliberate
hacks, such as denial of service attacks, can essentially shut systems
down, as was recently demonstrated in a number of well-publicized cases.
Network security can be protected through a combination of high-availability
network architecture and an integrated set of security access control
and monitoring mechanisms. Recent well-publicized incidents of Distributed
Denial of Service (DDoS) attacks demonstrate the importance of monitoring
security and filtering not only incoming traffic, but also the outbound
traffic generated within the network. Defining a solid, up-to-date information
protection program, with associated access control policies and business
recovery procedures, should be the first priority on the agenda of every
networked organization. Specifically, a firm's information security
posture - an assessment of the strength and effectiveness of the organizational
infrastructure in support of technical security controls - has to be
addressed through the following activities:
o Auditing network monitoring and incident response
o Communications management
o Configurations for critical systems: firewalls/air-gaps, DNS, policy
servers
o Configuration management practices
o External access requirements and dependencies
o Physical security controls
o Risk management practices
o Security awareness and training for all organization levels
o System maintenance
o System operation procedures and documentation
o Application development and controls
o Authentication controls
o Network architecture and access controls
o Network services and operational coordination
o Security technical policies, practices, and documentation
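The outbound traffic filtering mentioned above can be illustrated with a small sketch. DDoS traffic frequently carries forged source addresses, so one common safeguard is to drop any outbound packet whose source address does not belong to the organization's own address space. The Python example below shows only the decision logic, using the standard `ipaddress` module; the prefixes are reserved documentation ranges standing in for a hypothetical organization's networks, and a real deployment would enforce this rule on routers or firewalls rather than in application code.

```python
from ipaddress import ip_address, ip_network

# Hypothetical internal address space (reserved documentation ranges).
INTERNAL_PREFIXES = [
    ip_network("192.0.2.0/24"),
    ip_network("198.51.100.0/24"),
]


def allow_outbound(source_ip: str) -> bool:
    """Return True if an outbound packet's source address is one of ours.

    Packets leaving the network with a foreign source address are most
    likely spoofed and should be dropped and logged.
    """
    addr = ip_address(source_ip)
    return any(addr in net for net in INTERNAL_PREFIXES)


for src in ("192.0.2.17", "198.51.100.200", "203.0.113.9"):
    verdict = "forward" if allow_outbound(src) else "drop (possible spoofing)"
    print(f"{src:>15}: {verdict}")
```

Logging the dropped packets also serves the monitoring goal described above, since a burst of spoofed outbound traffic is a strong hint that an internal host has been compromised.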
A sound business continuity
plan, including high-availability network design with comprehensive
security policies aimed at high availability, recoverability and data
integrity, establishes the necessary infrastructure to conduct any activities
in a secure and reliable fashion, regardless of whether the public Internet,
extranets or intranets are being used.
Edward Rabinovitch
is Vice President of Network Engineering at Cervalis. He is an industry-recognized
specialist with more than 20 years of experience in information
and networking technology, data processing, Internet/intranet/extranet
and business communications.
Rabinovitch is a member of the editorial review boards of, and a contributing
editor for, IEEE Communications Magazine, Enterprise Systems Journal
and The Computer Measurement Group.
*To comment on this
article, go to 1404-14 at www.drj.com/feedback.
©Copyright
2001 Systems Support Inc. All rights reserved. Reproduction in whole
or in part in any form or medium without the express written permission
of System Support Inc. is prohibited.