Disaster Recovery Journal
Debunking Common Myths of High Availability
By MATT FAIRBANKS
Many are the myths and misunderstandings that surround
high availability. Typically, they appear as variations on four basic
misconceptions:
- It’s costly
- It’s complicated
- It’s hard to measure
- It’s hard to test
These false impressions persist partly because hardware vendors continue
to aggressively press their own agendas. Erroneous messaging has led
many IT managers to believe that clustering applies only to mission-critical
applications. The truth is that if your organization cannot tolerate
even minutes of planned or unplanned downtime for its business-critical
applications, there are powerful high availability solutions within
economic reach. They guard against the primary threats to the continuity
of your business: human error, application failure, computer failures,
routine or scheduled maintenance, and complete site outages.
Before we get down to debunking our four myths, let’s step back
and take a high-level look at high availability.
What Are the Threats to Availability?
If high availability is your goal, what downtime threats do you need
to protect against? There are many, but we can consolidate them under
six categories:
- Data Corruption: The most prevalent threat to service and data
availability. A file gets deleted or corrupted, and services need
to be taken offline to correct the problem. The most common defense
is backup or frequent snapshots.
- Component Failure: Hardware malfunctions are common – a network
card, an array, a disk drive. There are fewer and fewer failures as
vendors build more resiliency into their products, but hardware failure
still remains a significant threat to availability.
- Application Failure: Downtime caused by unavailable applications
is common. It inevitably leads to loss of productivity and revenue.
- Human Error: When processes remain dependent on human intervention,
this interaction can introduce a multitude of errors that quickly
lead to lost availability.
These four threats are all unplanned, and they force IT departments
into reactive mode. The fifth threat to availability is a proactive
measure that is viewed as a necessary evil – planned downtime.
- Maintenance: This is by far the greatest contributor to downtime
in any environment. According to a leading analyst firm, 80 percent
of downtime is planned – server upgrades, application upgrades,
OS upgrades, and other site maintenance processes.
And even if you’ve done a good job protecting against these
five threats, there’s still a sixth threat to consider.
- Site Outage: Any high availability solution must include protection
against the total loss of your site in the event of a disaster such
as a fire or a flood.
Software Technologies vs. Availability Threats
Let’s review the tools that are available for defending our environments
against these threats. The most basic availability level is backup to
disk or tape, the industry’s fundamental safety net for
IT infrastructures. Backup limits the data lost to data
corruption to about 24 hours’ worth when backups are taken daily, and
to less when they are taken more often.
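The 24-hour figure follows from simple arithmetic: in the worst case, corruption strikes just before the next backup runs, so the exposure equals the interval between backups. A minimal sketch, assuming evenly spaced backups (the function name and the sample frequencies are illustrative, not taken from any product):

```python
def worst_case_data_loss_hours(backups_per_day: int) -> float:
    """Worst-case data loss window, assuming evenly spaced backups.

    Hypothetical helper for illustration; it ignores the time a backup
    itself takes to complete.
    """
    if backups_per_day < 1:
        raise ValueError("need at least one backup per day")
    return 24 / backups_per_day


for n in (1, 2, 4):
    hours = worst_case_data_loss_hours(n)
    print(f"{n} backup(s) per day: up to {hours:g} hours of data at risk")
```

Even four backups a day leave several hours of data exposed, which is why the next level of protection moves to a continuously updated copy.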
If your company can’t afford to lose data for that long, the next
level of protection against data corruption is local mirroring –
creating a constantly updated copy of data on disk to provide real-time
availability within the data center.
When the availability of your data has been established, the next concern
is server availability and the protection of business applications.
Local clustering technology lets you group several servers into a single
resource. Failure on any server results in a failover to another server
in the cluster, and availability is protected, reducing downtime to
minutes and, in some cases, seconds.
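The failover idea can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class, its methods, and the node names are hypothetical, and a real clustering product also handles heartbeats, shared storage, and application restart.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a local cluster: maps each service to its hosting server."""
    servers: set
    placement: dict = field(default_factory=dict)

    def _load(self, server: str) -> int:
        # Number of services currently placed on the given server.
        return sum(1 for host in self.placement.values() if host == server)

    def fail(self, server: str) -> None:
        """Remove a failed server and move its services to survivors."""
        self.servers.discard(server)
        if not self.servers:
            raise RuntimeError("no surviving servers in the cluster")
        for service, host in self.placement.items():
            if host == server:
                # Choose the surviving server hosting the fewest services.
                target = min(sorted(self.servers), key=self._load)
                self.placement[service] = target


cluster = Cluster(
    servers={"node-a", "node-b", "node-c"},
    placement={"database": "node-a", "web": "node-b"},
)
cluster.fail("node-a")
print(cluster.placement)
```

In this sketch the database service moves to the least-loaded survivor while the web service stays where it is; the point is only that the group of servers, not any single machine, is what keeps the service available.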
Backups, mirroring, and local clustering can protect you against local
threats to availability – the first five threats that we described
in the previous section. But what if the unthinkable happens –
a disaster that knocks out your entire site? You can protect your total
environment by establishing availability of data and resources at a
remote site. You have two tools at your disposal to accomplish this:
replication and clustering.
Replication enables you to create a copy of your data online, in real
time, on disk storage at another location. Clustering goes a step further;
it covers the application as well as the data. This means
that if you have a complete outage at your primary site, a single button-click
will restore service at your backup environment. That is the highest
level of availability you can achieve.
The combination of these technologies not only provides 24x7 availability
but offers significant cost savings and impressive return on investment
(ROI) for IT departments. With that in mind, let’s examine our
four myths more closely:
Myth No. 1: High Availability Costs Too Much
The popular view is that to achieve high availability you must double
your complement of hardware – duplicate server capacity for each
application, duplicate systems, and duplicate sites, all running on
the same server type and the same level of the operating system. Most
shops that use clustering software run an active/passive environment,
with servers running idle against the possibility of a failure.
Fortunately, these assumptions – that you have to live with poor
hardware utilization and OS constraints to get high availability –
are not true. You can add clustering and high availability to your environment
without buying more hardware. Software solutions are available that
enable you to use existing resources to create a high-availability environment
across vendor platforms without OS restrictions. You can manage multiple
different servers using the same clustering solution across Sun Solaris,
HP-UX, IBM AIX, Windows, or Linux at a variety of OS version levels
to create a single high availability solution with high server utilization
rates.
Let’s look at a typical high availability environment with paired
servers and a variety of operating systems. Some server pairs are active/passive;
this offers the highest levels of availability but is the most expensive
approach because one server is always totally idle, tapping its fingers
while it waits for a failure. Some server pairs are active/active, which
is much more cost-effective. However, it may provide much lower availability
than the active/passive approach, because in the event of a failure
the server that is still functioning has to do the work of two servers,
and its performance will drop off as a result, thereby reducing the
overall availability of the solution.
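The trade-off between the two pairings comes down to a little arithmetic. The sketch below is illustrative only: the helper name is hypothetical, loads are expressed as fractions of one server's capacity, and the 60 percent figure is an assumed example rather than a measurement.

```python
def pair_summary(load_a: float, load_b: float) -> dict:
    """Summarise a two-server pair.

    Loads are fractions of one server's capacity (0.0 to 1.0). Pass
    load_b = 0.0 to model an active/passive pair.
    """
    utilization = (load_a + load_b) / 2   # average use of the pair's hardware
    survivor_load = load_a + load_b       # one server carries everything after a failure
    return {
        "utilization": utilization,
        "survivor_load": survivor_load,
        "overloaded_after_failure": survivor_load > 1.0,
    }


print("active/passive:", pair_summary(0.6, 0.0))
print("active/active: ", pair_summary(0.6, 0.6))
```

With these assumed loads, the active/passive pair uses far less of its hardware but the survivor copes comfortably after a failure, while the active/active pair uses its hardware well but leaves the survivor with more work than one server can handle.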
A single solution can be used across platforms and bring all servers
into a single clustered environment, providing high availability and
high server utilization rates, maximizing your hardware investment,
and eliminating the high-cost myth.
Myth No. 2: High Availability is Too Complex
High availability is usually seen as complicated because the traditional
hardware-based approach requires that vendors install clusters –
an expensive proposition – and charge again for professional services
every time a new application comes online.
Then there is the problem of labor-intensive management. It is time-consuming
and demanding to manage high availability across a variety of servers,
operating systems, and applications. If you operate five different server
platforms across multiple applications, each of them will demand a different
clustering solution and your already-high administrative costs will
rise.
You can add high availability, avoid all this complexity, and reduce
costs with a software solution that allows you to use the same clustering
tool across different server platforms and operating systems. Once you
have the first cluster in place, which is a matter of minutes,
any administrator can add to the cluster or build new clusters quickly
and easily.
When you change the configuration of any node with the clustering tool,
you can extend the changes to all other nodes. All nodes are managed
from a single graphical user interface (GUI), with easy failover across
nodes within a cluster or across a distance.
Myth No. 3: It’s Too Hard to Measure
One problem with traditional approaches to high availability is that
there is no satisfactory way to measure results. The IT department may,
for example, have a service level agreement (SLA) with a business unit
that states that there will be no more than two hours of downtime over
five nights. The department may achieve these goals but have no way
of knowing it. There may well be an availability problem, but with unmeasured
SLAs and no historical reports, there’s really no way to verify
performance or identify problems. Is it an application failure? Component
failure? Human error? Unfortunately, in many cases, nobody knows.
But integrated reporting tools are now available, and they can enable
you to track availability, report results, analyze trends, and identify
problems. They enable you to say with authority that you are meeting
SLA requirements and can support your statements with historical reports.
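The kind of report such tools produce can be approximated in a few lines. This is a rough sketch, not any vendor's reporting tool: the outage log is made up, the function names are hypothetical, and the two-hour allowance echoes the example SLA above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical outage log: (start, end, cause). The causes reuse the threat
# categories described earlier; the entries themselves are invented.
OUTAGES = [
    (datetime(2024, 3, 4, 1, 0), datetime(2024, 3, 4, 1, 40), "maintenance"),
    (datetime(2024, 3, 5, 23, 15), datetime(2024, 3, 5, 23, 50), "application failure"),
    (datetime(2024, 3, 7, 2, 0), datetime(2024, 3, 7, 2, 30), "human error"),
]


def downtime_by_cause(outages) -> dict:
    """Total downtime per cause, so problems can be traced to a category."""
    totals = defaultdict(timedelta)
    for start, end, cause in outages:
        totals[cause] += end - start
    return dict(totals)


def meets_sla(outages, allowed: timedelta) -> bool:
    """True if total downtime in the log stays within the allowed amount."""
    total = sum((end - start for start, end, _ in outages), timedelta())
    return total <= allowed


for cause, total in downtime_by_cause(OUTAGES).items():
    print(f"{cause}: {total}")
print("SLA met:", meets_sla(OUTAGES, allowed=timedelta(hours=2)))
```

Keeping the cause alongside each outage is what turns a pass/fail check into something that answers the questions raised above: whether downtime came from an application failure, a component failure, or human error.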
Myth No. 4: It’s Too Hard to Test
IT managers who implement disaster recovery solutions face the problem
of uncertainty. It is impossible to be 100 percent sure that your configuration
will work until it’s actually in production and capable of causing
serious downtime – which is what you’re trying to eliminate
in the first place. So the system must be tested.
But testing availability creates a paradox – you have to risk
losing availability to see if your availability systems work. Consequently,
companies spend millions to implement a disaster recovery plan but are
never certain the plan will work because they don’t want to risk
downtime by testing it. They seem to be operating on blind faith.
If they do decide to test a disaster recovery plan, it will be inconvenient
and time-consuming. It will involve many steps, and it will almost certainly
take place over the weekend or in the middle of the night.
On the other hand, you can simply use a fire drill process to test your
disaster recovery solution on a spare system before you put it into
production. The fire drill creates a clone of your environment,
including clustering and replication processes, and tests it at any time
without affecting production at all. You then know positively how your
disaster recovery plan will work.
Disaster Recovery
We find that many companies don’t implement disaster recovery
because they believe it’s simply too costly. They think mostly
in terms of protecting data at a secondary site, but they don’t
think about getting the application running again so the data can be
accessed. They consider automated restoration of applications to be
unachievable today, so they focus on data protection.
Cost-effective software technology available today can integrate the
restoration of both your business-critical application and your data.
It automates the entire disaster recovery process to eliminate potential
downtime from human error. It is literally a one-click operation. From
a single solution in a single cluster, you can implement the integrated
restoration at any distance: locally, across the street, or even across
the globe.
Replication
Most companies continue to use the traditional hardware approach to
replication, which means that the computing and storage hardware at
the secondary site must duplicate the hardware at the primary site.
While this is a popular method, it is also extremely expensive because
it is proprietary and leaves IT departments with no choice of
vendors or operating systems. Furthermore, it has distance
limitations because it requires dual dedicated Fibre Channel connectivity
for short runs. For longer runs, it becomes more costly, requiring Fibre Channel-over-IP
hardware converter devices.
On the other hand, why not replicate the volume instead of the hardware?
Volume management software tools can give you bulletproof replication
over any hardware, over any network, and over any distance, delivering
recoverable data at a much lower cost.
Conclusion
Knowledge about the use of innovative software technology dispels the
myths surrounding high availability. It is much less costly than traditional
hardware solutions. It is not complex to manage. In fact, it vastly
simplifies management. It can be implemented in minutes, and it can
easily be measured, analyzed and reported. It puts high availability
within easy reach of most companies today.
Matt Fairbanks, technical director at Veritas Software, is in
charge of product strategy for high availability and storage management
solutions for UNIX and Windows NT environments. Fairbanks joined Veritas
in 1996 and has held various product management and international marketing
positions in the areas of data protection, high availability, and network
systems management. Fairbanks received his MBA from Southern Methodist
University in Dallas.
© Copyright Systems Support Inc. All rights reserved. Reproduction in whole or in
part in any form or medium without the express written permission of
Systems Support Inc. is prohibited.