Winter 2002
INFORMATION TECHNOLOGY

Paradigm Shift In Handling Disasters

By MIKE TALON

Every member of a
corporation, from the CEO through the mailroom, is influenced by data
loss and system outages. Regardless of who is directly and/or indirectly
affected, the results of these interruptions are always the same at
their most basic level: loss of time, loss of money, loss of business.
Therefore, each technology decision maker must determine what will be
an acceptable level of lost time and revenue due to system failure,
and every day the definition of "acceptable" gets smaller
and smaller.

Now that the CFO and
CEO are often directly involved in the determination and provision of
information technology (IT) resources, the business impact of those
resources, and the potential loss of those resources, are more closely
scrutinized than ever before. This trend has forced many companies to begin
to look toward ways to provide disaster prevention, or business continuity:
the processes by which vital data resources are protected and not allowed
to be lost or interrupted at all. This is the proverbial "five
nines" of reliability and uptime (99.999 percent availability),
resulting in a maximum of roughly five minutes of unscheduled downtime annually,
but this is a lofty goal and a difficult target to meet even in the best
of circumstances.

Before diving into
how to achieve business continuity (BC), we must first look at how businesses
reached this level of necessity and what circumstances brought the enterprise
from veritable carelessness only a few short years ago to the vital
vigilance that we see today.

The revolution began
near the beginning of the dot-com craze just a few short
years ago. Prior to this point, the enterprise space was keeping its
data either on paper or on some other physical medium (punch cards,
hardcopy, etc.). In this configuration, the business data could theoretically
survive any digital disaster, as it was not kept in digital form. While
it was still susceptible to physical disaster (earthquake, fire, etc.), the
potential for serious loss by these means could be mitigated by storing
physical copies off-site in a repository or secure facility. This is an
important concept, as this theory of off-site storage later crosses
over into the digital world in a nearly identical form. Mainframe systems
of the time were backed up to heavily protected magnetic tape, and the
data kept on them was generally used in conjunction with physical hardcopy,
so that a loss of the system for a day to restore data wouldn't
bring down the enterprise.

Suddenly, with the
advent of widespread computer use at the desktop level, employees were
no longer storing vital corporate data on the mainframe or in physical files.
That meant that power fluctuations, physical anomalies, and a host of
other disasters could literally wipe out valuable data without any potential
of restoring it. Very quickly, backup systems and office-based servers
sprang onto the scene to address the concern that corporate data
on PCs should be protected in the same manner as the paper files
and the magnetic tapes that protected the mainframes.

As we progressed through
technology, the mainframe systems began to dwindle and disappear, even
in the enterprise space to a great extent. Smaller server systems were
put into place as a more flexible and economical alternative to the
older, slower mainframe systems. With this new computing power came
a whole new host of potential problems, not the least of which was data
loss. Once again, corporate IT staff had to worry not only about
getting vital data from the desktops to the server systems, but also
about the fact that the server systems themselves were
little more secure than the desktops in the first place. More complex
backup systems were constructed to shift the data from the volatile
servers to somewhat more stable backup media, usually some form of magnetic
tape.

For several years
this seemed to be an ideal situation. In the event that data loss occurred,
the tapes could be used to restore that lost data or even entire data-systems
in many cases. However, as we became more and more dependent on our
data-systems, businesses began to realize that the long periods of time
spent waiting for the data-restoration process translated into large
sums of lost revenue. A better system had to be found in order to minimize
downtime, and disaster recovery services were born to fill the need.

Disaster recovery
services (DRS) are systems put into place to restore data to a downed
or corrupted server system or other data system as quickly as possible.
The field of DRS is extraordinarily broad, ranging from re-configuring
tape systems to make them faster and more reliable, all the way through
keeping duplicate servers on standby to allow them to stand in at a
moment's notice. Once again, IT staff thought they had solved
the problems of data loss, but once again they were about to be proven
wrong.

This brings us to
today's economy, which is data driven, IT dependent, and absolutely
chained to 24-by-7, 365-day-a-year data access. Even a few minutes of
downtime in, for example, an online stock trading software
system can cause millions in lost revenue. Computers never take days
off. Data systems never call in sick (one hopes) or demand coffee breaks.
The loss of any time online translates directly into the loss of corporate
revenue, especially in the enterprise space. Companies began the process
of translating IT functionality into business reality, and the results
shocked big business to the very core. Disaster recovery was no longer
an option; disasters could not be allowed to impact the business case
at all.

How could a business
continue to operate in light of the myriad potential hazards and
disasters out there? With the constant threat of earthquakes, floods
and other natural disasters, coupled with power grid outages, espionage
and other man-made data loss issues, there were too many variables to
anticipate every potential cause of data loss and system failure. The
science of business continuity was born to find a way to keep the systems
running, no matter what was going on in the physical world.

Business continuity
planning (BCP) is really just disaster prevention in action. It's
the science of determining ways to allow data systems to continue working,
even if an entire physical location is downed or destroyed. The baseline
idea to remember here is that data-systems are portable objects. They
are not dependent on the particular pieces of hardware you run them
on, and can be moved to other, similar hardware at any time, provided
the right expertise and software are available. This is a fundamental
shift in thinking from the days of mainframe-based enterprise computing,
where the system was the hardware for the most part. In today's
digital arena, hardware is often the least important part of a data-system;
the operating systems and software packages
you are using do more to determine the power of the system itself.

Once business
IT staff made the leap of faith that hardware was not the
most crucial component, the doors of BCP were flung wide open to allow
for the advent of the distributed data system. IT development staff
could now design systems that ran as clusters: multiple computers sharing
a common data source that could stand in for each other in cases where
one server failed. They formed load-balanced websites with groups of
servers that could absorb the load of one or even several downed
machines. They created e-mail server groups that spanned the country,
each one able to hold messages for an offline counterpart. No longer
was the corporate data system at the mercy of a single point of failure.
The entire data-center could be grouped, clustered, and manipulated
as a single entity to protect the data of the enterprise!

After the initial
elation wore off, IT staff realized one major flaw in the plans. There
was still a single point of failure: the single data-center. For the most
part, the business continuity failover systems were physically located
about three feet from the primary systems. This meant that even the most
redundant data-center could fall victim to a power grid failure, and when
the diesel backup generators finally ran out of fuel, the data-center, and
the corporate data, went offline. Though far from being back where we
started, BCP still had a long way to go before we could reach the mythical
"five nines."

Mega-storage companies
like EMC stepped up to the plate by producing storage systems that could
replicate themselves to other data-centers outside the same
physical vicinity. This meant that the entire body of corporate data
could be kept up-to-date in some other location, thereby protecting
against the possibility of failure due to the loss of a physical location.
This is the same theory businesses had relied on to protect
physical data like punch cards and hardcopy years before.

On the surface this
was an ideal solution, but it was only a reversion to disaster recovery,
just on a much larger scale. The data was safe in a secondary location,
but inaccessible until the primary location could be brought back online.
The servers (the machines that end users' computers connect
to in order to get information) were still located at, and attached
to, the primary storage device. Even clusters, where groups of servers
can stand in for each other, needed to be physically connected to the
primary storage device, meaning that if the entire data-center went
offline, there was nothing for the end user to connect to, even
though the data itself was safely stored off-site. Technology needed
to make another leap in order to fully address the situation.

Building on data replication, software vendors
began to develop BCP software that would allow the entire data-systems
of an enterprise to transcend physical boundaries, thereby allowing the
systems themselves to survive physical site failure, not just the data.
These products allowed the enterprise to eliminate the single point
of failure of the single physical site without falling back to DR paradigms.
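
To make the idea concrete, here is a minimal Python sketch of a service that spans two sites rather than one data-center. It does not represent any particular vendor's product: the names Site, ReplicatedService, active_site, and recover are hypothetical, and the model assumes every write reaches every online site immediately, which real replication over long distances cannot guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One physical location holding a full copy of the data (hypothetical model)."""
    name: str
    online: bool = True
    data: dict = field(default_factory=dict)


class ReplicatedService:
    """Hypothetical service that spans several sites instead of one data-center."""

    def __init__(self, sites):
        # Order expresses preference: the first online site serves requests.
        self.sites = list(sites)

    def active_site(self):
        for site in self.sites:
            if site.online:
                return site
        raise RuntimeError("no site is online")

    def write(self, key, value):
        # Every online site receives the write, so each copy stays up-to-date.
        if not any(site.online for site in self.sites):
            raise RuntimeError("no site is online")
        for site in self.sites:
            if site.online:
                site.data[key] = value

    def read(self, key):
        # End users never name a site; the service picks one that is online.
        return self.active_site().data[key]

    def recover(self, site):
        # Fail back: resynchronize the repaired site from the active one,
        # then return it to service.
        site.data = dict(self.active_site().data)
        site.online = True


if __name__ == "__main__":
    primary, secondary = Site("primary"), Site("secondary")
    service = ReplicatedService([primary, secondary])
    service.write("inbox", "message 1")
    primary.online = False               # the primary site is lost
    print(service.active_site().name)    # requests are now served elsewhere
    service.recover(primary)             # repair and fail back when time permits
    print(service.active_site().name)
```

The point of the sketch is that callers only ever use read and write. Which site answers is decided by the service, so losing a location changes nothing from the end user's side.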
Clusters no longer needed to be physically connected to a shared storage
array, and stand-alone machines could stand in for each other no matter
where they were physically located. By utilizing platform- and
storage-independent data structures, these products allowed IT staff to create
duplicate hardware and software configurations in multiple physical
locations that could share data and keep each other up-to-date. They
could also stand in for each other at a moment's notice without
end users having to perform any tasks. Essentially, the end user continues
to work, uninterrupted, while the data systems handle all the tasks
of taking over the data-processing load for their downed counterparts
in some other city.

Large-scale data systems
can now seamlessly replicate not only the data itself, but also the
very data-systems that are vital to keeping the enterprise up and running.
An Exchange e-mail system that fails in Boston can now seamlessly
switch to a physical system in Detroit, without the CEO (or anyone else)
missing a single message. The IT staff can then correct the issues in
Boston and fail back the physical systems to restore them to their original
state when time permits, without the pressure and rushing that often
cause even more damaging mistakes than the original outage.

It is this monumental
paradigm shift, from keeping everything on physical media that
could be duplicated off-site to a digital world of self-healing data-systems,
that can create the truly digital, always-on enterprise. With
the innovative new generation of software products now available to
IT staff, the goal of "five nines" can be met for the first
time, and can be met reliably regardless of acts of man or nature.

Finally, enterprise-class,
always-available systems can be constructed that will not be taken
out by physical disaster, espionage, end-user accidents or any other
mishap. The corporate data has become truly safe and secure, and business
can get on with what it does best: concentrating on business and letting
the data-systems concentrate on the data.

To comment on this article, go to 1501-07 at www.drj.com/feedback.
©Copyright 2002 Systems Support Inc. All rights reserved. Reproduction in whole or in part in any form or medium without the express written permission of System Support Inc. is prohibited.