DISASTER RECOVERY JOURNAL
DATA PROTECTION
Are You Managing the Risks of Downtime?
By WALT HINTON & ROB CLEMENTS
Can you imagine going
out of business next month? Next year? Without an adequate disaster
recovery plan, what seems like an unlikely scenario could become a very
frightening reality.
Corporate data is growing exponentially, and, with it, the potential
for disaster to strike: namely, extended periods of data inaccessibility,
or worse, total loss of data. International Data Corp.'s conservative
estimates put data growth at 80 percent per year. As the amount of data
grows, so does a company's dependence on that data to generate
revenue, increase customer penetration and satisfaction, support day-to-day
operations, and ensure the long-term success of the business.
A disaster recovery plan functions as a kind of data insurance policy
against unplanned outages and/or data loss. This data insurance policy
guards against dollar losses that can range anywhere from occasional
dings of a few thousand dollars, to mounting debts which may eventually
put the company out of business. In addition to dollar losses, companies
risk the potential loss of existing and new business, the potential
loss of customer confidence, and potential liabilities to customers
and investors. Failure to produce certain data in response to an audit
may also carry legal consequences.
While regular backups are part of a viable disaster recovery strategy,
backups alone do not provide a complete solution. Backing up critical
server data into a tape library is an excellent practice, but if the
entire building burns, the backups are gone too. Further, a complete
disaster recovery solution needs to address the real possibility that
the primary computing infrastructure may be unavailable, and another
infrastructure may be required on short notice to provide a new computing
footprint into which critical applications and data can be recovered.
This temporary footprint may be located at another company facility,
or provided by a third party. In the latter case, the equipment in question
has to undergo a bare metal restore, in which all appropriate
operating systems and applications are reloaded to a baseline state
before any company data can be restored. Together with a good data protection
practice, these provisions will help provide a more complete and robust
disaster recovery plan.
To determine exactly how much a viable disaster recovery plan is worth
to your business, you need a thorough understanding of the value of
the company's critical applications and data stores, and the infrastructure
required to support them. Next, a protection strategy must be developed
which prioritizes these assets relative to their business importance.
Remember, these strategies must accommodate different levels of severity:
a disaster may be anything from a lost file or corrupted
database, to a rampant computer virus, to a man-made or natural catastrophe
that destroys all virtual and physical assets.
Understanding Your Data
A company's real data growth can vary greatly from industry averages:
the problem is, most companies don't know by how much. Software
tools for monitoring data storage capacity and utilization across an
enterprise are expensive and hard to find, and most companies don't
have the time or technical expertise to regularly perform such an analysis.
Consequently, most companies have widely distributed data stores that
are difficult to classify as mission critical or non-mission
critical, and which often lack backup and recovery plans commensurate
with their importance to the business.
Before developing a disaster recovery plan, it's important to understand
the recovery requirements for various applications. Resources can then
be prioritized appropriately to minimize impact to the business
should a disaster occur. There are two main criteria for prioritizing
your critical applications and data:
Speed to Recovery: How long can your organization live without this data or application?
What are the effects on the business for each hour of downtime you experience?
Recoverability: What would be the impact if you lost the last hour's data? The last
four hours? The last 24 hours?
Speed to Recovery
Imagine you are an auto manufacturer that relies on a line schedule
system to support your manufacturing facilities 24 hours a day. In this
case, the impact on the business can be measured in terms of lost production.
That is, if you produce $250,000 worth of automobiles an hour, a four-hour
outage would cost you a million dollars in lost production.
If you are a utilities company whose system outages leave the public
without phone service, the business impact of an outage may be measured
in terms of loss of customer confidence, potential legal liability or
quantified as service level violations that require monetary compensation
to customers.
Or, perhaps you are a clothing retailer with an experimental Web site
that does not yet support any sales or transactions, and
it goes down. In this case, you may not experience any significant business
impact.
Practically speaking, a company may have a mix of potential consequences.
Outages of manufacturing or sales systems run the risk of costing millions
of dollars, while outages of static Web content or archived data files
may be relatively insignificant. For each of your key business applications,
you should (a) prioritize the applications and data stores in your organization
relative to each other, and (b) understand the true financial impact
over time of the unavailability of that application or data. The following
questions will help you assess the consequences of an unplanned outage:
When does the unavailability of this application/data store significantly
impact the business?
Does this application/data store generate revenue? If so, how
much revenue does it generate in a minute, an hour or a day?
What are the potential dollar losses that would occur if this
application/data store were unavailable for an hour?
What are the intangible losses (e.g., loss of customer confidence)
that would occur in the event of unavailability for an hour? A day?
Are there applications/data stores that you have identified as
non-mission critical that could have a greater impact if they were unavailable
for a prolonged period of time? (i.e., does their criticality escalate?)
How long of an outage could be tolerated on these systems before significantly
impacting the business?
How quickly can you recover this application/data store in the
event of an outage? Data corruption? A fire? Man-made or natural catastrophe?
How long did it take you to recover this application/data store
in the last actual disaster or disaster recovery test?
Has a cost and risk analysis already been performed for this
application/data store?
Do you understand how one hour of unavailability impacts the
profitability of your company?
How many customers will choose to deal with another company if
your application/data store is down for an hour? Twenty-four hours?
How will an hour of application downtime impact your production
schedule?
Will you need to send employees home because they cannot continue
to work without this application/data store? After what length of asset
unavailability will you send them home?
What replacement infrastructure may be required to restore accessibility
to this asset?
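The answers to these questions can be captured per application and used to rank assets against each other. The following is a minimal sketch, not a prescribed method: the `AppProfile` fields, the ranking rule, and all figures except the $250,000-an-hour line schedule system from the example above are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    """Assessment answers for one application or data store (hypothetical fields)."""
    name: str
    hourly_loss: float                # estimated dollars lost per hour of downtime
    tolerable_downtime_hours: float   # outage length before significant business impact
    tolerable_data_loss_hours: float  # hours of recent data the business could afford to lose


def prioritize(apps):
    """Order applications from most to least urgent to protect.

    Illustrative rule only: highest hourly loss first, then the shortest
    tolerable downtime, then the shortest tolerable data loss.
    """
    return sorted(
        apps,
        key=lambda a: (-a.hourly_loss, a.tolerable_downtime_hours, a.tolerable_data_loss_hours),
    )


apps = [
    AppProfile("experimental web site", 0.0, 72.0, 24.0),
    AppProfile("line schedule system", 250_000.0, 1.0, 1.0),
    AppProfile("archived data files", 500.0, 48.0, 24.0),
]
ranked = [a.name for a in prioritize(apps)]
print(ranked)
```

A real assessment would also weigh intangible losses and escalating criticality, which a single sort key does not capture.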
Recoverability of Data
There may be some situations where an unplanned outage deteriorates
into a completely unrecoverable situation. Therefore, it's important
to prioritize applications not only in terms of how quickly you need
to recover them, but also how closely you need to guard against actual
data loss.
Let's look at a real example: in the first World Trade Center bombing
in 1993, 43 percent of the businesses that experienced substantial data
loss never re-opened. Another 29 percent went out of business within
two years. The long-term consequences of the Sept. 11 attacks have yet
to fully register, but can already be measured in the billions of dollars.
The threat of data loss to your business is real; guard your most critical
applications against that threat.
Consider the following questions as you prioritize your applications
and seek to understand the financial and business impact of permanent
data loss:
If application/data loss were to occur, would there be a way
to recreate that data (e.g., re-entry of manual work orders or
forms)? What is the cost of that manual recreation of data?
What would be the impact of permanent loss of the last hour's
worth of data? The last 24 hours? The last week's?
For applications where permanent loss of data appears to have
little or no impact on the business, will this information be required
at some point in the future? What might this information have been used
for and what are the anticipated losses from the inability to access
it?
Are you required by a regulatory agency or stakeholder to make
this data available for audit? What are the potential liabilities for
not having this data?
The Cost of Downtime
Here are some useful equations to help you calculate the cost of downtime.
Total Business Lost = (Gross Revenue) x (% of Lost Customers due to
Outage)
This equation is targeted at transaction-based organizations
that might see customer attrition as an impact of unreliable availability.
Application-Based Business Lost = ((Annual Gross Revenue generated by
Application / 365 days/year) / (24 hours/day))
This formula calculates the per-hour impact a specific application's
unavailability has on overall corporate revenue.
Lost Production Capacity = (Number of Units Not Produced) x (Unit Price)
Used specifically for manufacturing organizations, this calculation
determines the lost production due to application unavailability.
Lost Net Revenue = (Number of Units Not Produced) x ((Unit Price) -
(Unit Production Cost))
Again, this formula is primarily for manufacturing organizations
and helps quantify the cost of lost production time due to application
unavailability.
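The four revenue formulas above translate directly into small functions. This is a sketch under stated assumptions: the function and parameter names are illustrative, the lost-customer percentage is passed as a fraction (0.05 for 5 percent), and the net revenue formula is read as unit price minus unit production cost. The sample inputs are hypothetical.

```python
def total_business_lost(gross_revenue, lost_customer_fraction):
    """Total Business Lost = gross revenue x share of customers lost to the outage."""
    return gross_revenue * lost_customer_fraction


def application_revenue_per_hour(annual_app_revenue):
    """Per-hour revenue attributed to one application, on a 365-day, 24-hour basis."""
    return annual_app_revenue / 365 / 24


def lost_production_capacity(units_not_produced, unit_price):
    """Value of the units that were not produced during the outage."""
    return units_not_produced * unit_price


def lost_net_revenue(units_not_produced, unit_price, unit_production_cost):
    """Margin lost on the units that were not produced (price minus cost, assumed)."""
    return units_not_produced * (unit_price - unit_production_cost)


# Hypothetical inputs, for illustration only.
per_hour = application_revenue_per_hour(8_760_000)
capacity = lost_production_capacity(10, 25_000)
net = lost_net_revenue(10, 25_000, 20_000)
attrition = total_business_lost(1_000_000, 0.05)
print(per_hour, capacity, net, attrition)
```

Multiplying the per-hour figure by the length of an outage gives the revenue at risk for that outage.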
The following formulas will help you calculate the total cost of a specific
application outage:
Total Cost of Recovery = Cost of People Time Lost + Cost of Lost Data
+ Replacement Infrastructure + Cost of Recovery Services
Cost of People Time Lost = ((Average Time to Recover) x (Average Wage
of Users) x (Number of Users))
Cost of Lost Data = ((Gross Revenues / Business Days per Year) x (Percentage
of Data Unrecoverable))
Let's look at the following example. Company ABC has gross revenues of
$10,000,000 per year generated from an application:
150 employees use the application to generate revenue and their
average wage is $10 an hour;
The average time to recover the data is four hours;
The business runs 250 days a year.
| Downtime Cost Impact |
| $10,000,000 | Revenue |
| 250 | Days in operation |
| $40,000 | Revenue per day |
| 24 | Hours in day |
| $1,667 | Revenue per hour |
| 4 | Hours down |
| $6,667 | Downtime cost |

| Employee Productivity Impact |
| 150 | Employees |
| 4 | Hours down |
| 600 | Employee downtime hours |
| $10 | Hourly wage per employee |
| $6,000 | Employee productivity impact |
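The Company ABC figures can be reproduced step by step. The short script below is only a sketch of that arithmetic; the variable names are illustrative, and the inputs are the ones given in the example.

```python
# Inputs from the Company ABC example.
annual_revenue = 10_000_000   # dollars per year generated by the application
business_days = 250           # days in operation per year
hours_per_day = 24
hours_down = 4                # average time to recover
employees = 150
hourly_wage = 10              # dollars per hour

# Downtime cost impact.
revenue_per_day = annual_revenue / business_days
revenue_per_hour = revenue_per_day / hours_per_day
downtime_cost = revenue_per_hour * hours_down

# Employee productivity impact.
employee_downtime_hours = employees * hours_down
productivity_impact = employee_downtime_hours * hourly_wage

print(f"Revenue per day:              ${revenue_per_day:,.0f}")
print(f"Revenue per hour:             ${revenue_per_hour:,.0f}")
print(f"Downtime cost:                ${downtime_cost:,.0f}")
print(f"Employee downtime hours:      {employee_downtime_hours:,}")
print(f"Employee productivity impact: ${productivity_impact:,.0f}")
```

The downtime cost is computed from the unrounded hourly revenue, which is why it comes to $6,667 and not four times $1,667.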
Downtime Costs By Industry
Downtime costs will vary by industry and are largely dependent on a
company's dependence on technology and data. The following chart
illustrates the average downtime cost per hour for many industries, but remember
that vulnerability to data unavailability and loss isn't limited
to monetary impact; it also includes such things as loss of customer
confidence, liability, and lost current and future business.
| Industry | Hourly Downtime Costs |
| Brokerage Operations | $6,450,000 |
| Energy | $2,817,846 |
| Credit Card Sales Authorizations | $2,600,000 |
| Telecommunications | $2,066,245 |
| Manufacturing | $1,610,654 |
| Financial Institutions | $1,495,134 |
| Information Technology | $1,344,461 |
| Insurance | $1,202,444 |
| Retail | $1,107,274 |
| Pharmaceuticals | $1,082,252 |
| Banking | $996,802 |
| Food/Beverage Processing | $804,192 |
| Consumer Products | $785,719 |
| Chemicals | $704,101 |
| Transportation | $668,586 |
| Utilities | $643,250 |
| Healthcare | $636,030 |
| Metals/Natural Resources | $580,588 |
| Professional Services | $532,510 |
| Electronics | $477,366 |
| Construction and Engineering | $389,601 |
| Media | $340,432 |
| Hospitality and Travel | $330,654 |
| Pay-Per-View TV | $150,000 |
| Home Shopping TV | $113,000 |
| Catalog Sales | $90,000 |
| Airline Reservations | $90,000 |
| Tele-Ticket Sales | $69,000 |
| Package Shipping | $28,000 |
| ATM Fees | $14,500 |
| Average | $944,395 |
Sources:
IT Performance Engineering and Measurement Strategies: Quantifying Performance
and Loss, Meta Group, Oct. 2000; Fibre Channel Industry Association.
Data Protection Options
Once you understand the value of each of your applications, both in
terms of downtime and data loss, you can begin to assemble the appropriate
disaster recovery strategy for your business. This strategy should include
provisions for recovering data, applications, and if necessary, the
requisite hardware infrastructure. Some typical strategies for data
protection follow. Note that any of these strategies may be deployed
against one or more of your critical applications, and a mix of strategies
may in fact be the best solution for your particular situation.

Regular Backup Regimen
A full discussion of backup methodologies is beyond the scope of this
paper, but it's clear a regular backup regimen is the first line
of defense against data loss from unplanned outages. The backup regimen
need not be complicated, but it must be followed consistently in order
to be effective. Effective backup strategies usually include local and
remote copies of data (see below) and some mix of full, incremental
and differential data capture. Typically a high-density, low-cost medium
such as magnetic tape will be used to retain data for a period of weeks
to months, and then the media will be recycled.
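How full, incremental and differential captures combine at restore time can be sketched as follows. This is an illustration under explicit assumptions, not a description of any particular backup product: a differential is taken to hold all changes since the last full backup, an incremental is taken to hold changes since the most recent backup of any kind, and there is at most one backup per day. The `Backup` type and `restore_chain` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # day number on which the backup ran
    kind: str  # "full", "differential", or "incremental"


def restore_chain(backups, target_day):
    """Return the backups to restore, in order, to recover the state as of target_day."""
    history = sorted((b for b in backups if b.day <= target_day), key=lambda b: b.day)
    fulls = [b for b in history if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup on or before the target day")

    base = fulls[-1]  # start from the most recent full backup
    chain = [base]
    after_base = [b for b in history if b.day > base.day]

    # The latest differential supersedes everything between it and the full.
    diffs = [b for b in after_base if b.kind == "differential"]
    start = base.day
    if diffs:
        chain.append(diffs[-1])
        start = diffs[-1].day

    # Every incremental taken after that point must be applied in order.
    chain.extend(b for b in after_base if b.kind == "incremental" and b.day > start)
    return chain


schedule = [
    Backup(1, "full"),
    Backup(2, "incremental"),
    Backup(3, "incremental"),
    Backup(4, "differential"),
    Backup(5, "incremental"),
    Backup(6, "incremental"),
]
print([b.day for b in restore_chain(schedule, 6)])
print([b.day for b in restore_chain(schedule, 3)])
```

The trade-off the sketch makes visible is that differentials shorten the restore chain while incrementals keep each nightly capture small.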
Remote Data Mirroring
Remote data mirroring offers the highest levels of availability and
business continuance by synchronously (no delay), near-synchronously
(minimal delay), or asynchronously (definable delay) replicating data
from your on-site disk array over a secure network to a hot site facility.
In the event of an outage or a disaster, systems may then point to the
mirrored copy and continue operations or the primary data store may
be restored from the mirror with little or no data loss.
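The practical difference between the three modes is how much recently written data could be missing from the mirror when the primary fails. A rough sketch, assuming that exposure is simply the replication delay multiplied by the rate at which data is written; the delays and write rate below are hypothetical, not vendor figures.

```python
def worst_case_data_loss_mb(replication_delay_seconds, write_rate_mb_per_second):
    """Upper bound on unreplicated data if the primary fails (simplifying assumption)."""
    return replication_delay_seconds * write_rate_mb_per_second


# Hypothetical replication delays for each mode, in seconds.
modes = {
    "synchronous": 0,
    "near-synchronous": 5,
    "asynchronous": 900,
}
write_rate = 2  # MB written per second, hypothetical

exposure = {mode: worst_case_data_loss_mb(delay, write_rate) for mode, delay in modes.items()}
for mode, mb in exposure.items():
    print(f"{mode}: up to {mb} MB not yet mirrored")
```

Comparing this exposure with the tolerable data loss for each application indicates which mode that application needs.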
Business Continuance Volumes (BCVs)
BCVs are snapshots of all or part of a disk filesystem that are
taken periodically and stored in another disk allocation. For example,
an online e-tailer may choose to generate a BCV once every hour and
maintain at least four BCVs at any point in time. Then,
in the event widespread database corruption occurs, rather than going
back to the previous night's tape backup and losing all of the current
day's transactions, the e-tailer may resort to the most recent BCV
in which the corruption does not exist. Essentially, the BCVs serve as
periodic, disk-based backups to restore from in case of an emergency.
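The e-tailer example can be sketched in a few lines. The code below assumes the goal is the most recent snapshot taken before the corruption began, that snapshots are identified by the hour they were taken, and that exactly four are retained (the minimum in the example). The function name is hypothetical.

```python
from collections import deque


def pick_restore_bcv(bcv_hours, corruption_hour):
    """Return the most recent snapshot taken before corruption began, or None."""
    clean = [hour for hour in bcv_hours if hour < corruption_hour]
    return max(clean) if clean else None


# Keep only the four most recent hourly snapshots; older ones are dropped.
retained = deque(maxlen=4)
for hour in range(6, 13):  # snapshots taken at 06:00 through 12:00
    retained.append(hour)

print(list(retained))
print(pick_restore_bcv(retained, 10.5))  # corruption began at 10:30
print(pick_restore_bcv(retained, 8.5))   # corruption predates every retained BCV
```

When no retained snapshot predates the corruption, the function returns `None`, which corresponds to falling back to the previous night's tape backup.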
Remote Tape Backup
Remote tape backup is simply tape backup done over a point-to-point
or VPN connection from your site to a secure, off site facility. This
can be the primary backup regimen, or can be performed as an adjunct
to on-premises backups. This differs from off site tape archiving in
that the assets remain readily accessible at the remote site, rather
than being parked on a shelf. Various storage service providers can
automate and facilitate this process.
Off Site Tape Archiving
Off site tape archiving provides the least accessible data storage option,
but offers a low-cost option for long-term data archiving. Tapes are
taken off site by an archiving company and stored in a secure, hardened
facility for as many months or years as you specify. Tapes will be delivered
back to you if you should need to access the data stored on the tapes.
The off site archive is like a bank vault: it keeps your data safe from
fire, theft, natural disaster and damage.
Rationale For Outsourcing Data Protection And Disaster Recovery Services
After outlining the disaster recovery strategy that works best for your
company, you may want to consider how exactly you will go about implementing
the strategy. The decision to develop a disaster recovery plan entails
a variety of subsequent tasks that you may or may not have the time
and qualified resources to perform, such as:
Hardware and software evaluation;
Technology and service provider evaluation;
Network design and management;
Infrastructure integration and installation;
Disaster recovery plan maintenance and monitoring procedures;
Business continuance procedures;
Data restore procedures;
Disaster recovery plan test procedures and auditing schedule;
Test/audit results documentation;
Periodic disaster recovery plan validation to ensure the plan remains
in line with company requirements.
Given the breadth of these tasks, and the expertise they require, outsourcing the design,
implementation, management, maintenance and monitoring of a data protection
and disaster recovery practice is a reasonable solution to the DR question.
Storage service providers with core competencies in these areas rely
on their specialized technical acumen, including years of storage and
networking expertise, along with specialized software tools for device,
network and storage management. Together these core competencies and
specialized tools enable storage service providers to achieve the highest
levels of availability, performance and data security more efficiently
and cost-effectively than customers could otherwise achieve on their
own. Further, the outsourcing of these laborious and time-consuming
tasks enables customers to re-focus key IT personnel on strategic tasks
and business objectives rather than being consumed with maintenance-related
chores. Storage service providers will typically provide these services
under the terms of a contract and service level agreements that guarantee
the availability, security and reliability of data.
Industry research has shown that data backup and disaster recovery practices
have typically been perennial sources of difficulty for IT professionals
in that they take up a great deal of time each day, they are often boring
and cumbersome chores, and they are only visible when something goes
wrong. Therefore, outsourcing these perennial headaches is often a very
attractive proposition for IT professionals with better things to do.
There are several key technical and financial reasons outsourcing of
data protection and DR services makes sense:
Technical arguments
in favor of outsourcing include: higher backup success rates, better
restore SLAs to other departments, standardized reporting of results,
centralized command and control of all data protection across the (wide-area)
enterprise, single escalation path for support issues, etc.
Financial arguments
for outsourcing include: the ability to re-focus or re-deploy some percentage
of IT resources, higher resource utilization that can defer new purchases,
more accurate reporting of resource allocation, better growth prediction
and resource planning, centralized billing and accounting information,
better accountability of resources at branch offices, etc.
Walt Hinton is the
chief technical officer at ManagedStorage International (MSI) and Rob
Clements is an integration specialist. MSI is a global provider of complete
storage solutions, helping enterprise companies and leading service
providers strategically manage storage as a critical corporate resource.
For more information, please visit http://www.managedstorage.com.
To comment on this
article, go to 1503-12 at www.drj.com/feedback.