The Time Has Come For DRP Metrics
By JAN PERSSON, CDP
I’ve been in the disaster recovery planning (DRP) arena for more than 25 years. By some measures it should be considered a mature discipline, one with accepted processes, procedures, policies and guidelines, clear best practices, regular accountability reporting, etc.
In reality, at least from what I see, most companies have become more willing to dedicate resources and money to DRP, but the discipline of managing it and reporting how well it is working has been slower to materialize.
Now, with more attention from board-level directors, it’s prudent to provide a more quantitative, accurate and empirical set of reporting numbers, i.e., “metrics.”
Metrics, in general, have long been used in IT departments. We’ve always reported downtime, mean time between failures (MTBF), mean time to repair (MTTR), outage analysis, response time (at the terminals), etc. In the area of DRP there are definitely some appropriate metrics as well. This article offers a dozen for your review and consideration. At the very least they could be a starting point; for those who have already implemented DRP metrics, they could serve as a checkpoint.
Assumptions
This article is not intended to be a comprehensive critique of all possible
DRP metrics. Please keep in mind a few assumptions as you read along:
- It is assumed that you have a DRP program in place today.
- It does not attempt to distinguish between large, medium and small computer centers; that would make it too complex and lengthy.
- It is assumed that some metrics will be more meaningful than others in various centers.
Benefits of DRP Metrics
• Measurements tend to present things as they are in objective
terms.
• Awareness training in general should be an active part of DRP.
• They help focus on issues that management needs to know about. For example, if the business impact analysis (BIA) indicates a recovery time objective (RTO) of six hours and the actual recovery time based on testing is 24 hours, that gap needs to be clearly understood.
• They can be written into people’s objectives (i.e., responsibilities) and measured.
Suggestions
• Fill out the last column (metric) for your location.
• Circulate for comments, validation, and general awareness.
• Be extremely persistent. If you don’t have the metric, keep asking.
• Establish a regular DRP metric reporting schedule. Quarterly might be a good schedule to start with.
• Be proactive. Add DRP metrics as appropriate for your location.
Risk Factors
What could the risks be? The one I’ve seen most often is management holding an incorrect level of “expectations” concerning DRP.
The first risk is the assumption that because a hot site (or other recovery solution) is in use, all critical applications are covered. In reality, many companies have covered some but not all of their critical (often referred to as mission critical, or MC) applications and/or business functions. Isn’t it better to report that, say, 50 percent of critical applications are covered? That indicates more accurately what to expect and also shows that more growth is required.
Another risk factor involves funding. An accurate accounting (i.e., metric) of how much is covered and how much is left to address is beneficial where funding is concerned. Paint the proper picture with quantitative analysis (for example, “we still have six Intel servers, 25 percent, to add to the recovery configurations”). It’s harder to get funding if management thinks you’re done.
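Both coverage figures above (50 percent of critical applications, six Intel servers still to add) come from the same simple calculation: items covered divided by items inventoried. The sketch below assumes a hypothetical inventory in which each critical application or server is tagged as covered or not; the `RecoveryItem` record, the `coverage_metric` function and the server names are illustrative only, not part of any particular tool. It takes six servers representing 25 percent to mean an inventory of 24.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RecoveryItem:
    """Hypothetical inventory record: one critical application or server."""
    name: str
    covered: bool  # True if already included in the recovery configuration


def coverage_metric(items: list[RecoveryItem]) -> tuple[float, int]:
    """Return (percent covered, number of items still to be added)."""
    if not items:
        raise ValueError("inventory is empty")
    covered = sum(1 for item in items if item.covered)
    remaining = len(items) - covered
    return 100.0 * covered / len(items), remaining


# Hypothetical inventory: 24 Intel servers, the first 18 already covered.
servers = [RecoveryItem(f"intel-{n:02d}", covered=n <= 18) for n in range(1, 25)]
pct, remaining = coverage_metric(servers)
print(f"{pct:.0f}% covered, {remaining} servers ({100 - pct:.0f}%) still to add")
# prints: 75% covered, 6 servers (25%) still to add
```

Reporting the remaining count alongside the percentage makes the funding request concrete: it names what is left, not just how far along the program is.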
Finally, and this one tends to cause the biggest problem, it is absolutely imperative that the “recovery time” metric for MC applications be documented and reported. Usually it is determined through testing. Refer to No. 6 in the suggested metric table. This metric must be reported with complete accuracy, because the business adjusts its response plan to the expected duration of the application outage. If the reported metric is a very optimistic “eight hours” and in reality the recovery takes more like “24 hours,” everyone loses. Additionally, in the absence of this metric, rest assured the shortest recovery time ever mentioned will be expected. Risky business.
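A recovery time metric is only useful if it is compared against the RTO and if an untested application is flagged rather than silently assumed to recover quickly. The sketch below illustrates that comparison under assumed names: `RecoveryTimeMetric`, `assess` and the application names are hypothetical, and the figures reuse the six-hour RTO versus 24-hour tested recovery example from earlier in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryTimeMetric:
    """Hypothetical record pairing an MC application's RTO with its tested recovery time."""
    application: str
    rto_hours: float               # recovery time objective from the BIA
    tested_hours: Optional[float]  # measured in the latest recovery test; None if untested


def assess(metric: RecoveryTimeMetric) -> str:
    """Compare tested recovery time with the RTO and describe the result."""
    if metric.tested_hours is None:
        # No metric: management will likely assume the shortest time ever mentioned.
        return f"{metric.application}: NOT TESTED, no recovery time metric"
    gap = metric.tested_hours - metric.rto_hours
    detail = f"tested {metric.tested_hours:g} h vs RTO {metric.rto_hours:g} h"
    if gap > 0:
        return f"{metric.application}: MISSES RTO by {gap:g} h ({detail})"
    return f"{metric.application}: meets RTO ({detail})"


for m in [
    RecoveryTimeMetric("order-entry", rto_hours=6, tested_hours=24),
    RecoveryTimeMetric("payroll", rto_hours=8, tested_hours=None),
]:
    print(assess(m))
```

Treating “not tested” as its own status, rather than defaulting to the RTO, keeps an optimistic assumption from being reported as a measured result.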
Final Comments
It is my belief that DRP has moved well past the early days (the last 20 years) of rather mundane tasks: write a plan, back up the data, go to the hot site (often by ourselves) and test once a year or every two years. A well-defined and tested DR plan is essential to company survival. It’s not something to be taken lightly! With a sound “DRP metric” program in place, the plan is constantly reviewed and fine-tuned. Metrics bring to light where we are and where we need to be.

Jan Persson, CDP, has worked in the IT field since 1967. He began his
formal disaster recovery involvement in 1980 and in 1985 started his own
disaster recovery consulting practice, Persson Associates. He has written
and/or audited more than 300 DR plans, worked for and with the three major
disaster recovery firms, conducts executive seminars and plan development
workshops on a regular basis, and continues to take an active, hands-on
role in DR activities in all size shops and environments. You may contact
him at (847) 732-6500 or jppersson@aol.com.
©Copyright 2004 Systems Support Inc. All rights reserved. Reproduction in whole or in part in any form or medium without the express written permission of System Support Inc. is prohibited.