
I’ve been in the disaster recovery planning (DRP) arena for more than 25 years. By some measures it should be considered a mature discipline. Such maturity would include accepted processes and procedures, policies and guidelines, clear best practices, regular accountability reporting, and so on.

In reality, at least from what I see, most companies have become more willing to dedicate resources and money to DRP, but the discipline of managing and reporting how well it is working has been slower to materialize. Now, with more focus from board-level directors, it’s prudent to provide a more quantitative, accurate and empirical set of reporting numbers, i.e., “metrics.”

Metrics in general have been used in IT departments for a long time. We’ve always reported downtime, mean time between failures (MTBF), mean time to repair (MTTR), outage analysis, response time (at the terminals), and so on. DRP also has its share of appropriate metrics. This article offers a dozen for your review and consideration. At the very least, they can serve as a starting point; for those who have already implemented DRP metrics, they can serve as a checkpoint.
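As a reminder of how the two classic IT figures are usually derived, here is a minimal Python sketch. It assumes a hypothetical outage log of (start, restored) timestamps covering one reporting period and uses one common convention: MTTR is total repair time divided by the number of outages, and MTBF is total operating time divided by the number of failures. The function names and sample data are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical outage log for one reporting period: (outage start, service restored).
outages = [
    (datetime(2018, 1, 10, 2, 0), datetime(2018, 1, 10, 5, 0)),    # 3-hour outage
    (datetime(2018, 2, 20, 14, 0), datetime(2018, 2, 20, 15, 0)),  # 1-hour outage
]
period_start = datetime(2018, 1, 1)
period_end = datetime(2018, 4, 1)  # a 90-day quarter


def total_downtime(outages):
    """Sum of all outage durations as a timedelta."""
    return sum((end - start for start, end in outages), timedelta())


def mttr_hours(outages):
    """Mean time to repair: total downtime divided by the number of outages."""
    if not outages:
        raise ValueError("no outages recorded in this period")
    return total_downtime(outages).total_seconds() / 3600 / len(outages)


def mtbf_hours(outages, period_start, period_end):
    """Mean time between failures: total uptime divided by the number of failures."""
    if not outages:
        raise ValueError("no failures recorded in this period")
    uptime = (period_end - period_start) - total_downtime(outages)
    return uptime.total_seconds() / 3600 / len(outages)


def availability(mtbf, mttr):
    """Fraction of time the service was available: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


mttr = mttr_hours(outages)
mtbf = mtbf_hours(outages, period_start, period_end)
print(f"MTTR {mttr:.1f} h, MTBF {mtbf:.1f} h, availability {availability(mtbf, mttr):.4f}")
```

With this sample log the quarter has 2,160 hours, 4 of them down, so MTTR is 2 hours and MTBF is 1,078 hours.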

Assumptions

This article is not intended to be a comprehensive critique of all possible DRP metrics. Please keep in mind a few assumptions as you read along:
• It is assumed that you have a DRP program in place today.
• It does not attempt to distinguish between large, medium and small computer centers; doing so would make it too complex and lengthy.
• Some metrics will be more meaningful than others, depending on the center.

Benefits of DRP Metrics

• Measurements tend to present things as they are, in objective terms.
• They support awareness, and awareness training in general should be an active part of DRP.
• They help focus on the issues management needs to know about. For example, if the business impact analysis (BIA) indicates a recovery time objective (RTO) of six hours and the true recovery time based on testing is 24 hours, that gap needs to be clearly understood.
• They can be written into people’s objectives (i.e., responsibilities) and measured.
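The RTO example in the benefits list can be turned into a simple check. The following Python sketch assumes a hypothetical list of applications, each with a BIA recovery time objective and the recovery time actually observed in testing. It flags every application whose tested recovery exceeds its objective. The class, function and application names are illustrative, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass
class RecoveryResult:
    application: str     # hypothetical application name
    rto_hours: float     # recovery time objective from the BIA
    tested_hours: float  # recovery time actually achieved in the most recent test


def rto_gaps(results):
    """Return (application, shortfall in hours) for each application whose
    tested recovery time exceeds its RTO."""
    return [
        (r.application, r.tested_hours - r.rto_hours)
        for r in results
        if r.tested_hours > r.rto_hours
    ]


results = [
    RecoveryResult("order entry", rto_hours=6, tested_hours=24),
    RecoveryResult("payroll", rto_hours=48, tested_hours=30),
]
for app, shortfall in rto_gaps(results):
    print(f"{app}: tested recovery misses the RTO by {shortfall} hours")
```

Only "order entry" is reported, with an 18-hour shortfall. That is the figure management needs to see.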

Suggestions

• Fill out the last column (metric) for your location.
• Circulate for comments, validation, and general awareness.
• Be extremely persistent. If you don’t have a metric, keep asking.
• Establish a regular DRP metric reporting schedule. Quarterly is a reasonable schedule to start with.
• Be proactive. Add DRP metrics as they become appropriate for your location.

Risk Factors

What could the risks be? The one I’ve seen most often is management holding the wrong “expectations” about DRP. The first risk is the assumption that, because a hot site (or other recovery solution) is being used, all critical applications are covered. In reality, many companies have covered some, but not all, of their critical (often called mission-critical, or MC) applications and/or business functions. Isn’t it better to report that, say, 50 percent of critical applications are covered? That figure better indicates what to expect and also shows that more growth is required.
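A coverage figure like the 50 percent in the risk discussion is easy to compute once the list of mission-critical applications has been agreed on. This minimal Python sketch makes that assumption. The application names are hypothetical, and "covered" is taken to mean that the application is included in the recovery solution.

```python
def coverage_percent(critical_apps, covered_apps):
    """Percentage of mission-critical (MC) applications included in the recovery solution."""
    critical = set(critical_apps)
    if not critical:
        raise ValueError("no mission-critical applications listed")
    return 100.0 * len(critical & set(covered_apps)) / len(critical)


def uncovered(critical_apps, covered_apps):
    """MC applications not yet included in the recovery solution, sorted by name."""
    return sorted(set(critical_apps) - set(covered_apps))


# Hypothetical inventory of MC applications and those configured at the hot site.
critical = ["general ledger", "order entry", "payroll", "inventory"]
at_hot_site = ["general ledger", "payroll"]

print(f"{coverage_percent(critical, at_hot_site):.0f} percent of critical applications covered")
print("Not yet covered:", ", ".join(uncovered(critical, at_hot_site)))
```

Reporting the uncovered list alongside the percentage makes it clear which gaps still need attention.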

Another risk factor involves funding. An accurate accounting (i.e., a metric) of how much is covered and how much is left to address helps where funding is concerned. Paint the proper picture with quantitative analysis (for example, “we still have six Intel servers (25 percent) to add to the recovery configurations”). It is harder to get funding if management thinks you’re done.
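The funding example implies 24 servers in total, since six remaining servers equal 25 percent. The following minimal Python sketch shows how the "how much is left" figure can be produced from two counts. The helper name and the counts are hypothetical and were chosen to match that example.

```python
def remaining_work(total_servers, configured_servers):
    """Return (servers still to add, that count as a percentage of the total)."""
    if total_servers <= 0:
        raise ValueError("total_servers must be positive")
    remaining = total_servers - configured_servers
    return remaining, 100.0 * remaining / total_servers


# Hypothetical counts: 24 servers in total, 18 already in the recovery configuration.
remaining, pct = remaining_work(total_servers=24, configured_servers=18)
print(f"We still have {remaining} Intel servers ({pct:.0f} percent) "
      "to add to the recovery configurations.")
```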

Finally, and this one tends to cause the biggest problem, it is absolutely imperative that the “recovery time” metric for MC applications be documented and reported. Usually this figure is established through testing (refer to No. 6 in the suggested metric table). It must be reported accurately. The risk is that the business adjusts its response plan to the length of time the application will be out. If the reported metric is a very optimistic “eight hours” and in reality recovery takes more like “24 hours,” everyone loses. Additionally, in the absence of this metric, rest assured that the shortest recovery time anyone has mentioned will be expected. Risky business.
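One way to keep the recovery-time metric honest is to report the longest recovery observed in testing rather than the best case. That choice is an assumption of this Python sketch, not a rule stated in the article. The test results and the helper name are hypothetical. The sketch refuses to produce a figure when there are no test results, because a missing metric otherwise tends to be read as the shortest time anyone has mentioned.

```python
def reported_recovery_hours(test_results_hours):
    """Recovery time to report for one application.

    Assumption: report the longest recovery observed across tests (the
    conservative figure), not the best case.
    """
    if not test_results_hours:
        # No tests means the recovery time is unknown; do not report a guess.
        raise ValueError("no test results: recovery time is unknown")
    return max(test_results_hours)


# Hypothetical recovery times, in hours, from three tests of one MC application.
tests = {"order entry": [8, 24, 20]}

for app, hours in tests.items():
    print(f"{app}: report {reported_recovery_hours(hours)} hours "
          f"(best single test was {min(hours)} hours)")
```

Here the report says 24 hours, not the optimistic 8, so the business plans its response around a realistic outage.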

Final Comments

It is my belief that DRP has moved well past its early days (the last 20 years) of rather mundane tasks: write a plan, back up the data, go to the hot site (often by ourselves), and test once a year or every two years. A well-defined and tested DR plan is essential to company survival. It’s not something to be taken lightly! With a sound “DRP metric” program in place, the plan is constantly reviewed and fine-tuned. Metrics bring to light where we are and where we need to be.



Jan Persson, CDP, has worked in the IT field since 1967. He began his formal disaster recovery involvement in 1980 and in 1985 started his own disaster recovery consulting practice, Persson Associates. He has written and/or audited more than 300 DR plans, has worked for and with the three major disaster recovery firms, conducts executive seminars and plan development workshops on a regular basis, and continues to take an active, hands-on role in DR activities in shops and environments of all sizes. You may contact him at (847) 732-6500.