ASSUMPTIONS
This article is not intended to be a comprehensive critique of all plan elements and requirements. Please keep in mind a few assumptions as you go through the article:
- It is assumed that you have a DRP in place today.
- It does not attempt to delineate between large, medium and small computer centers. That would make it too complex and lengthy. Use what applies to you.
- It presents points that most DRPs must address but does not try to apply weights to the different points. It is assumed that some are more critical than others in various centers.
- This article is not meant to be a comprehensive DRP audit. It is simply a number of key points (an even dozen) that can be used to demonstrate a sense of “Readiness.”
RATING SCALE
Just for fun, I decided to give you an opportunity to think about each point and give yourself a rating. Typically, when I do training sessions, I ask people the questions and then ask, “How do you feel about where you are?” For this exercise, use the following rating scale:
5 = It has been proven to me that we have this point covered, period!
4 = I have been told that this point is covered and I tend to agree.
3 = I feel we meet the minimum requirements for this point.
2 = I’m not comfortable that we have this covered as well as we should.
1 = I know this is an exposure and I’m very worried about it.
CONFIRMATION CHECKUP
Over the past 16 years, since I started my disaster recovery consulting practice, I’ve been in hundreds of computer centers and have written and audited over 200 Plans. More recently I’ve been doing a lot of disaster recovery training seminars and workshops, which give me even more exposure to different sites and their overall “readiness”. I have learned to ask a few key questions that indicate to me how prepared a site really is. I call them the “Basics”. In short, if these are not done well, the plan has little chance of performing as expected.
I call this first group of points the “Confirmation Checkup.” Too often I see people gloss over these with generalized comments like:
- “Yes, we’ve tested our plan. We were able to bring up the Operating System at the Hot Site”.
- “Sure we back up our data. The daily backups are on-site and the weekly backups go offsite”.
- “We know what’s critical. Our I.S. staff did a list about 3 years ago”.
- “I think our realistic recovery goal is 72 hours. Management probably thinks it is all done in a few hours”.
PROTECTION CHECKUP
It’s just like the old adage “An Ounce of Prevention”. A successful DR Planner always has an eye on prevention and on any other area that can reduce a disaster situation. This is probably more true today than ever. The computer center, while locked and staffed, is still vulnerable to outside attacks from people you can’t even see. Disaster avoidance is a part of DRP. Let’s face it, when the systems are unavailable for any reason, the end users don’t really care what caused the outage: hackers, fire, power outage, virus attack, whatever. They just know that, to them, it’s a disaster when payroll is due out in 1 hour and the system is down.
What we don’t need to hear are comments like:
- “We keep the doors propped open; it makes it easier for everyone”.
- “You know, changing passwords is a real pain in the neck. That’s why we only do it once a year”.
- “What do you mean you found active IDs and passwords for 30 people who were terminated last year?”
- “I think we should just give everyone free access to the Internet. Downloading neat stuff is sort of a company perk. Besides, it’s a lot faster on the high speed line at work”.
EXPANSION CHECKUP
Do not, under any circumstances, be lulled into complacency with a DR Plan that is “Finalized, Done and Tested”. Good DR Planners constantly have their antennae up and can pick up blips on the radar screen. Good planners also know how and when to stick their nose into any area that implements a change that would affect the recovery capability. Once again, if the business impact is known from the BIA, the job is easier.
Comments like these are just not acceptable anymore:
- “I know it’s critical but it’s not in my area. We’ll wait until they have a problem to get involved”.
- “No one told me they added 45 more disk drives to support the critical applications.”
- “Now we’ve got some new critical equipment installed that the Hot Site doesn’t even offer”.
Calculate your point score as follows:
1. Add up the score you applied for each point
2. Apply the following assessment:
12 to 24 = A lot of work needs to be done. A Lot!
25 to 36 = Needs improvement. Not a good comfort level.
37 to 48 = Not bad. In fact pretty darn good.
49 to 60 = Excellent work! You are to be commended.
Caution: More important than the score is a good sense of what points need to be strengthened to create a better plan.
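The scoring steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original checkup: the function name `assess`, the `BANDS` table, and the choice to flag any point rated 1 or 2 as a weak spot are all assumptions made for the example. It also assumes that a score of exactly 24, 36, or 48 belongs to the lower band.

```python
from typing import Dict, List, Tuple

# Assessment bands from the article, stored as (upper limit, text).
# Assumption: a score of exactly 24, 36, or 48 falls in the lower band.
BANDS: List[Tuple[int, str]] = [
    (24, "A lot of work needs to be done. A Lot!"),
    (36, "Needs improvement. Not a good comfort level."),
    (48, "Not bad. In fact pretty darn good."),
    (60, "Excellent work! You are to be commended."),
]


def assess(ratings: Dict[int, int]) -> Tuple[int, str, List[int]]:
    """Return (total score, assessment text, points rated 1 or 2)."""
    if sorted(ratings) != list(range(1, 13)):
        raise ValueError("expected one rating for each of points 1 through 12")
    if any(r not in range(1, 6) for r in ratings.values()):
        raise ValueError("each rating must be a whole number from 1 to 5")

    total = sum(ratings.values())
    # Pick the first band whose upper limit is not exceeded.
    verdict = next(text for upper, text in BANDS if total <= upper)
    # Hypothetical threshold: ratings of 1 or 2 mark points to strengthen.
    weak_points = [point for point, r in sorted(ratings.items()) if r <= 2]
    return total, verdict, weak_points


# Example ratings for points 1 through 12 (made-up values).
example = {1: 4, 2: 5, 3: 2, 4: 3, 5: 4, 6: 5,
           7: 3, 8: 2, 9: 1, 10: 3, 11: 4, 12: 3}
total, verdict, weak_points = assess(example)
print(total, verdict, weak_points)
```

With these made-up ratings the total is 39, which lands in the “Not bad” band, while points 3, 8 and 9 are flagged. In keeping with the caution above, the list of flagged points is the more useful output.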
FINAL COMMENTS
It’s a question of survival. And let’s face it, DRP plays a major role in survival. It is my belief that DRP has moved well past the early days (the last 15 years) of the rather mundane tasks of writing a plan, backing up the data, and going to the Hot Site (often by ourselves) to test once a year or every two years. We are in fact the “Sentry” for business survival. We need to ask the difficult questions, stick our nose in when we sniff a problem, and be a proactive, optimistic, and positive component of the company.
I read a recent (January 2001) trade journal article that presented some relevant statistics (I like statistics). It indicated that today over 50% of corporations have an RTO of less than 24 hours. To me, that means critical. Can your plan meet the goal of 8 or 10 hours to be operational? Is data mirroring in your future?
Another recent (also January 2001) article strongly suggested that E-Commerce will not only survive but will play a key role in many companies’ survival. The growth forecast is astounding.
The E-Business technology is beginning to show great rewards in cost savings and extended sales growth. Is your E-Business plan in place? Will it recover in one or two hours?
Yet another article (an October 2000 White Paper) presents a case for Network Storage given the “Storage Explosion”. Rates are projected to drop from $.30/MB to $.01/MB by 2005. How do you balance these costs against the cost to recover 1, 2 or even 3 full days of lost input? Where do SANs, LANs and GANs fit in your plan?
As we move along in this “DRP Arena” we need to keep a diligent eye on the Basics, protect the assets, and expand to keep pace.
Confirmation Checkup (Your Rating: 1 to 5 for each point)
Point 1: Business Impact
Checkup: Confirm that all Critical Business Functions are identified using a BIA approach with end user input and are covered in the Recovery Plan regardless of the platform they run on. A Recovery Time Objective (RTO), such as 24 hours, is identified. The dollar loss is clearly stated.
Give yourself a "5" if: You can produce a complete, current application list and a BIA done within the last 12 months, and your plan includes a specific RTO.
Point 2: Data Backup
Checkup: Confirm that computer data is backed up on a regular basis and/or mirrored to an alternate site. All platforms are covered, and the backup media (usually tape) is sent offsite immediately upon creation. Multiple copies or generations provide a fallback should any tapes be missing, damaged, or unreadable.
Give yourself a "5" if: You can produce a complete set of backups (or mirrored disk) that could fully rebuild all platforms, synchronized to the proper point.
Point 3: Recovery Window
Checkup: Confirm that, based on the RTO (Recovery Time Objective) stated in the BIA, the data backups can support the goal based solely on media kept offsite. The RPO (Recovery Point Objective) is the ability to restore to a specific point. The goal is that the RTO is achievable given the RPO. If your goal is 24 hours, 5-day-old data is a big problem.
Give yourself a "5" if: This one is simple. You have tested this and it worked as expected. For all platforms!
Point 4: Testing
Checkup: Confirm that comprehensive testing of all critical platforms, critical applications and network components in an alternate location (usually a Hot Site) is complete and accurate, based on actual documented test results.
Give yourself a "5" if: You test once or twice a year, include end users in duplicating critical application environments, and meet the RTO.
Point 5: Executive Concurrence
Checkup: Confirm that the CIO, and often Sr. Management, have been a party to the decisions and the financial commitment to provide a DRP, and that they agree with the recovery parameters.
Give yourself a "5" if: Your Sr. Management, CIO (and in some cases the Board) have signed off on the plan goals, specifically the RTO.
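The Recovery Window point (point 3) comes down to two comparisons: is the newest offsite data recent enough for the RPO, and can it be restored within the RTO? Below is a minimal Python sketch of that check. The class name, field names and sample figures are all hypothetical and chosen only for illustration. The restore time should come from an actual documented test, not an estimate.

```python
from dataclasses import dataclass


@dataclass
class RecoveryWindow:
    """Hypothetical record of the figures needed for the point 3 check."""
    rto_hours: float                 # Recovery Time Objective from the BIA
    rpo_hours: float                 # Recovery Point Objective (oldest acceptable data)
    offsite_backup_age_hours: float  # age of the newest backup held offsite
    tested_restore_hours: float      # restore time measured in a documented test

    def meets_rpo(self) -> bool:
        # Offsite data must be no older than the RPO allows.
        return self.offsite_backup_age_hours <= self.rpo_hours

    def meets_rto(self) -> bool:
        # The tested restore must finish within the RTO.
        return self.tested_restore_hours <= self.rto_hours

    def data_gap_hours(self) -> float:
        # Hours of input beyond the RPO that would have to be re-created.
        return max(0.0, self.offsite_backup_age_hours - self.rpo_hours)

    def supported(self) -> bool:
        return self.meets_rpo() and self.meets_rto()


# The article's example: a 24-hour goal with 5-day-old (120-hour) offsite data.
# The 24-hour RPO and 18-hour restore time are made-up figures.
window = RecoveryWindow(rto_hours=24, rpo_hours=24,
                        offsite_backup_age_hours=120, tested_restore_hours=18)
print(window.meets_rto(), window.meets_rpo(), window.data_gap_hours())
```

In this example the restore itself fits the 24-hour goal, but the offsite data is 96 hours older than the assumed RPO allows, so the recovery window is not supported. Run the same check separately for every platform.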
Protection Checkup (Your Rating: 1 to 5 for each point)
Point 6: Hardened Facility
Checkup: All critical sites should be Hardened, which includes limited access, badge or combination door locks, 24x7 guards or video surveillance, full UPS, fire protection, forced entry alarms, heat sensors, water sensors, etc.
Give yourself a "5" if: Security precautions are in place and followed by everyone. No exceptions. UPS is regularly tested along with all other alarm mechanisms.
Point 7: Intrusion Protection
Checkup: A complement of tools and controls is in place, including current virus software, IDS (Intrusion Detection System), firewalls, data transmission encryption and digital certificates. All of these must also be operational at the Hot Site.
Give yourself a "5" if: Intrusion has been addressed at the primary facility and also tested at the alternate site. Virus software is updated very often (daily) and encryption is used.
Point 8: Redundancy
Checkup: The goal is to remove, or at the very least minimize, Single Points of Failure. Redundancy applies in all areas, such as Network (dual paths), H/W (backup units or hot swappable components), S/W (prior versions of source code), Facility (Hot Site), Disk (mirrored or RAID), Power (UPS, alternate grid), etc.
Give yourself a "5" if: You have identified and documented all Single Points of Failure and implemented a solution or workaround.
Expansion Checkup (Your Rating: 1 to 5 for each point)
Point 9: Web Site Recovery
Checkup: A disaster recovery plan and alternate site solution are in place to restore Web Site service in what is usually a very short time period. This may well be a separate plan from the more traditional DRPs. It requires extremely fast response action, since any outage of a web site is immediately known by all.
Give yourself a "5" if: Your web site recovery plan is actually documented, the alternate site contains redundant equipment, data is mirrored, and a failover procedure can be implemented quickly.
Point 10: Business Unit DRPs
Checkup: Detailed disaster recovery plans are in place and have been tested for all critical business units.
Give yourself a "5" if: All critical departments have a Team Leader identified and a set of response steps that have been tested.
Point 11: Change Management
Checkup: The DR Team is in the loop whenever infrastructure (HW, SW, Net, etc.) changes, upgrades, removals, etc. are planned.
Give yourself a "5" if: You are part of the planning process and are not 'Surprised' when changes are made.
Point 12: Awareness Training
Checkup: A regular program is in place, corporate-wide, to conduct disaster recovery awareness and crisis management training workshops.
Give yourself a "5" if: Disaster Recovery and Crisis Management training are a part of the program, just like training in technical skills, and drills have been conducted.
Jan Persson, CDP, has worked in the I.T. field since 1967. He began his disaster recovery involvement in 1980 and in 1985 started his own disaster recovery consulting practice, PERSSON ASSOCIATES. He has written and/or audited over 200 DR Plans, worked for and with the 3 major disaster recovery firms, conducts DR training seminars and workshops, and continues to take an active, hands-on, role in DR activities in all size shops and environments.




