Disaster Recovery Planning: More than Boom and Gloom
- Published on October 29, 2007
If you are in the introductory stages of being a Disaster Recovery Planner, take heart and be strong: the light at the end of the tunnel may not be a train. If you have been in disaster recovery planning for some time, parts of this article may not be news, but perhaps I can bring a different point of view. My years of involvement with disaster recovery planning and contingency planning have shown me that there is more to it than being ready for the Boom and managing the Gloom. The Disaster Recovery Planner can be a cornerstone in improving the total business.
Besides the standard reasons (legal requirements, customer opinion, competitive edge, responsibility to stockholders and employees, and the other frequently touted words), why bother with disaster recovery planning? Why get into all that writing, organizing, testing, educating, and maintaining? After all, it takes a lot of work and effort. As a consultant, I use these reasons to persuade a client to hire me. Disaster recovery and contingency planning are not just for big business. They are not just for data centers or networks. Every business, including a personal business, can benefit for reasons that are often not considered.
The basic elements that precede and support recovery preparedness make good economic business sense. Usually with less start-up effort than you may think, Disaster Recovery Planning can improve the business, reduce recurring problems, and, through reduced downtime and better-managed processes, should pay for itself.
In some cases the requirement or motivation to build the disciplines of disaster recovery planning originates in a disaster. In most cases, disaster recovery planning starts with Executive commitment, followed in my list by:
- The right person to drive the process. (This may be the hardest part of long-term planning.)
- Commitment to startup costs.
- Commitment to maintenance costs.
- Commitment to testing costs.
- Commitment to changing the way the business operates.
- Endurance to make the plan cost-effective enough to pay for itself.
With this as a start, you could simply begin writing a disaster recovery plan, or, with a little extra effort, improve the total business and write a disaster recovery plan at the same time.
How can disaster recovery planning work this magic? For now, skip the legal requirements, customer opinion, competitive edge, responsibility to stockholders and employees, and the other common reasons for boom and gloom planning. Setting aside any discussion of Business Impact Analysis, Risk Analysis, Critical Applications, and Prioritization, the basic answer lies at the daily operating level of the business. When I meet with a business to develop a plan from scratch or rejuvenate an existing plan, I start by stating that Disaster Recovery Planning should improve the business. What follows is a brief look at how I begin improving their business.
Procedures and Process
I want to see, read, have, or at least know the location of the desk procedures, operations manuals, support procedures, personnel listings, organization charts, and so on; call them what you want. Where are they, and are they used? Basically, each individual job should have a written outline so that someone other than the primary person doing the job can perform the work, even if minimally. The documentation does not have to be so detailed that an unskilled person can follow it. Details need only be at a level that a knowledgeable person could make use of.
Organization charts and mission statements usually indicate process ownership and support structures. Continue by gathering building plans, hardware lists, software lists, vendor lists, personnel lists, phone lists, addresses, and so on. When you find any item that is not current, have responsibility assigned to ensure that it is corrected and stays current. Accuracy and ownership are key to a good Recovery Plan.
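One way to keep the "accuracy and ownership" rule visible is to keep the gathered items in a simple register and flag anything that has no assigned owner or has not been reviewed recently. The sketch below is only an illustration: the `DocumentItem` record, the `needs_attention` function, and the 180-day review interval are hypothetical choices, not part of any standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class DocumentItem:
    name: str                # e.g. "Vendor list", "Cabling diagram"
    owner: Optional[str]     # person or group keeping it current; None = unassigned
    last_reviewed: date      # when the owner last confirmed it was accurate


def needs_attention(items: List[DocumentItem], today: date,
                    max_age_days: int = 180) -> List[str]:
    """Return names of items with no owner or an out-of-date review.

    The 180-day default is a hypothetical review interval; pick your own.
    """
    flagged = []
    for item in items:
        too_old = today - item.last_reviewed > timedelta(days=max_age_days)
        if item.owner is None or too_old:
            flagged.append(item.name)
    return flagged


if __name__ == "__main__":
    register = [
        DocumentItem("Phone list", None, date(2007, 10, 1)),
        DocumentItem("Vendor list", "Purchasing", date(2006, 9, 24)),
        DocumentItem("Organization chart", "Human Resources", date(2007, 9, 29)),
    ]
    print(needs_attention(register, date(2007, 10, 29)))
```

Whether this lives in a script, a spreadsheet, or a binder matters less than the fact that each item has a named owner and a review date someone checks.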
This basic gathering should include architectural plans for the building, plumbing layouts, power distribution, HVAC, and so on. For example, the cabling of the hardware in the computer room should be charted in case a cable is pulled or emergency re-pathing is required due to a device failure. The cabling diagrams can also be used to help reconstruct the computer room and to identify potential problems.
By this time you have a better understanding of the business. You know who has written procedures and who needs help. Where written processes need improvement, I add some of the time needed to update or create them to the project and work with the organization's management to have staff assigned to complete the procedures. Up to this point, the size of the business does not matter. Always keep in mind that the language used must be plain and clear, so that a person who is familiar with the operation, but not necessarily highly skilled in the exact task, can follow the instructions. It is most important that more than one person knows what a given task is and, through written words, flowcharts, or some other medium, can perform that task with the same result.
For a small business the collection may be only a few pages: an office inventory, customer lists, supplier lists, and similar material. For a large business there may be hundreds of pages. Not all of them are needed for the Disaster Recovery Plan, but each department needs them for normal business operations and for a total business recovery plan, should one be developed.
Written procedures improve the business by allowing essential and critical tasks to be redistributed at any moment, with minimal delay. Shared knowledge of the business, the facility, the mission, and the customers results in lower operating costs through better use of shared resources and the possible elimination of unnecessary duplicate resource costs.
For the disaster recovery planner, the written process fulfills the requirement to be able to process at an alternate facility, using replacement staff if needed. The planner will also obtain the inventories and configurations needed to build alternate facilities and to reconstruct a lost or damaged primary facility.
If you have gone through a complete Business Impact Analysis (BIA) or Risk Assessment, somewhere in the process you will have answered the question of what the most likely cause of downtime is. The list of causes, ranging from equipment failure to catastrophic loss, is long and sometimes depends on the location and type of business. This is where the worth of the BIA becomes evident. To complete the picture, the Risk Analysis (RA) helps determine where specific focus belongs.
The results of an honest business review indicate that the most frequent impact on a business is not nature or hardware. The cause of most downtime, with downtime ranging from a few-second OOPS to a total disaster, is Human Errors and Omissions. We often do not like to admit this because earthquakes and hurricanes are easier to blame than a person. Human errors, unlike hurricanes and earthquakes, occur frequently and repeatedly.
Removing all human error is not possible, but reducing recurring errors is. Just as multiple pathing in a network or in computer room cabling can reduce the number and length of outages, monitoring mistakes can reduce the number of occurrences. A lengthy procedure is also more likely to result in an error than a short one.
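The point about lengthy procedures can be made concrete with a simple model. Assume, purely for illustration, that every step of a procedure carries the same small, independent chance p of a slip. The chance that a procedure of n steps contains at least one error is then 1 - (1 - p)^n, which grows with every step added. Real errors are rarely independent or uniform, so treat this as a rough sketch rather than a measurement.

```python
def probability_of_any_error(p_step: float, steps: int) -> float:
    """Chance that at least one step goes wrong.

    Assumes (hypothetically) that each step independently has the same
    error probability p_step.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Complement rule: 1 minus the chance that every single step succeeds.
    return 1.0 - (1.0 - p_step) ** steps


if __name__ == "__main__":
    # Hypothetical 1% chance of a slip on any single step.
    for n in (10, 100):
        print(n, f"{probability_of_any_error(0.01, n):.1%}")  # 10 -> 9.6%, 100 -> 63.4%
```

Under these assumptions a 10-step procedure goes wrong about one time in ten, while a 100-step procedure goes wrong more often than not, which is one argument for breaking long procedures into short, checkable pieces.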
For a business, controlling interruptions is a discipline. Separate from Disaster Recovery but similar in nature is daily recovery management, where avoidance and rapid recovery are the goals. Through monitoring controls placed on the business, planned changes are scheduled and a backup process is kept at hand. You would not drain the oil from your car and then drive it to the store to buy more. The same should hold true for a business change. A software application upgrade should be scheduled so that it does not interrupt critical or production time, and it should have a means of removing the changes if it does not work within the allotted change time.
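The two scheduling rules in the paragraph above (stay out of critical or production time, and have a way to back the change out) are simple enough to check mechanically before a change is approved. This is a minimal sketch; the `ChangeRequest` record, its `has_backout_plan` field, and the `can_schedule` function are hypothetical names, and a real change-management process would track much more.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class ChangeRequest:
    description: str
    start: datetime
    end: datetime
    has_backout_plan: bool  # hypothetical flag: can the change be removed if it fails?


def overlaps(a_start: datetime, a_end: datetime,
             b_start: datetime, b_end: datetime) -> bool:
    """True if two time intervals share any time."""
    return a_start < b_end and b_start < a_end


def can_schedule(change: ChangeRequest,
                 critical_windows: List[Tuple[datetime, datetime]]) -> bool:
    """Approve only changes that can be backed out and avoid critical time."""
    if not change.has_backout_plan:
        return False
    return not any(overlaps(change.start, change.end, start, end)
                   for start, end in critical_windows)


if __name__ == "__main__":
    # Hypothetical production day: 08:00 to 18:00.
    production = [(datetime(2007, 10, 29, 8), datetime(2007, 10, 29, 18))]
    upgrade = ChangeRequest("Application upgrade",
                            datetime(2007, 10, 29, 22), datetime(2007, 10, 29, 23),
                            has_backout_plan=True)
    print(can_schedule(upgrade, production))
```

Rejecting a change that lacks a backout plan, even in a quiet window, mirrors the car analogy: do not start the job until you know how to recover if it goes wrong.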
Problems need to be written down and tracked to resolution. Once a problem is resolved, the findings should be made known so that all who need to know can make adjustments. If not, the same problem may recur many times, and each recurrence is a loss to the business. If a person unknowingly makes the same error, they will not be able to end the cycle unless the cause is tracked. If a piece of hardware is repeatedly listed, perhaps it should be replaced. Without tracking, problems will continue to cause downtime. Identifying a problem is necessary to solving it.
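A problem log does not need to be elaborate to reveal repeat offenders. The sketch below assumes a hypothetical `ProblemRecord` with a component name, a description, and minutes of downtime, and simply counts how often each component appears; anything listed at least `threshold` times (three here, an arbitrary choice) is a candidate for repair, replacement, or retraining.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemRecord:
    component: str        # device, application, or procedure involved (hypothetical)
    description: str
    minutes_down: float


def recurring_components(log: List[ProblemRecord], threshold: int = 3) -> List[str]:
    """Components that appear in the log at least `threshold` times,
    most frequent first."""
    counts = Counter(record.component for record in log)
    return [name for name, n in counts.most_common() if n >= threshold]


def downtime_by_component(log: List[ProblemRecord]) -> Counter:
    """Total minutes of downtime attributed to each component."""
    totals: Counter = Counter()
    for record in log:
        totals[record.component] += record.minutes_down
    return totals


if __name__ == "__main__":
    log = [
        ProblemRecord("tape-drive-2", "backup job failed", 20),
        ProblemRecord("payroll-batch", "wrong input file loaded", 45),
        ProblemRecord("tape-drive-2", "tape jammed", 30),
        ProblemRecord("tape-drive-2", "backup job failed", 25),
    ]
    print(recurring_components(log))
    print(downtime_by_component(log)["tape-drive-2"])
```

The same counting works for people and procedures as well as hardware, which is where the human errors described earlier tend to show up.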
If you need to present proof that even short interruptions are costly, multiply the number of people affected by their hourly salary plus overhead, and multiply that by the duration of the interruption in hours. This gives the cost to the business in man-hour dollars. This little formula does not take into account lost revenue from sales, production, and so on; it covers only labor cost, yet the numbers rise very quickly.
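The labor-cost formula above is easy to put into a small function so the numbers can be run for several scenarios. The function name and the figures in the example (50 people, $40 per hour salary, $20 per hour overhead, a 30-minute outage) are hypothetical; substitute your own. As in the text, lost revenue is not included.

```python
def interruption_labor_cost(people: int, hourly_salary: float,
                            hourly_overhead: float, hours: float) -> float:
    """Labor cost of an interruption: people x (salary + overhead) x duration.

    Counts only idle labor; lost sales or production are not included.
    """
    if people < 0 or hours < 0:
        raise ValueError("people and hours must be non-negative")
    return people * (hourly_salary + hourly_overhead) * hours


if __name__ == "__main__":
    # Hypothetical example: 50 people idle for half an hour.
    cost = interruption_labor_cost(50, 40.0, 20.0, 0.5)
    print(f"${cost:,.2f}")  # $1,500.00
```

A half-hour outage in this example costs $1,500 in labor alone; if the same problem recurs weekly, that is roughly $78,000 a year before any lost revenue is counted.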
Procedures and copies of documentation improve the business through workload flexibility and improved accuracy. The DR planner incorporates these critical procedures into the Disaster Recovery Plan and can add processes as needed.
Tracking problems and managing change improve the business by reducing unscheduled interruptions that can build into a catastrophic interruption. Shared knowledge of the problems that routinely face the business leads to rapid solutions. By reducing interruptions, the cost of doing business goes down, and the risk of disaster follows. The business may also be able to qualify for lower insurance rates.
For the disaster recovery planner, controlling interruptions fulfills the requirement to mitigate some causes of a Disaster Declaration. The planner will need to be a driving force in the total process. With improved procedures and lowered daily risks, the disaster recovery plan can become a focused working program. A well-documented business is also on the way to ISO 9000 quality improvement.
Remember: it takes a dedicated planner to get the stone rolling, but the result is the disciplines of shared knowledge, process flexibility, accurate documentation, problem tracking, and scheduling, combined into a better disaster recovery or business recovery process that is paid for by the improvements. The Boom and Gloom may not be gone forever, but you, the planner, can be the Knight who fends off the dragon and saves the kingdom.
William Million, CDRP, has been in the industry for 12 years; currently he is a corporate consultant for SCT, Corp.