Auditing Business Recovery Plans
- Published on Tuesday, October 30, 2007
- Written by Damom Arber
Will your Business Recovery Plan stand up to a thorough, well structured audit?
Most corporations have a Corporate Audit Department, which is also charged with auditing these plans. But most such audits consist of little more than a request to see the hard copy of the plan and a check of the most recent review date and the most recent test date. Even when the corporation is subject to audit by Federal and State Examiners, these tend to look only for the same physical evidence.
Unfortunately most audits, internal and external, tend to be viewed as a nuisance at best, which is a shame, because a good audit program can be enormously helpful in identifying weaknesses in a plan, or areas that should be reviewed.
Is the current, generally cursory type of audit enough? Will such an audit determine whether the plan is sound, whether the corporation really would be protected if it relied wholly on the plan? I would suggest that such audits should be much more searching to be of real value.
The following questions are among those that should be asked by an auditor in determining the adequacy of the Business Recovery Plan and the process.
The Plan Manager
What experience does the Plan Manager have? Is the Plan Manager considered to be a professional, or is this a part time function? Is this a job that was given to someone because the corporation either didn't have another slot for the individual or because they 'had to have someone seen to do the job'?
Did the Plan Manager receive any training? Did he develop and write the plan? Does he actively participate in review and exercising as someone whose function needs to be restored, or does he manage these processes? In a plan for a small unit the Plan Manager may well manage all three aspects and be a participant in the recovery activities but in a large unit he should establish the criteria and have these done by the department.
In exercising the plan, he should establish the goals, the parameters and the success factors. He may well be the test manager but should not be actively engaged in the details of the recovery process.
Determination of Criticality
How was the plan developed? Was a Business Impact Analysis completed? Did completion of the BIA involve the department executive and management and at least a representation of the department staff?
How was the criticality of the functions that are performed by the unit determined? Was it based on the unit manager's decision that 'of course this is a critical function'? Or was it determined by using parameters established through use of one of the expert systems available?
If an expert system was used and there was a discrepancy between the unit manager's determination and that of the expert system how was this discrepancy resolved? Generally any responsible unit manager's thoughts on the criticality or otherwise of that unit's functions are pretty accurate, but should be objectively or externally confirmed.
Was a Critical Resources Analysis performed, or were the critical numbers and requirements plucked out of the air based on the perception of the person(s) determining the criticality of the functions? This is not to say that these numbers are necessarily incorrect; the people working in and managing the department generally have a good 'feel' for what is critical. The danger with relying on this is that too much may be considered 'critical' rather than too little.
A rule of thumb for people not professionally involved in plan development and maintenance seems to be 50%; that is, 50% of current resources would be required to restore the critical functions of the unit (most people consider their own job, at least, to be critical).
In fact critical resources may be nearer 20% - or 70% in the context of immediate short term recovery, say the first 30 days. But this will only be determined if the CRA was done and done properly. Remember that if the unit considers that 50% of current resources is necessary, when in fact only 20% is required, the cost of maintaining these extra resources could be considerable, money that could be better put to use elsewhere. Conversely if 70% is actually required any recovery could be severely hampered due to lack of resources if only enough are available to restore 50%.
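The arithmetic behind that comparison can be sketched in a few lines. This is only an illustration: the function name `resource_gap` and the head count of 200 are assumptions made for the example, while the 50%, 20% and 70% figures are the ones used above.

```python
def resource_gap(current_units: int, planned_share: float, required_share: float) -> int:
    """Return planned minus required recovery resources, in whole units.

    A positive result is a surplus the unit is paying to maintain;
    a negative result is a shortfall that would hamper a recovery.
    """
    planned = round(current_units * planned_share)
    required = round(current_units * required_share)
    return planned - required


staff = 200  # hypothetical head count, not taken from the article

# Unit assumes 50% is critical, a proper CRA finds 20%: surplus of 60.
print(resource_gap(staff, 0.50, 0.20))

# Unit assumes 50% is critical, a proper CRA finds 70%: shortfall of 40.
print(resource_gap(staff, 0.50, 0.70))
```

Either result is a finding worth raising: the first represents money that could be put to better use, the second a recovery that may stall.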
Siting of Recovery Facility
Does the siting of the recovery facility make sense, or is it just a convenience, or perhaps an inconvenience? If a disaster were to strike the current work site would the recovery site also be impacted because it is too close, or on the same power or communication grid?
On the other hand, is the recovery site inconveniently far away merely because of a policy that states something like 'the recovery site must be at least 10 miles from the original site'? This sort of blanket statement makes sense in an environment subject to hurricanes, tornados, floods or earthquakes, but in those circumstances even 10 miles may be inadequate.
However, in a geographically stable location that is not subject to such extreme climatic disturbances such a blanket ruling may be unnecessarily broad. Most disasters that businesses face are of the nature of a fire or explosion or internal flood caused by triggering of sprinklers.
In the case of a fire or explosion there is every likelihood that the fire and police chiefs will put up a cordon extending a few blocks at most. An internal flood causing evacuation of the building would not affect buildings in the immediate vicinity. It can be and is argued that the Chicago flood is evidence that a recovery site should be situated well outside the city limits, but I would suggest that that circumstance was an aberration. There are few cities outside the recognized earthquake zones and flood plains with the kind of situation and structure that would lead to their being subject to that sort of catastrophe.
The city with probably the greatest experience of coping with the kind of disasters that most businesses face is London, England, which has been hit by a number of IRA terrorist bombings over the past several years. In the experience of the London Metropolitan Police, damage is limited to an area of approximately a quarter mile radius. Any businesses outside that distance would be unaffected, except perhaps for power and communications. (This is why I suggested above that a recovery site should be on different grids from those of the original site.)
Notwithstanding the above comments there may well be a very practical reason why the recovery site is a hundred or a thousand miles from the original; another plant with additional capacity available for instance. But siting is something that should be objectively questioned.
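An auditor who wants to put the siting questions on an objective footing could sketch them as a simple check. The following is a minimal illustration, not a standard: the dictionary keys, the function names and the use of the quarter mile figure as a default threshold are all assumptions made for the example, and the distance is the great-circle (haversine) distance between two coordinates.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def siting_concerns(primary, recovery, min_miles=0.25):
    """List the siting questions an auditor should raise for this pair of sites.

    min_miles defaults to the quarter mile radius quoted above; a region
    exposed to hurricanes, floods or earthquakes would need a larger value.
    """
    concerns = []
    gap = distance_miles(primary["lat"], primary["lon"],
                         recovery["lat"], recovery["lon"])
    if gap < min_miles:
        concerns.append("within likely damage radius")
    if primary["power_grid"] == recovery["power_grid"]:
        concerns.append("same power grid")
    if primary["comms_grid"] == recovery["comms_grid"]:
        concerns.append("same communications grid")
    return concerns


# Hypothetical sites roughly seven miles apart that share a power grid.
primary = {"lat": 51.5, "lon": -0.1, "power_grid": "A", "comms_grid": "X"}
recovery = {"lat": 51.6, "lon": -0.1, "power_grid": "A", "comms_grid": "Y"}
print(siting_concerns(primary, recovery))
```

An empty list does not prove the siting is sound; it only means none of these particular questions was triggered.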
Copies of Plan
How many copies of the plan are there, and where are they? Is there one for the whole department, perhaps locked in the Plan Manager's credenza? Do the rest of the staff have access to the plan - should they have access? Is the plan considered confidential, perhaps because it contains material of a sensitive nature? If so is it treated like a confidential document?
Is there an alternate to the Plan Manager, in case a disaster happens during the absence of the latter? Does this individual have a copy of the plan? Do both the Manager and alternate have copies offsite for the eventuality that they are unable to get into the original site? Is a copy maintained at the recovery site? Are all the available copies of the same vintage?
How are the copies controlled? Are they numbered and the numbers and location recorded? Is there a process for transferring the copies when the Plan Manager or Alternate change? Is it necessary that the copies should be so controlled?
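Where copies are numbered and recorded, the register itself can answer two of the questions above: are all copies of the same vintage, and is a copy held everywhere it should be? The sketch below assumes a hypothetical register layout (`PlanCopy` and its fields are invented for illustration).

```python
from dataclasses import dataclass


@dataclass
class PlanCopy:
    number: int     # control number written on the copy
    holder: str     # e.g. "Plan Manager", "Alternate"
    location: str   # e.g. "office", "home", "recovery site"
    version: str    # version or review date shown on the copy


def stale_copies(register, current_version):
    """Copies whose version differs from the current one."""
    return [c for c in register if c.version != current_version]


def missing_locations(register, required_locations):
    """Required locations where no copy is recorded."""
    held = {c.location for c in register}
    return sorted(set(required_locations) - held)


# Hypothetical register entries.
register = [
    PlanCopy(1, "Plan Manager", "office", "2007-09"),
    PlanCopy(2, "Plan Manager", "home", "2007-09"),
    PlanCopy(3, "Alternate", "office", "2006-11"),
]
print([c.number for c in stale_copies(register, "2007-09")])
print(missing_locations(register, ["office", "home", "recovery site"]))
```

With these entries the first check flags copy 3 as out of date and the second shows that no copy is recorded at the recovery site.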
Staff Training and Awareness
Have all those who would be directly affected in a recovery been made aware of the existence of a recovery plan, what their activities and functions would be in the recovery process? Are they kept up to date on amendments and changes and is there a process for ensuring this is done? It's quite surprising how often it's taken for granted that everybody affected knows what's going on and what would be expected of them, when in fact just the opposite is true.
Do the rest of the personnel know there is a plan and what they would be expected to do in the event of a disaster, even if it is only to go home and await further instructions? Are the staff provided with any kind of document on which basic information is recorded, e.g. emergency telephone numbers, address of recovery site? If so, is there a process for keeping this up to date?
Are the executive included in the training and awareness process, as an integral part of the plan rather than just to be seen to be on site?
Is there a forum and process for staff to question aspects of the plan or recorded recovery process, and to add their experience and expertise?
Off-Site Storage of Documentation
Is there an adequate procedure for off-site storage of data tapes and any documentation considered critical to a recovery? How frequently is the data sent off site: daily, weekly or less frequently? Is the frequency realistic? If it is weekly, for instance, how would the unit recover the information lost between the date of the most recent tape and the date of a disaster, which may be as much as six days later?
Is the off-site storage company professional? Are their premises secure? Are the tapes picked up in a secure container?
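The exposure is simple date arithmetic, sketched here with hypothetical dates. The function name `days_of_data_at_risk` is an assumption made for the example.

```python
from datetime import date


def days_of_data_at_risk(last_offsite_backup: date, disaster: date) -> int:
    """Whole days of work that would have to be recreated from other sources."""
    return (disaster - last_offsite_backup).days


# Hypothetical weekly cycle: tape sent off site on 1 October,
# disaster strikes on 7 October, the day before the next tape is due.
print(days_of_data_at_risk(date(2007, 10, 1), date(2007, 10, 7)))  # 6
```

The auditor's question is whether the unit has a realistic way to rebuild that many days of work, not merely whether the tapes went off site on schedule.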
And probably the SINGLE BIGGEST CAUSE OF FAILURE in a recovery: HAVE THE BACK-UP TAPES BEEN TESTED? Tapes have a useful 'shelf life' of no more than several months. If they are continuously recycled over a period of a year or more, the unit may well find, when it tries to access the information thought to be stored, that it is in fact unrecoverable. Tapes more than, say, six months old should be replaced with new ones; dates of tape usage should be recorded and responsibility acknowledged.
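If dates of tape usage are recorded, flagging tapes for replacement is mechanical. A minimal sketch follows, assuming a hypothetical log that maps each tape label to the date it was first put into use, and treating 'six months' as 183 days.

```python
from datetime import date, timedelta

SIX_MONTHS = timedelta(days=183)  # assumption: "six months" taken as 183 days


def tapes_due_for_replacement(first_used, today, max_age=SIX_MONTHS):
    """Return the labels of tapes older than max_age, in sorted order."""
    return sorted(label for label, used in first_used.items()
                  if today - used > max_age)


# Hypothetical tape log.
tape_log = {
    "WK-01": date(2007, 1, 8),
    "WK-02": date(2007, 6, 4),
    "WK-03": date(2007, 9, 17),
}
print(tapes_due_for_replacement(tape_log, today=date(2007, 10, 30)))
```

An age check of this kind supplements, and does not replace, an actual test restore from the tapes.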
Has the ability of the off-site storage company to respond in an emergency been tested? During a regular exercise the company is given notice as to when and where the tapes would be required and can position themselves accordingly.
But what would be their response to a call for service at 2:00 in the morning? Any responsible off-site storage company should be willing to acknowledge that the corporations it serves need it to respond twenty-four hours a day, and to have this level of service tested.
Is the unit dependent on other units from which it receives or to which it provides hard or soft copy, materials, work in progress etc.? If so, do the other units have viable recovery plans which also acknowledge these same interdependencies? If the recovery sites of the affected units are not in the same building, are there procedures built into both units' plans that will enable the interdependent processes to be restored? Indeed, even if the recovery sites of the impacted units are in the same building, are there similar procedures in the plans? Remember that the means of communication, delivery and transportation will have been affected by any disaster.
Emergency Response & Recovery Teams
Are the phone numbers of the civic emergency response teams current? Do the police and fire departments have a copy of the floor plans of the unit's building? Do they have the phone numbers of the plan manager and alternate? The authorities may not want to keep these numbers on file, because they would have to be maintained on a regular basis, but they should have been approached.
Are the members of the various recovery teams aware of their responsibilities and functions? Do they have a copy of an action plan? Are their phone numbers (business, home, emergency, cell, pager etc.) on file and current? These should be randomly spot checked.
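A random spot check is easy to draw and easy to document. The sketch below uses Python's standard `random` module; the contact names and the function name `pick_for_spot_check` are hypothetical. Passing a seed makes the selection repeatable, so the auditor can show afterwards how the sample was chosen.

```python
import random


def pick_for_spot_check(contacts, sample_size, seed=None):
    """Return a random sample of contacts to telephone and verify.

    If fewer contacts exist than requested, all of them are returned.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(contacts))
    return rng.sample(contacts, k)


# Hypothetical recovery team list.
team = ["A. Lee", "B. Singh", "C. Ortiz", "D. Walker", "E. Chen"]
print(pick_for_spot_check(team, 2, seed=2007))
```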
Testing and Exercising
Is there a written schedule for exercising the plan? Is the frequency adequate? Has the plan been exercised in accordance with this schedule? Are the exercises devised to determine the adequacy of the plan, or just to show to the executive and corporate audit that the plan has been tested?
Are the exercises comprehensive, i.e. are various parts of the plan exercised over a period of time or are the same sections exercised each time? If the exercises start 'small' do they increase in complexity? Are the exercises fully scheduled such that all staff know in advance what is being exercised, or are some of a 'surprise' nature? Do the exercises include the interdependencies?
Is there a written report made of the exercise, does it compare the results against pre-established goals and standards? If the report indicates one or more deficiencies in the plan are these evaluated as to the likely effect on the ability to recover? If the deficiencies are seen as serious has action been taken to correct them and has the plan been updated accordingly?
Are the critical staff rotated in the exercises so that a number get to take part over a period? Is the executive directly involved in the exercises? Are the recovery teams alerted and included?
Maintenance of the Plan
Is there a schedule for review and maintenance of the unit's plan? Is the frequency adequate? Does the schedule include provision for out of step review depending on major changes to the unit; function, processes, staffing, line of business, hardware/software requirements etc.? Has the plan been reviewed in accordance with the schedule? Who has reviewed the plan, was it the same level of staff who developed it? Does the review compare the plan with the original Business Impact Analysis? Once reviewed is it again signed off by the responsible executive? Once reviewed do current copies replace the older ones, ALL the older copies?
Does the Plan make Sense?
One unexpected advantage of an audit is that it is conducted by someone not directly involved with the development, maintenance and exercising of the plan. Such an individual has the opportunity to view the adequacy of the plan from the aspect of 'does it make sense?' Is the plan realistic, or is it just an exercise to show anyone who expresses interest that a plan exists, relying on the naivete of the questioner not to be able to determine its real adequacy and practicality, and hoping that a crisis will never happen?
An audit well done is an invaluable tool in the whole business recovery planning process and should be used as such rather than seen as a nuisance.
Conversely, to be of consequence such an audit should be well thought out and implemented, viewing the need for a realistic Business Recovery Plan as being as vital as the financial stability of the corporation.
No matter how financially sound a corporation may be, if it cannot recover in good time following a disaster for lack of a well developed Recovery Plan, it may well find itself unable to recover at all.
Damom Arber, MBCI, is the manager of contingency planning for Corporate and Treasury Divisions of the Bank of Montreal.
This article adapted from 10#1.