The plan, when drafted, will need to be reviewed and approved by senior management, who will have to agree with the assumptions on which it is based. The cooperation required from other sections will also need to be agreed to by senior management. The plan will be a step-by-step approach to providing a smooth, quick restoration of services. It will address interruptions ranging from short term to long term.
The plan will be detailed enough to keep decision making to a minimum during an emergency (i.e. it will state who does what, when, and how).
The major portion of the plan will involve a significant contribution from the Data Processing area, but the total planning process must start and end in the senior management area.
Support for the plan from the appropriate levels is essential to ensure the production and ongoing maintenance of the plan.
The first item to consider in the contingency plan is its coordination with the emergency preparedness plan and the data security function. They must be considered in the context of the risks and costs that are acceptable to management.
To this end we must define the objectives of a contingency and disaster plan:
- Keep the bank solvent.
- Define who does what and how.
- Keep the amount of decision making at a minimum in the event of a disaster.
The issues that must be resolved and documented should include:
- What applications must be processed?
- What are employee responsibilities?
- What equipment is needed to process applications?
- What supplies are required to support applications?
- What will be relocated after a disaster?
- What will be done about unprocessed work?
- What user instructions are needed?
Establishment of the Contingency Planning Function
Developing the plan will require time, patience and a firm commitment from all levels of management. In the context of this paper I have assumed that senior management is fully aware of the exposures in the current computer-dependent environment, and that full support for the project will be forthcoming. Management's primary role is acceptance of the need for contingency planning, selection of an appropriate place for the contingency planning function, and recognition of the benefits that can be expected from establishing a contingency plan.
The first step is the issuing of a statement of policy, which will indicate management's commitment. Such a policy statement might be: 'Each operating and staff function is responsible for the computer-based data collection and processing systems on which Bank operations depend. Since these systems exist for the bank functions which they support, the managers of these operations must understand that the continuity of these systems is of greater importance to them than it is to data processing management.'
The policy statement should be used as a model for senior management in defining its expectations and support. To ensure that the expectations are being met, a steering committee with representatives from all areas should be formed. The committee should make recommendations to management and provide guidance on policy matters.
The contingency planning function may be established with a team leader and a small group of assistants. Once the plan is established, the team leader may become a contingency planning coordinator reporting to data processing management. It should be noted that continuing management involvement will be needed to keep the plan current and workable.
The Contingency Plan
A contingency plan cannot be effectively developed by one person. The coordinator must have a dependable team and the support of the executive if the plan is to be successful. To assist with this, the team members must be educated about contingency planning. It must be emphasized that in the event of a disaster 'business as usual' will not happen. The Bank is going to lose time and money during the recovery; the aim of the contingency plan is to minimize those losses.
Included in the plan should be:
- Names, addresses, and telephone numbers of key personnel.
- Recovery objectives and responsibilities.
- Offsite checklists of required resources, including hardware, software, communications, data, documents, office equipment, documentation, and staffing requirements.
- Supporting information, such as maps, transportation routes, locations, etc.
- Procedures detailing mobilization, restoration and reconstruction activities.
- The administrative process for recovery coordination.
- Procedures for continued maintenance and testing of the plan itself.
- A contingency plan distribution list.
A quarterly review of the plan should be carried out to maintain the accuracy of the names, addresses, and telephone numbers of the key personnel. Similarly, hardware and software requirements at the existing and alternate sites should be evaluated on the same quarterly basis.
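One way to keep the quarterly review from slipping is to record when each item was last verified and flag anything older than a quarter. The sketch below is illustrative only: the item names, the dates, and the reading of "quarterly" as 91 days are assumptions, not part of the plan described in this paper.

```python
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 13 weeks (91 days).
REVIEW_INTERVAL = timedelta(days=91)


def items_needing_review(last_verified: dict[str, date], today: date) -> list[str]:
    """Return the items whose last verification is at least one quarter old."""
    return [
        item
        for item, verified_on in last_verified.items()
        if today - verified_on >= REVIEW_INTERVAL
    ]


# Hypothetical review log, for illustration only.
last_verified = {
    "Key personnel contact list": date(2024, 1, 1),
    "Alternate site hardware requirements": date(2024, 3, 1),
}
overdue = items_needing_review(last_verified, today=date(2024, 4, 1))
```

With these sample dates only the contact list is flagged, because the hardware requirements were checked one month earlier.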
The coordinator is responsible for maintaining, testing, updating, and distributing the contingency plan. Testing can be carried out in two stages: stage one is the piecemeal approach, testing one component at a time; stage two is the simulated disaster, in which the whole plan is tested.
Cost is a factor which should not be overlooked at this stage. The development and maintenance of such a plan is not an insignificant expense.
Management must recognize that unfavorable events will occur and result in losses. This recognition must be followed by a commitment of dollars to implement safeguards that will minimize the impact of such losses once they have occurred. This cost is offset somewhat by the marketing value of being able to assure a large commercial customer that a contingency plan is in place and being maintained. It is argued that the cost of this function should not usually exceed 1% of total data processing costs, but this must be looked at in the light of the bank's policy.
Note: For an example of a contingency planning procedure, see Appendix A.
The assessment should:
- result in a list of critical applications in order of priority.
- result in a list of supporting data processing resources in order of priority.
- identify the potential threats and estimate the probability of their occurrence.
- estimate in money terms the losses that might result from the above.
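The last two points of the assessment can be combined into a single figure per threat by weighting the estimated loss by the estimated probability. The following is a minimal sketch of that arithmetic, not a method prescribed by this paper; the `Threat` class, the function names, and all probabilities and dollar amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    annual_probability: float  # estimated chance of occurring in a year (0 to 1)
    loss_if_occurs: float      # estimated loss in money terms


def expected_annual_loss(threat: Threat) -> float:
    """Probability-weighted loss for one threat."""
    return threat.annual_probability * threat.loss_if_occurs


def rank_threats(threats: list[Threat]) -> list[Threat]:
    """Order threats from highest to lowest expected annual loss."""
    return sorted(threats, key=expected_annual_loss, reverse=True)


# Hypothetical figures, for illustration only.
threats = [
    Threat("Electrical power failure", 0.5, 40_000),
    Threat("Water damage", 0.05, 200_000),
    Threat("Equipment failure", 0.25, 60_000),
]
ranked = rank_threats(threats)
```

A ranking of this kind only orders the threats; management still has to decide which level of expected loss is acceptable.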
Some specific threats that may occur are:
- Air conditioning failure
- Electrical power failure
- Equipment failure
- Telecommunication failure
- Water damage
- Civil disorder
- Vandalism or sabotage
- Theft
- Picketing or other
- Accidents that damage the building, equipment and/or supplies
The plan should set out specific actions to be taken in each of the circumstances.
While it is difficult to place a cost figure on the exposure involved, some of the probable loss areas are:
- Increased operating costs
- Loss of customers
- Loss of assets
- Bad media publicity
- Loss of profit
- Loss of good will
- Loss of competitive edge
- Legal/regulatory requirements
Application priorities
Each application should have a priority set so that in the event of limited resources, the order in which they are to be processed is already determined.
Such a list of priorities might be:
- Priority 1: Jobs that must be run according to schedules.
- Priority 2: Jobs that can be run as time and resources permit.
- Priority 3: Jobs that will not be run in the event of a disaster.
Determining the priorities will ultimately rest with management, having regard to the advice of the steering committee. When these priorities have been identified, the resources needed for these jobs must also be identified and then included in the contingency plan.
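Once each job carries one of the three priorities, the processing order under limited resources can be worked out mechanically. The sketch below shows one possible reading, under stated assumptions: Priority 1 jobs are always scheduled, Priority 2 jobs are scheduled only while processing hours remain, and Priority 3 jobs are dropped. The `Job` class, the job names, and the hour figures are hypothetical.

```python
from dataclasses import dataclass

# Priority classes as described in the text.
MUST_RUN_ON_SCHEDULE = 1
RUN_AS_RESOURCES_PERMIT = 2
NOT_RUN_IN_DISASTER = 3


@dataclass
class Job:
    name: str
    priority: int
    hours_needed: float


def plan_disaster_run(jobs: list[Job], hours_available: float) -> list[str]:
    """Return the names of the jobs to run, in processing order."""
    selected = []
    remaining = hours_available
    # sorted() is stable, so jobs keep their listed order within a priority.
    for job in sorted(jobs, key=lambda j: j.priority):
        if job.priority == NOT_RUN_IN_DISASTER:
            continue
        # Assumption: Priority 1 jobs run even if they exceed the estimate.
        if job.priority == MUST_RUN_ON_SCHEDULE or job.hours_needed <= remaining:
            selected.append(job.name)
            remaining -= job.hours_needed
    return selected


# Hypothetical workload, for illustration only.
jobs = [
    Job("Statement printing", RUN_AS_RESOURCES_PERMIT, 4),
    Job("Deposit posting", MUST_RUN_ON_SCHEDULE, 3),
    Job("Marketing report", NOT_RUN_IN_DISASTER, 2),
    Job("Loan accruals", RUN_AS_RESOURCES_PERMIT, 2),
]
run_order = plan_disaster_run(jobs, hours_available=6)
```

Here statement printing is skipped because it needs more hours than remain after deposit posting, while the shorter loan accruals job still fits.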
Operating Consideration
The resources required for processing should be documented, i.e.:
- Hardware configuration
- System configuration
- Teleprocessing network
- Software components, including system, communications and application
- Documentation (program, operations, and user)
- Forms, supplies and general office needs
- Staffing requirements
Any future changes should not be overlooked when collating this information.
Backup is an important part of the plan and must not be overlooked. A backup site should be selected that is safe and remote from the current center. Some of the resources that should be considered when documenting backup requirements are:
- Systems, program and operating system documentation
- Program source and object code
- Procedure libraries
- Operating system library
- Master files
- Transaction files
- Adequate supply of all input/output forms
- Contingency plan manual
- Hardware inventory list
Software Backup
When software packages have been purchased, an agreement to obtain the source code should be negotiated where possible, to cover the contingency that the vendor "goes out of business".
Duration of problem
Another important factor is the length of the 'down time'. A series of variations to the contingency plan must be formulated to cover the estimated time that the system will be out of service. These variations would fall into three basic categories:
- short (up to 6 hours)
- intermediate (6 to 24 hours)
- long (over 24 hours)
Note: These categories are for example only and a management decision would have to be made to define the time periods.
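As an illustration of how the three categories could be applied to an estimated outage, the sketch below maps a number of hours to a category. The thresholds are the example values given above and would be replaced by whatever periods management defines; the function name and the choice to treat exactly 6 and exactly 24 hours as the lower category are assumptions.

```python
# Example thresholds from the text; management would set the real values.
SHORT_LIMIT_HOURS = 6
INTERMEDIATE_LIMIT_HOURS = 24


def classify_outage(estimated_hours: float) -> str:
    """Map an estimated down time to the plan variation that applies."""
    if estimated_hours < 0:
        raise ValueError("estimated_hours must not be negative")
    if estimated_hours <= SHORT_LIMIT_HOURS:
        return "short"
    if estimated_hours <= INTERMEDIATE_LIMIT_HOURS:
        return "intermediate"
    return "long"
```

Because the estimate may change as the situation develops, a function like this would be called again each time the estimate is revised.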
Normally a short interruption would not require a move to an alternate site, but would require the rescheduling of human resources, transport, etc. It may, however, be difficult to estimate the 'down time', and it could be prudent to begin alternate site preparation, which could be followed through if required.
In this regard, where doubt exists, management must monitor the situation, reassess its position and institute a higher level of contingency plan when necessary. The earlier the warning of further change, the more smoothly the transition to the next level will work, and the less pressure there will be on those implementing it.
Restoration of Processing Center
Just as a lot of work and planning must go into the implementation of the contingency plan, a similar effort must be put into the move back to the original processing center. It must be as orderly and planned as the rest of the contingency plan. To some extent the contingency plan will be executed twice, once to move to the alternate site, and again to move back to the original site.
Each contingency plan will vary according to the environment in which the data processing system is located, and the technical resources available. Notwithstanding the above, all matters discussed in this paper should be covered in the plan. A sample plan (Appendix B) is attached for assistance in compiling the contingency plan.
A sample Contingency Planning Procedure for Disaster Recovery
Detection of and recovery from disasters involve several stages. Each requires certain actions.
Whoever first detects an 'emergency or disaster' should report it immediately to the senior person in charge, such as a supervisor or manager, or notify the security guard.
Stage Name Action(s)
A Sample Contingency Planning Manual from a Bank Data Processing Service Center
- Offsite Retrieval
  - Operating system and backup files containing necessary libraries, e.g. loadlib, proclib, etc.
  - Current backup files, e.g. DBxxx, JRNIC, etc.
  - Package libraries, e.g. MIDAS libraries
  - Documentation manuals, e.g. procedures
  - Scratch tapes
  - Carriage control tapes
  - Standard and preprinted stationery and forms
  - Blank/preprinted form tape labels
  - Application process checkoff sheets
  - Vendor operating manuals, e.g. utilities, OCL references, system reference manuals, etc.
- Move to backup site
- Backup site Initialization
  - Allocate system catalog to assigned disk pack
  - Connect system catalog to master catalog/define an alias
  - Define an alias and catalog appropriate pointers
  - Create and build an index for entries on assigned disk pack
  - Catalog input files to be used in processing
  - Allocate and restore necessary operating system libraries
  - Allocate and restore application data sets for processing
- Process Priority applications
  - Maintain checklist
- Document Processing
  - Errors encountered
  - Recommended changes
  - System compatibility
- Secure Backup Site
  - Scratch disk data sets
  - Uncatalog data sets
  - Inventory owned tapes, supplies, forms and so forth
- Return to Control Center
- Follow-up Procedure
  - Record changes required during processing
  - Forward documented changes to administrator
  - Record expenses
  - Return files and/or equipment to proper storage
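A manual of this kind is in effect an ordered checklist, and the checkoff sheets it mentions could be kept in a simple structure that always shows the next outstanding step. The sketch below is illustrative only; the `Stage` class and `next_open_step` function are hypothetical names, and the grouping of steps under the two stages shown is an assumed reading of the sample manual.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stage:
    name: str
    steps: list[str]
    done: set[str] = field(default_factory=set)

    def check_off(self, step: str) -> None:
        """Mark one step of this stage as completed."""
        if step not in self.steps:
            raise ValueError(f"unknown step: {step}")
        self.done.add(step)

    def is_complete(self) -> bool:
        return all(step in self.done for step in self.steps)


def next_open_step(stages: list[Stage]) -> Optional[tuple[str, str]]:
    """Return (stage name, step) for the first step not yet checked off."""
    for stage in stages:
        for step in stage.steps:
            if step not in stage.done:
                return stage.name, step
    return None


# Assumed grouping of steps taken from the sample manual.
stages = [
    Stage("Secure Backup Site", [
        "Scratch disk data sets",
        "Uncatalog data sets",
        "Inventory owned tapes, supplies, forms",
    ]),
    Stage("Follow-up Procedure", [
        "Record changes required during processing",
        "Forward documented changes to administrator",
        "Record expenses",
        "Return files and/or equipment to proper storage",
    ]),
]
stages[0].check_off("Scratch disk data sets")
```

Keeping the stages in a list preserves the order in which the manual expects them to be carried out.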
Alan J. Lyons is vice president of IDOM Inc.