| DRJ's Sample Plan |
| By Web Editor | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| February 13, 2008 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
INTRODUCTION
MIS CONTINGENCY PLAN
MIS director’s home If you maintain your MIS Contingency Plan on a personal computer using some type of a word processing package, you should back up your Plan after it is entered. After each time it is updated, keep a copy of the diskette stored in a secure location offsite. Having copies of the Plan at various residences may be thought of as unnecessary and redundant since the diskettes are also stored offsite, but, should a disaster ever happen, your rime can be better utilized in disaster recovery than in locating a PC, printing multiple sets of the plan, and separating the continuous paper into usable documents for distribution.
Modified by:___________________________________________________________________________________ ________/____/_____ Reviewed by:__________________________________________________________________________________ ________/____/_____ Reviewed by:__________________________________________________________________________________ ________/____/_____ Approved by:__________________________________________________________________________________ ________/____/_____ Approved by:__________________________________________________________________________________ ________/____/_____ Organizations are becoming more dependent on the service and record-keeping of the data processing department. The overall objectives of the MIS Contingency Plan are to protect corporate resources and employees, to safeguard the organization’s vital records of which the data center has become the custodian, and to guarantee the continued availability of essential MIS services. The role of this Plan in these objectives is to document the preagreed decisions and to design and implement a sufficient set of procedures for responding to a disaster that involves the data center and its services. A disaster is defined as the occurrence of any event that causes a significant disruption in MIS capabilities. The central theme of the Plan is to minimize the effect a disaster will have upon on-going operations. This Plan responds to the most severe disaster, the kind that requires moving off site to a backup facility. Occurrences of a less severe nature are controlled at the appropriate management level as a part of the total Plan. The basic approach, general assumptions, and sequence of events that need to be followed will be stated in the Plan. It will outline specific preparations prior to a disaster and emergency procedures immediately after a disaster. The Plan is your roadmap from disaster to recovery. As you follow it, you may choose at any time to take detours for various management reasons. However after the detour, you get back on the main road to recovery. The Plan will be distributed to all key personnel, and they will receive periodic updates. The general approach is to make the plan as threat-independent as possible. This means that it should be functional regardless of what type of disaster occurs. In order to limit your loss, it must provide for the logical restoring of all critical systems to a production status within 24 hours after the equipment is operational at either the home location or a backup site. By performing a risk/impact analysis, you can determine your potential dollar loss that will result from a major disaster. For the recovery process to be effective, we have organized the Plan around the team concept. Each team has specific duties and responsibilities once the decision is made to invoke the disaster recovery mode. The captains of each team and their alternates are key MIS personnel. The Plan contains the phone numbers of the team members and represents a dynamic process that is kept up-to-date through updates, testing, and reviews. As recommendations are completed or as new areas of concern are recognized, the Plan will be updated reflecting the current status. No matter how many precautions are implemented and to what extent they are enforced, most people in the data processing held agree that there are no completely secure computers. The operations of a data center could be suddenly disrupted by events we have little or no control over, involving people, mechanics, electronics, or natural disasters. It is important that you realize the exposure to your organization chat the loss of your data center would be, and that you take steps to minimize the costs resulting from loss or damage to its resources and capabilities, and the costs to the departments and customers it serves by their losses or a reduction in computer services. The computer room is the heart of a data center. Any threat in or near the computer room can affect the critical flow of information from this nerve center. The location of the disaster could be more important than the amount of damage it causes. A small problem at a critical location could cripple a data center and require it to reestablish operations at a backup facility. This Plan assumes that a catastrophic event has severely crippled the data center, forcing it to reestablish full operations at a fully equipped backup facility. As soon as hardware can be installed in a cold site, processing will be moved from the backup facility to the less costly cold site. All applications will eventually be processed at the backup location, even those not classified as critical. Concurrent with the backup facility processing is the reconstruction of the original or alternate permanent facility, and the planning for the final move back to this site. Although this Plan follows the assumption of a catastrophic disaster, the Plan can be quickly altered to handle a less severe emergency as determined by management. Two types of backup facilities are available, a cold site (shell) and a hot backup site. A cold site is an empty computer room normally equipped with raised floor, air-conditioning, electric power, and fire protection that is ready for the installation of computer hardware. The problems in relying only on a cold site for your backup are usually locating and installing the hardware. This generally takes the most time. A hot backup site, on the contrary, is a fully equipped and operational computer center, ready to be used by its members in the event of an emergency. The computer room is normally equipped with up-to-date hardware. The member schedules hands-on resting to process its applications on the backup hardware to ensure the validity of its Plan. Many commerical hotsites offer many Disaster Recovery Services . It provides a number of scheduled testing periods each year as part of its contracted services agreement. These testing periods help keep the member’s Plan up-to-date by assuring that its applications process correctly at the backup facility. Many companies subscribe to both a hot backup site and a cold site. If the two backup facilities are adjacent to each other, it will enhance the recovery process. However, it is not necessary that both sites be at the same location. While emergency processing would be started and continued for a short time at the hot backup site, replacement hardware would be configured and delivered to the cold site. When the cold site becomes operational, processing would be switched from the hot backup site to the cold site. As soon as the member’s permanent facility is available, the hardware would be moved from the operational cold site to the member’s own facility. As part of the initial start-up at the backup site, those systems defined as “critical” batch processing would be run first. Testing would be expanded until total recovery per your Plan is achieved. By taking advantage of both the hot backup site and the cold site as outlined in the guidelines of the Plan, MIS personnel can restore production operations at their own facility in the shortest possible time following a catastrophic event. INSERT HERE A BRIEF DESCRIPTION OF YOUR DATA CENTER. INCLUDE A GENERAL DESCRIPTION OF THE KIND OF PROCESSING AND SUPPORT MAINTAINED AT YOUR MAIN DATA CENTER AND ANY PROCESSING AND SUPPORT AT ANY OF YOUR OUT- SIDE LOCATIONS. SECTION TITLE I. CONTINGENCY PLAN FOR MAJOR DISASTERS A. Detection and Reaction
II. DISASTER RECOVERY TEAMS
If a problem is detected concerning the computer room environment, such as electrical, water damage, excessive heat, cold, or humidity, contact the following authorities:
After contacting one of the above, make the decision as to who should be contacted based on the problem encountered: air-conditioning, water exposure, or electrical.
If you are aware that an unauthorized person is in a secured area of the computer complex, notify one of the following:
Following the procedures below will help to reduce the organization’s exposure to additional losses because of actions not taken by on-site personnel. These actions are targeted at emergencies concerning air-conditioning, fire, or electrical or water damage. a. Air-conditioner failure Normallv a graphic temperature-and-humiditv monitor is located in the computer room and operates 24 hours a dav. The temperature should be checked each morning and periodically if a heat increase is noticed. The watchman must check the temperature on each round he makes. If the computer room has two air-conditioning units, the fans in both units will normally operate at all times to maintain the proper air now. If one unit fails, the second unit can usually carry the load for most processing, but only for a limited amount of time. The failing unit needs to be repaired as soon as possible to take the strain off the only operating unit. The normal temperature for a computer room is between GS and 76 degrees. If the temperature rises above 76 degrees, take the following precautions: 1. Advise the Operations Supervisor that the temperature is above the normal operating range. He will notify the Maintenance Department for corrective action and then notify the Operations Manager. 2. If the temperature rises above 84 degrees, the hardware vendor must be notified. The Operations Supervisor must decide which noncritical applications may continue to be processed. If the Operations Manager decides to power down the computer or if the computer powers down by itself because of excessive heat, it should not be powered up until approval has been received from the hardware vendor. Maintenance Department personnel normally perform maintenance for the air-conditioning units each month. They will clean the fiiters, check the freon, check the belts and hoses, and do normal visual inspections. If any problems occur between scheduled maintenance operations, the Maintenance Department should be notified. b. Fire alarm procedures Various fire alarm systems can detect and suppress fires within the first few seconds. They can auto matically close fire doors, control elevators, shut off equipment and circulation fans, and notify the local fire department through 24-hour, central-station monitoring. If automatic controls are inappro priate or nonexistent, local smoke alarms should alert you, and fire extinguishers can be used. Water sprinkler systems are activated depending on the type of controls installed. Some sprinkler svstems are dry systems, controlled by smoke/heat detectors that will open valves allowing water to flow to a whole series of sprinkler heads. Other svstems are wet svstems, with water alwavs in the pipes. In a wet svstem, each sprinkler head is equipped with a link that will melt and fall aside when exposed to high temperatures, activating that particular sprinkler head. The wet system helps to limit water damage to lust those areas where the fire occurs. Should tire or smoke be detected in the computer room, do the following: 1. Immediately power-down the computer. 2. Take the hand fire extinguisher and attempt to put out the fire. NOTE: A fire extinguisher should be mounted on the wall next to the door of the computer room. 3. If unable to extinguish fire: -Pull the fire alarm or call the Fire Department at_____--____________________(emergency___________).
4. If time permits: -Remove current tapes from computer room to a safe place.
5. Notify the MIS Emergency Management Team.
c. Eledricalltailure procedures
a. Power down thecomputer. b. Push the Emergency Power Off switch next to the door of the computer room. c. Cover all hardware with plastic covers stored in the computer room. d. Call the security guard text ) and tell him that the sprinkler is discharging and no fire is apparent. Have him turn off the sprinkler system and notify the necessary maintenance people. e. Make sure that the vault is secured. f. Contact the Operations Supervisor, who will call the Operations Manager, the Maintenance Department, and the hardware vendor. The hardware vendor must inspect the equipment for water damage before it is powered up again.
a. Powerdown the computer. b. Push the Emgrgency Power Off button to stop electric power to the computer rcom. e. Place plastic covers over all equipment if water is coming from above. d. Close and lock the vault door. e. Notify the Operations Supervisor, who will call the Operations Manager and Maintenance Su- pervisor. The Maintenance Department will determine the source of the water and take cor- rective action. f The Operations Supervisor will contact the hardware vendor to have the equipment checked for damage before the equipment is powered up.
The team will personally visit the site and make an initial determination of the extent of the damage. Based on their assessment, all or part of the MIS Contingency Plan will be initiated. The team will decide:
a. If the computer operation can be continued at the site and repairs can be started as soon as possible. b. If the computer operation can be continued or restarted with the assistance of only certain recovery teams. c. If a limited computer operation can be continued at the site and plans started to repair or replace unusable equipment. d. If the computer center is destroyed to the extent that the backup recovery facility must be used and the full MIS Contingency Plan initiated.
- Coordinate initial response using office procedures to protect life and minimize property damage. -Assess the damage. - Determine extent to which MIS Contingency Plan will be utilized. ..Minor Damage—Processing can be restarted in a short time with no special recall of personnel. Anticipated downtime is less than one day. Damage could be to hardware, software, mechanical equipment, electrical equipment, or the facility...Major Damage— Selected teams will be called to direct restoration of normal operations at current site. Estimated downtime is two to six days. Major damage to hardware or facility...Catastrophe— Damage is extensive. Restoration will take upwards from one week. Computer room or facility could be completely destroyed. All team leaders will be called to begin a total implementation of the MIS Contingency Plan.— Notify senior management. — Notify users. — Prepare regular status reports for senior management. — Notify users of projected time for becoming operational. B. Initiation of Backup Site Procedures 1. Emergency Management Team notifies other teams Following an emergency at the computer center, the operational personnel on site will take the appropriate initial action and then contact a member of the Emergency Management Team starting with the first name on the list (Form 12). When a member is located, that member will contact the remaining members of the Emergency Management Team. The members will meet at or near the disaster to make a firsthand assessment of the damage. They will determine the action to take and will notify senior management. If a determination is made to notify all other teams, the Emergency Management Team will phone the other teams using a predefined pyramid contact system. A brief message will be dictated over the phone and the called person will write down the message. At the end of the message, the called person will read back the message to verify that all critical information is stated. This same procedure will be used for all calls in the pyramid. It will ensure that all contacts have the same information. Form 12 contains the names of all team members and their phone numbers. 2. Establish Control Center The first task for the Emergency Management Team is to establish a control center. The location of the control center should be in close proximity to the data center. It can be in another office or another building in your complex. If nothing is available or usable, a nearby hotel/motel affords excellent accommodations. The Control Center should be close to other departments in your organization for maximum communication during this hectic period. The phone number should be made available to all departments and users so that all information can be channeled through the center. If a location is not available in your own facility, the first choice in public facilities is_________________________ __________________________________________________________________________________. An alternate is______________________________________________________________________. 3. Begin Disaster Recovery Team operations and Disaster Recovery Logs The captain of each Disaster Recovery Team will document the team’s activity by posting it on the Disaster Recovery Log (Form 18). This will be used by the Management Team to prepare status reports for management and will become a historical document for your organization. The Management Team will also use the log to coordinate the concurrent activities of the various teams. The Disaster Recovery Log (Form 18) will be used by all teams. Form 12
Form 12
C. Establishment of Full Recovery at backup Site 1. All planned software, hardware, and resources in place at backup site, and the applications tested. 2. Communications network and other equipment fully operational. Make arrangements with the telephone company and other communications vendors for delivery and installation of temporary equipment. Vendors that specialize in used equipment can deliver their equipment in a very short time. Conduct a complete series of tests to ensure full recovery of the communication network capabilities. Provide for full restoration of service at the original or new alternate facility. 3. Disaster Recovery Team checklists The checklists below (Form 21) are to be used by each team captain to keep track of the many activities that will be performed simultaneously by his team. The Management Team will collect these checklists and prepare a detailed report of daily progress. The lists will also be used to coordinate all events from the Control Center. Disaster Recovery Checklist Form 21 TEAM: MANAGEMENT Date:____/_____/______ Team Captain:______________________________________________ Alternate:__________________________________________________
Disaster Recovery Checklist Form 21 TEAM: MANAGEMENT (page 2)...... Date:____/_____/______ Team Captain:______________________________________________ Alternate:__________________________________________________
Disaster Recovery Checklist Form 21 TEAM: OPERATIONS...... Date:____/_____/______ Team Captain:______________________________________________ Alternate:__________________________________________________
Disaster Recovery Checklist Form 21 TEAM: DATA ENTER AND CONTROL Date:____/_____/______ Team Captain:______________________________________________ Alternate:__________________________________________________
Disaster Recovery Checklist Form 21 TEAM: SPECIAL PROJECTS Date:____/_____/______ Team Captain:______________________________________________ Alternate:__________________________________________________
Disaster Recovery Checklist Form 21 TEAM: TECHNICAL SUPPORT Date:____/_____/______ Team Captain:______________________________________________ Alternate:__________________________________________________
Disaster Recovery Checklist Form 21 TEAM: DATABASE Date:____/_____/______ Team Captain:______________________________________________ Alternate:__________________________________________________
Disaster Recovery Checklist Form 21 TEAM: SYSTEMS AND PROGRAMMING Date:____/_____/______ Team Captain:______________________________________________ Alternate:__________________________________________________
. Disaster Recovery Checklist Form 21 TEAM: INSURANCE DEPARTMENT Date:____/_____/______ Team Captain:______________________________________________ Alternate:__________________________________________________
Disaster Recovery Checklist Form 21 TEAM: INTERNAL AUDIT DEPARTMENT Date:____/_____/______ Team Captain:______________________________________________ Alternate:__________________________________________________
D. Restoration of Facilities and Operations at the Original and/ or Alternate Site Now that your backup facility is functioning as your data center, it is time to turn your attention to rebuilding your permanent data center. Reconstruction plans should already have been in progress, but now it is time to devote more effort to this area. The full reconstruction is normally a two-step process. The first step is to use a cold site as a replacement of the high-cost, hot, backup site. The permanent replacement hardware that will eventually be used at your permanent facility is ordered and installed at the cold site. Once it is tested and operational, the production processing is moved from the hot backup site to the now-operational cold site. A lot of hotsites also provides both a hot backup site and a cold site. Reconstruction at your permanent facility may not require a totally new building but only repair of the existing facility. Once the permanent facility is ready for use, the hardware at the operational cold site can be moved to the permanent facility. SECTION TITLE II. DISASTER RECOVERY TEAMS II. DISASTER RECOVERY TEAMS A. MIS Organizational Chart Except for the Insurance Department and the Internal Audit Department teams, all of the team members belong to the MIS Department. Therefore, it is important to note the organizational responsibilities and authority the remaining members have. Your organizational chart will identify the normal chain of command. YOUR MIS ORGANIZATIONAL CHART SHOULD BE PLACED HERE B. Description and Responsibilities 1. Disaster Planning Coordinator _____________________________has been given the responsibility of Disaster Planning Coordinator and will coordinate the activities stated in this Plan. As the Plan is being formulated, he will be responsible for accumulating all of the information that needs to be included. Most of the data already exists in the organization but is not in a useful form. As people are assigned their duties, their names, addresses, and phone numbers have to be entered in the various parts of the Plan, As the addresses and phone numbers change, the master copy of the Plan is updated, and, every six months to a year, updates are distributed. After the original version of the Plan is completed, copies are made and distributed to team captains and their alternates. Copies are also secured at all locations named in the Plan. The copies are to be kept at the employees’ homes and not in their desks at the office. If a major disaster occurs, their offices may be destroyed. All activities in the Plan need to be tested. This not only ensures that the procedures work, but also acts as a training exercise for the various teams. The Coordinator will schedule testing and document the success or failure. He will prepare reports for management and for the Internal Audit Department. When tests fail, he will work with the team captain to resolve the problem and schedule another test. Numerous seminars and meetings on disaster recovery are available, and the Disaster Planning Co ordinator represents his organization at these meetings. He will stay current with state-of-the-art information and procedures and will present this information to his organization. As hardware, software, and communications are updated at his facility, he will communicate with the hot backup site to ensure that it can adequately support all critical systems. 2. Emergency Management Team Team Captain: MANAGER SYSTEMS AND PROGRAMMING—________________________________________ Alternate:______________________________________________________________ Responsibilities: —Supervise the initial reaction to the disaster and ensure that organizational property and lives are secured. — Make an assessment of the damage. — Determine to what extent the MIS Contingency Plan will be implemented. — Notify senior management. — Call team captains and begin the disaster recovery plan. Team Members: — Manager of Computer Operations—______________________________________________ — Database Administrator—______________________________________________________ — Manager of Technical Support—_________________________________________________ — Manager of Special Projects—__________________________________________________ — Disaster Planning Coordinator—_________________________________________________ Disaster Recovery· Functions: — Set up a Control Center per the instructions of the Plan so that all operations will be channeled through one area. — Distribute the phone number to all teams and emphasize the use of the phone only for necessary information. — Notify the backup facility of your intentions to use it. — Start using the Disaster Recovery Logs for all operations. — Supply senior management with scheduled updates on status. — Notify all users of the status of the computer facility. — Arrange for any additional professional help. — Coordinate interviews to ~ill any vacancies. 3. Operations Team Team Captain: MANAGER COMPUTER OPERATIONS—________________________________________ Alternate:____________________________________________________ Responsibilities: — Restore files and operate system at the hot backup site. — Prepare operations schedule at hot backup site. — Coordinate activities necessary to restore facility at the existing or new permanent location. — Order and install computer hardware necessary for normal processing at permanent location. Team Members: — Operations Supervisor—_____________________________________________ — Computer Operators—As assigned — Tape Librarian Disaster Recovery Functions: a. Computer operations — Operate or give assistance to computer operator at hot backup site. — Obtain backup tapes and restore files at hot backup site. — Determine restart point for critical systems. — Test critical systems for production processing. — Establish an operations schedule at hot backup site. — Inform users of processing schedule at hot backup site. —Arrange for shipment of backup supplies to hot backup site. — Arrange for shipment of off-site tapes to hot backup site. b. Facility preparation — Coordinate the repair or construction of the new permanent facility at the original location or new location. c. Replacement hardware —Contact hardware vendor to determine if current hardware is repairable. If hardware must be replaced, get proposed time-frame for delivery. If time-frame is not satisfactory, get proposal from used-hardware vendor. — Check on requirements for cables, connectors, and other start-up requirements. —Arrange for procuring any other data-handling equipment. — Schedule testing with maintenance personnel. d. Cold site preparation — Review cold-site facility to ensure that the environment can support the hardware that will be temporarily operating there. — Provide for adequate power, cables, and connectors. — Provide for communications requirements. — Provide for security guards and limited access to computer room. — Provide for off-site storage. e. Computer support equipment. — Determine the need for other support equipment: PC, data-entry office equipment, paper-handling equipment. Order all required equipment. f. Supplies. — Review list of requirements. — Contact vendors on Emergency Vendor list. —Arrange for shipment of existing supplies or purchase of replacement supplies. — Notify remaining vendors of disaster and give shipping address of backup site. 4. Data Entry and Control Team Team Captain: MANAGER I/O CONTROL__________________________________________ Alternate:_______________________________________________________ Responsibilities : — Restore the Data Entry and I/O Control operations at the backup facility or at an alternate site. Team Members: — Data Entry Supervisor—______________________________________________ — I/O Control Supervisor—______________________________________________ — Data Entry Personnel—As assigned — I/O Control Personnel—As assigned a. Data input —Work with Operations to determine the starting point for recovery. Match the latest backup files that Operations will use to restore, with the data from the users. Obtain original documents needed to rekey to bring files up to current status. — Determine when equipment will be available for keying, or make arrangements with service bureaus or other outside services for keying. b. Data control — Notify users of the disaster and advise them of the temporary procedures for handling input and output. — Arrange for temporary office space either at backup facility or somewhere close to original facility. — Obtain the backup documents and establish revised schedules in conjunction with users and Operations. 5. Special Projects Team Team Captain: MANAGER SPECIAL PROJECTS—________________________________________ Alternate:_________________________________________________ Responsibilities: — Provide transportation for people, supplies, and equipment between sites. — Provide administrative and office support for the recovery activity. Team Members : — Special Projects personnel Disaster Recovery Functions: a. Transportation to/from backup facilities — Provide for transportation to and from backup facility. Set up a shuttle service between the airport and the backup site. —Arrange for shipments of material, supplies, and computer equipment. b. Training — Provide for training of employees who may be assigned duties other than their normal duties. c. Administrative services — Establish an interoffice mail service between locations. — Provide all necessary administrative services such as the payment of bills, processing of the payroll, handling of employee insurance claims, issuing of critical invoices. — Arrange hotel accommodations for personnel stationed at the backup site. — Provide for additional office facilities, including furniture, phones, and office equipment. 6. Technical Support Team Team Captain: MANAGER OF TECHNICAL SUPPORT—_________________________________________ Alternate:_____________________________________________ Responsibilities: — Install operating system software at the backup site allowing the minimum required operations and communications to be restored. Team Members: — System Programmers Disaster Recovery Functions: a. System software — Supply the required operating system as well as other control systems. — Restore the systems in priority sequence using backup tapes and verifying continuity. — Work with backup site and vendor technical staff as needed. b. Communications network — Determine damage to communications network and provide for replacement equipment. —Work with telephone company to restore full service and order telecommunications facilities as needed. — Notify and inform users of disruptions in service. 7. Database Team Team Captain: Database Administrator—____________________________________________________ Alternate:______________________________________________________ Responsibilities: — Full restoration of the database and verification that the files are current. Team Members: — Data Administration personnel Disaster Recovery Function: Database Restoration and Integrity — Supervise restoration of the database from backup tapes. — Restore intermediate data to allow files to be updated to a current status. — Run tests and check them against user listings. Ensure that future processing is accurate by working with users. 8. Systems and Programming Team Team Captain: MANAGER OF SYSTEMS AND PROGRAMMING—_____________________________________ Alternate:_______________________________________________ Responsibilities: — Ensure production systems are restored and verify continuity of ongoing processing. Team Members: — Systems and Programming staff Disaster Recovery Functions: a. Application systems restoration and recovery — Coordinate with Operations and Data Control to verify proper restart point. — Verify application software programs and libraries. —Supervise file restoration and ensure continuity of data by testing and comparing results to users’ listings. b. Application programs —Supervise the processing of critical applications at backup site or arrange for processing at service bureau. 9. Insurance Department Team Team Captain: MANAGER INSURANCE DEPARTMENT____________________________________ Alternate:______________________________________________ Responsibilities: — Review the damage and notify the insurance company to request appraisal process. — Provide detailed accounting of the damage to senior management. — Process all claims. Team Members: —As assigned Disaster Survival Functions: Insurance and salvage — Contact insurance companies and do follow-up with adjustors. — Review the damage and determine hardware that can be repaired. — Prepare report that details damage and outlines disposition. — Initiate insurance claims. — Advise other teams of the replacement provisions in the existing insurance policy and the allowance for renting and leasing necessary equipment. 10. Internal Audit Department Team Team Captain: MANAGER OF Internal. AUDIT___________________________________________ Alternate: _____________________________________________________ Responsibilities: — Assure that data and file restoration is accurate and on-going processing is proper. Team Members: —As assigned Disaster Recovery Function: Verification of the Integrity of Restoration Operations — Verify restoration process to ensure integrity and continuity. —Audit financial files to ensure recovery process was complete. — Monitor controls and security during recover): mode. C. Team Preplanning and On-going Functional Responsibility 1. Disaster Planning Coordinator a. Keeps Plan documentation updated; correcting names, addresses, and telephone numbers. b. Ensures Plan is properly distributed to team members. c. Develops testing schedule of all phrases of Plan. d. Works with Internal Audit to validate procedures. e. Distributes literature on safety and disaster recovery. f Periodically checks backup facilities. g. Formally updates Plan every six months using input from teams. 2. Emergency Management Team a. Team leader schedules quarterly meetings, discusses current status. 3. Operations Team a. Keeps computer room floor plans current for both main location and backup facilities. b. Keeps organizational charts current. c. Keeps copy of all company hardware configurations. d. Keeps updated copies of vendors who supply hardware, software, communications, supplies, and custom forms. e. Keeps current emergency procedures for fire, water damage, and other potential hazards. f. Keeps off-site backup tapes current. g. Stores current microfiche of critical historical records. h. Keeps Operations run-book information stored off site. i. Keeps contingency plans operational. 4. Data Entry and Control Team a. Keeps a current listing of systems and documentation used by both departments. b. Keeps copy of keying instructions off site. c. Makes contingency agreements with local users and/or vendors for mutual emergency use of each others’ equipment. 5. Special Projects Team a. Keeps current listing of transportation companies to be used in emergency. b. Sets up contingency plans to deal with distribution of emergency funds. 6. Technical Support Team a. Ensures operating system and communications software are off site. b. Keeps current list of disk files and layout identification. c. Ensures communications network information is current, including teleprocessing-line now chairs, terminals, and controllers. d. Prepares a full recovery capability using off-site backup tapes. 7. Database Team a. Ensures database off-site backup arrangements are complete. 8. Systems and Programming Team a. Reviews all systems to ensure adequate arrangements for off-site backup have been provided for programs, files, and documentation. b. Verifies that proper retention is being maintained as noted in retention policy in company records. c. Lists all systems, users, processing priority, and responsible programming staff. d. Identifies critical systems and prepares detailed recovery plans. 9. Insurance Department Team a. Reviews insurance coverage and verifies adequacy of Plan. b. Reviews coverage to verify that insurance is in conjunction with MIS Contingency Plan and that premiums reflect backup efforts. 10. Internal Audit Department Team Reviews MIS Contingency Plan to gain a thorough understanding, checks for control points, and verifies for completeness. IV. SUPPLIERS Not all of the suppliers that you actively work with in your normal business environment will be listed in this section. This section is designed to identify only those specific vendors who need to be contacted to repair or replace equipment or supplies critical to the operation of the data center and required as part of the recovery effort. Information on vendors should include names and addresses of key sales representatives and technical personnel and lists of all local and central emergency phone numbers (Form 30). List vendor policy as to response time and stated position on reacting to emergency situations in delivering replacement equipment. List the major equipment units that are key to this vendor. Details will be part of the installed equipment list. Specify if this equipment can be obtained from vendor as part of current production or if available from used-equipment vendors. If so, list availability. List alternate backup installations or disaster recovery facilities that offer availability of equipment. A. New and Used Hardware Suppliers MIS management often focuses recovery planning efforts on computing equipment. Since computing equipment is more readily interchangeable than people, is more portable than plants, and has shorter lead times than communications, this focus can be very misleading. Nonetheless, it often happens because the DP manager sees his equipment as being the most unique asset. Today, the population of computer hardware, even of any given type or model, is quite large. Job streams are portable from one system to another. Components are portable and can be readily moved from one location to another. Vendors are particularly responsive to emergency situations and usually have a detailed description of the installed hardware. As a result, most data processing installations can expect to be able to find replacement hardware faster than they can find adequate space to put it in. A complete list of all equipment and specifications is essential in a recovery situation. A list of used- equipment vendors and their specialties should be part of this section of the plan. B. Software Suppliers A complete list of any special software, whether operating system or application system, should be kept. Special considerations for backing up these packages should become part of the plan. C. Communications Suppliers Communications facilities normally have long lead times, which can generally be shortened only slightly even in an emergency. These lead times should be kept in mind when the vendor list is determined. When the required lead time of the desired facility is greater than the objective recovery time, a strategy must be prepared to substitute a slower or higher cost, but available facility. For example, dial-up lines may be substituted for leased lines, voice facilities for data facilities, or mail for telephones. D. Special Equipment Suppliers For the most part, computing equipment is modular and portable, and, at least in emergency situations, has short lead times. On the other hand, if there is old, unique, or obsolete equipment, or if there are hardware-dependent applications, then special strategies should be part of the Plan. If an upgrade or reconfiguration is already planned, it may be wise to accelerate the schedule. E. Suppliers of Office-Support Equipment Office-support equipment should be listed, along with alternate devices or plans to obtain services from a second source. F. Computer Custom-Forms Suppliers A complete list of vendors that are providing custom forms should be part of the Plan. Samples and specifications are helpful in describing requirements to a vendor. Forms can often have a long lead time if they must go through the entire cycle from draft artwork to finished product. It is helpful to have more than one vendor that can supply the same form. A single supplier with a heavy work load may not be able to react quickly enough to your needs in an emergency situation. To speed up the entire process, we suggest that your suppliers maintain camera-ready samples of your custom forms, even if they have not previously produced the form for you. This will also provide the off-site protection for sample custom forms that your plan requires. Form 30 Updated Vendor Name_____________________________________________________________ Address of Local Office_________________________________________________________ Phone Number_____________________________________________________________ Contacts: A Sales Representative:______________________________________________________ B. Technical Representatives:_________________________________________________ 1. Hardware______________________________________________________________ 2. Software_______________________________________________________________ C. Emergency Phone No._____________________________________________________ Vendor Response Equipment Alternate Facility V. PRIORITIZE ALL APPLICATIONS A. Rate All Systems with Their Priorities The principal Real of MIS is to provide the ability to quickly restore service for critical applications when there is a serious failure or disruption of regular operations because of fire, natural hazards, power failures, or other causes. The purpose for identification of critical systems is to establish an approach to disaster-recovery planning that will outline functions both before and after a disaster leading to full restoration of services. The planning must be sufficiently detailed so that major decisions will be made prior to, and not immediately after, a serious situation occurs. If a disaster or serious failure does occur, there will be enough confusion and stress without worrying about who does what and how. The continuation of a large percentage of the MIS operations at an alternate site immediately after a disruption is rarely logistically, technically, or economically feasible. It is seldom essential, since the tasks performed are not all of equal importance. This relative importance of the various functions must be analyzed. The following procedures are designed to step through the business functions in priority order. It should be assumed that some reduced level of MIS operations must be endured during the primary post-disaster recovery period. For planning purposes, two key issues need to be determined: for which applications is post-disaster survival critical, and which critical applications need urgent attention when the recovery effort begins. The following information will help in these evaluations. The process will help identify the critical standing of an application within the business. The technique uses seven factors that can help identify an application’s vulnerability to performance degradation when a disaster occurs as well as the con straints that may be present when the application must be restored. The seven factors are: 1. The time that can elapse before the application recovers after a disaster 2. The unique resources required if the application is to be restored 3. The application’s ability to withstand relocation during restoration 4. The MIS Operations staffs experience in testing the restoration process 5. The limits on loss that can be tolerated if application restoration is delayed or not possible 6. Structural or operational defects that may be present in the application 7. Security considerations regarding structure or performance Collectively, these factors establish the relative importance of the post-disaster recovery of specific applications. Individual factor ratings are developed as follows: Ratings: Time 5 Required to reach median operational level 4 to 6 hrs after disaster 4 Required to reach median operational level 7 to 12 hrs after disaster 3 Required to reach median operational level 13 to 24 hrs after disaster 2 Required to reach median operational level 25 to 48 hrs after disaster 0 No time constraints Ratings: Unique Resources 5 Adequate restoration requires full recovery of a unique database and/or custom software/hardware and/or non-dial-up telecommunications facilities 4 Can operate adequately on dial-up basis if all other unique resources are available 3 Can be restored to an adequate operational state with only partial recovery of key resources (i.e., by using the current/backup version of the database) 2 No unique resources required for restoration 0 Can operate during the post-disaster recovery period with some features/subsystems disabled Ratings: Relocatability 5 Application cannot be successfully restored at other than the prime processing/operating site 3 Application can be relocated successfully but with a delay exceeding the application time factor 0 No difficulty anticipated in application relocation Ratings: Experience 5 Currently assigned MIS operations staff has not successfully tested application restoration process 3 Application has been successfully restored in test after a significant delay 0 No delays or other problems encountered during successful recovery testing Ratings: Loss Toleration 5 Failure to restore within time/feature constraints will likely lead to dollar loss exceeding established management tolerance 4 Failure to restore within time/feature constraints will create problem with major customers/suppliers/union groups/regulators that would exceed management/operational tolerance 2 Must be restored successfully, but loss problems associated with incomplete or delayed restoration can be tolerated 0 No loss problems anticipated through delayed or incomplete restoration Ratings: Operational Defects 5 Successful restoration and “normal” operation requires direct involvement of key programmer and operator 4 Application lacks current documentation and/or is scheduled for major overhaul 3 Application has demonstrated sensitivity to changes in data input volume, mix, and/or quality, and/ or has evidenced significant failure rates 0 No known defects Ratings: Security Considerations 5 Application contains corporate-sensitive and/or proprietary and/or employee/customer “personal privacy” data 4 Application controls funds disbursement or otherwise affects custody over assets 0 No special security considerations required beyond exercise of due care during restoration processing Using this assessment technique will result in a ranking of application systems in terms of relative importance. The maximum score for any application, 35, identifies the most critical systems. The ranking provides a guide for proper allocation of limited resources during the post-recovery period. Form 24 can be used to complete the evaluation of all applications. VI. MEDIA PROTECTION A. Protection and Retention of Vital Records The protection and retention of vital records is a part of your normal business operation. Various records need to be made available to society or to stockholders upon request to verify the information your organization supplies as part of their business. Your company also has a legal responsibility to protect certain records for a specified number of years. Magnetic tape is commonly used to store company records because of the amount of data that can be contained on a single reel of tape and because the tape can be easily secured in an off-site location. Storing tape at a commercial media-storage facility is usually very economical and fulfills the requirements of off-site protection for vital records. Some records need to be stored in their original form. The data center does not normally get involved in these procedures because the center only acts as a service center for processing user information. The original forms or source documents are usually the responsibility of the users. Some users choose to copy the original form onto microfilm because of space requirements. The microfilm can be duplicated and a copy sent off-sire to a commercial media-storage facility. For some original records, multiple copies already exist because of the nature of the records themselves. When this occurs, only the controlling of the records at each location is necessary. When there are multiple copies of a particular record but they all reside at the same location, procedures need to be established to capture one copy of the record and store it at another secure location. When the data center has the responsibility to secure the computer records off site, it normally accomplishes this by copying the records to magnetic tape and transporting the tape to the off-site location. Copying the computer records can be a separate step in the normal processing of the information, or it can be part of an existing step. Sending the second or third generation of a tape tile off site may be considered adequate by some managers, but this may not fulfill the needs of your recovery plan. If a second or third generation tape is to be used to bring your files to a current status following a disaster, considerable processing may be necessary following the restoration of the file from the off-site backup tape. Regardless of the method used to store records off-site, a written procedure and a scheduled pickup and delivery, with a person assigned the responsibility for supervising the procedure, needs to be pan of the complete operation. B. Protecting the Database 1. Database backups Complete daily backups are taken when the database is not open. If possible, copy the database to two separate tapes and send one tape off site. The other tape is kept on site for seven days. 2. Updates A log tape is created capturing the before and after image of all modified records. These tapes are kept on site for seven days. 3. Database definition If your database definition normally remains static, the database definition should be backed up only after it is updated. If it changes quite frequently, it can be backed up weekly along with technical support software as long as you also back up the change information. These backups should be kept off site. 4. Software modification source code Software modification source code is stored on a source code file. The source code file is backed up weekly and rotated off site. C. Standard Backup Procedures Most data centers back up their files to tape and rotate the tapes off site. The tapes are created either as part of the normal processing or as a separate step. If the tape is the standard new generation of a file and it is needed for the next processing, a second tape is required so it can be rotated off site. Rotating a second- or third-generation tape instead of the current tape off site would require reprocessing to bring the file to a current status during the recovery mode. The tapes rotated off site should be as current as the processing that created them. The frequency of processing can be any of the following. 1. Daily processing 2. Weekly processing 3. Monthly processing 4. Annual processing 5. Applications cycles 6. Disk volume backups D. Off-Site Storage Off-site storage provides a method of protecting high-speed, machine-readable media that is an essential part of file reconstruction in a disaster recovery mode. Having the media off site reduces the risk of a single disaster destroying all copies of file backups. Disasters such as a fire, explosion, water damage, and aircraft accident will not affect two widely separated locations, and the risk is greatly reduced for area disasters such as those caused by weather conditions. When media is rotated to off-site locations, a copy of the inventory at that location should accompany the media. A second copy should remain at the data center. The listing should contain the librarian or reel number, the name or description of the file, and the time and date the file was created. E. System and Program Documentation System documentation should be created through the computer so the documentation can be easily maintained and easily accessed. It also assures that the information can be backed up and rotated off site. The information can also be printed and stored in a binder. If a disaster occurs at the data center, the system documentation can be recovered along with other critical files. If the documentation isonly on source documents, it must be copied or microfilmed before it can be stored off site. Maintaining these backup copies has traditionally become a forgotten or “do it later” task. Program documentation in many data centers has become the source code listing itself. If a librarian system is used to store and maintain the source code, the librarian file should have file backups scheduled at least weekly. All modifications must also be maintained until the next weekly backup. F. Data-Entry Backup At a data center where the keying formats are modified infrequently, the data-entry format file should be backed up each time the file is modified. A copy of the backup should remain at the data center and a copy rotated off site. If modifications occur more frequently, a weekly scheduled backup may be preferred. All changes must be maintained until the backup file is created. The formats may be designed to execute primarily in your data entry department with the equipment you use. Formats that can be used by a service bureau should be easily available so that other sources may be used in a recovery mode. Other documentation concerning the data entry functions should be entered into some maintainable computer file so that it can also be backed up and rotated offsite. This documentation would include such items as user contacts, schedules, volumes, etc. G. Microfiche Procedures Microfiche is a printed report generated on film. It is an inexpensive and compact method of maintaining reports. Its value in a recovery mode is limited to source information since it is not machine-readable, When errors cause the backup file to be missing or unusable, microfiche may be the only source for reentering data. Microfiche should be stored in a temperature- and humidity-controlled environment to ensure its usability. H. Personal Computer File Backup Those persons having a personal computer have the responsibility to protect files important to the continuing operation of an organization. If a file is critical, it should be well documented and the responsibility for it should be shared by more than one person. Files should be backed up on to diskettes and the diskettes secured off site. An executive who keeps the only copy of a confidential diskette in his or her desk may lose this information if some disaster destroys the office. I. Computer Custom Forms Custom forms are the continuous computer forms designed only for your organization. Replacing these forms by ordering them from a supplier in an emergency situation may not conform to your time schedule. The suppliers measure their lead times in weeks. When the forms are needed for critical applications such as payroll, customer invoices, and production scheduling, you cannot wait for the supplier. Many companies overcome this problem by having multiple suppliers for the same custom form or by storing the same form in different locations. Sample copies of the various forms should be maintained in a secure location to improve supplier turnaround with little loss in quality in an emergency situation. VII. COMPUTER ROOM OPERATING PROCEDURES A. Power-Up Procedures Include computer center procedures or identify source B. IPL Procedures Include computer center procedures or identify source C. Power-Down Procedures Include computer center procedures or identify source O. Schedules Include computer center procedures or identify source E, Operations Run-Books Include computer center procedures or identify source F. Application Responsibility Include computer center list of all production systems and the name of the persons responsible for each. Also include on the list their home phone numbers and any beeper phone numbers. VIII. OPERATING SYSTEM A. Software Operating Environment The optimum in processing at your backup facility is to execute under the same operating system. In order to bring up your operating system at the backup site, the computer hardware must be compatible with your current hardware. If the operating system is coded to execute only on a single mainframe, you may have to arrange with your software vendor to execute on a replacement configuration. If some hardware is different from your own hardware, you will have to do an I/O Generation. If the backup facility operates as a virtual machine, you may bring up your operating system as a sub-task under the host system. B. Listing of All Purchased Software Packages Most software packages are coded to execute on a single mainframe. Make any necessary arrangements with your software vendor so that your purchased software will execute at your backup facility. Include computer center listing or identify source C. Disk Drives and Pile Layouts If the type and/or number of disk drives at the backup facility is different from your data center, make plans for the placement of permanent files and the placement and sizes of work areas. If job control statements will have to be altered, have a method for changing the job control language JCL) already planned out. Include computer center listing or identify source IX. PHYSICAL SECURITY AND ACCESS CONTROL Physical security begins with access to Your organization’s property. Many companies have guards stationed at all access roads to the office complex. Further security is provided by personnel located at the reception desk of each building and at the reception desk of each floor and/or department. Once individuals pass the human element, their access to various areas is normally controlled by some type of access control system. An access control system prevents unauthorized persons from entering secured areas. Other than guards and guard dogs, these systems are mechanical (locks), electronic (typically, an electronic control box used with a key), electromechanical (such as push-button devices that work with electronic badge-readers), digital (devices that allow users to set any combination), and computerized (including systems that can sound alarms, record employee attendance, monitor employees’ locations, and produce operational control reports). Access for the people in the following categories should be stated so that only those persons having the proper authority can enter the various restricted areas: A. Computer OperationsAlthough access to the general office area is normally given to all employees, a few office security measures are noted here. —Confidential information such as a password listing or payroll information should not be left unattended on a desk. — Separation of duties should be maintained in all areas involved in the disbursement of funds, such as payroll and accounts payable. —A rotation of duties for all people involved in the disbursement of funds should be scheduled, and these people should not be allowed to continually work through their vacations. —A manual log to account for all checks should be established. —A manual log for all usage of the manual check signer should be established. — Duplicate check-signing plates should be secured off site, possibly in a safe deposit box. X. SOFTWARE SECURITY A. Sign-On Passwords Most purchased software includes some type of provision for restricting use of the software to only those individuals who know the password. Signing on to the host computer usually requires a password. The password is internally checked against a table that tells the system if the terminal operator has entered a valid password. If the password is valid, the system then displays a menu screen. When the operator selects an entry in the menu, the system checks that entry against the password table to see if the operator has access to the selection. If the user has access, the access could be further restricted as to update capabilities or lust read-only access. Even the information on the screen could be masked so that only the information that the operator has authority to view will be shown. If the operator selects a different purchased software package from the main menu, he or she may have to enter a new password to have access to this new selection. B. Maintaining Application Programs If your application source code is managed by some type of source code librarian, some of the application programs may have password security protecting access to the code. In many organizations that do not manufacture or produce classified types of products, the only programs with password protection are those that disburse funds, such as payroll and accounts payable. Password protection will help deter people from making unauthorized changes to the software. C. Password Maintenance Every organization should have a formal procedure for maintaining passwords. The procedure should begin with issuing passwords to new employees. Documents need to be approved by supervisors before passwords can be issued. The approval will indicate all access authorized for the employee. The entering of the passwords is usually the responsibility of the security officer or someone in the Technical Services Department. Regardless who is responsible for entering the data, a different person should be responsible for verifying the passwords by making a spot check at least once a month. The second person does not have the ability to enter or change passwords, but only to view them and the approving document. Part of the password procedure should be the mandatory entering of a new password by all terminal operators at least once a quarter. Software can be generated to broadcast messages to all operators announcing that, by a particular date, a new password must be entered by the operator or, after that date, the operator will not have access to the system until he or she goes through a formal request procedure. This will prevent one person from using another’s password to access the system. The formal procedure should also provide for clearing a password when someone leaves the organization. Again, this will prevent the past employee or someone else from unauthorized use of a password. XI. BACKUP FACILITIES A fully equipped backup facility where you have tested your operating system and other critical applications is your assurance that, should a disaster ever occur, you are fully prepared to make an efficient and effective recovery with minimal effect on the users you serve. These fully equipped facilities are available to paid subscribers for their use to schedule testing of their operating systems and their applications. They no longer have to depend on a hardware vendor who may supply their cold site with a mainframe and components that they have never tested with. (A cold site is an empty computer room that is equipped with a raised floor, air-conditioning, electric power, and some type of fire protection system, and that is ready for the installation of computer hardware.) Hardware vendors will work with you as much as possible in an emergency situation, but the locating and delivery of hardware could still take as much as 6 to 10 days. After delivery, the vendor must still install and test the hardware before turning it over to you for testing. Having both a fully equipped facility and a cold site as part of your disaster recovery plan is the most cost-effective if you ever have to move off site. The recovery will start at the fully equipped facility while preparations are being made to install replacement equipment at the cold site. When the cold site is prepared, the computer operations can be moved from the high-cost, fully equipped facility to the low-cost cold site. When your permanent facility is reconstructed, the operations can be moved from the cold site to the permanent facility. A. Subscribing to a Backup Facility There are many fully equipped backup facilities from which to choose. For the most part, they offer the same services. They may differ in the amount of floor space they offer, the type and size of hardware that is installed, the physical security they have for their facility, or their geographical location, but they still offer the basic solution: an operational computer ready to be used by one of their subscribers. Before you choose one facility over another, compare their services, location, and hardware, and select the facility that best meets the needs of your organization. The contract you sign will contain many of the following items: terms and effective date, definition of contract terminology, use of the facility, fees and payment schedule, conditions concerning multiple disasters, liability, hardware changes, confidentiality, and termination of contract. B. Facility Layout The company you contract with should furnish you with a layout of the facility. This would include the computer room, tape storage room, office area, conference room, reception area, etc. C. Hardware and Software Some facilities have multiple hardware configurations. You would subscribe to the hardware that meets your needs. The operating system that has been installed on the system should allow you to process your system software and applications with a minimum of changes. D. Communications Communications from the backup facility to your network cannot be effected without planning and testing. If the telephone company has to install new lines in order to connect with your network, it will require weeks of delays. Preplanning with the phone company will expedite the connection of any necessary lines. Hardware is available that can be installed between your host computer and your communications network. This same hardware can be installed at the backup facility and connected with its computer. A link can be easily made between this hardware at both locations. Once the link is made, your communications network is connected with the computer at the backup facility. E. Supplies Available backup supplies have to be moved from their storage areas to the backup recovery facility. Suppliers on the list of critical vendors need to be informed of the location of the backup facility so additional supplies can be sent. F. Testing Your Plan must be tested over and over again so it will be ready to be put into effect once a disaster occurs. Your data center, as well as the backup facility, is continually being upgraded. New software and hardware are being added, and an application that tested successfully in the past may not execute now. The only way to be assured that you can move your operations to the backup facility and know that processing can continue is to test and test and test. Testing at the facility should be well documented. Avoid relying on a single key person to bring up your operating system and restore the necessary libraries and files. If a disaster occurs, that key person may not be available to function as he or she ;lid in the testing phase. 1. Initial testing Initial testing should be concerned with bringing up the operating system, testing job control language JCL), restoring files on disk drives that could be different from the drives that you have in your data center, testing communications, and testing a simple batch job that uses purchased software. Future testing would require testing all applications that were classified as critical to the operations of your organization. It would not be unusual for you to be unsuccessful in your first test attempt. In your first test, you may not even be able to bring up your operating system. The key to success is noting your failures and successes and making changes in your Plan so that your next test is successful. 2. Restoring your tiles and libraries If you were able to restore some of your disk files and libraries in your first test attempt, you are that much ahead in your testing. In future testing, you should restore your files from tapes that would have been rotated off site from your data center. If the disk drives at the backup facility are different from yours, plan ahead for the way you will download your files, how much space will de allocated to work areas, etc. 3. Testing critical applications All critical applications should be eventually tested. This is the only way you will know if they will execute at the backup facility. Prepare a schedule for testing all critical’ applications and note your successes and failures. 4. Testing communications Communications may be required in testing some of your critical applications. If communications is not part of the testing of the first couple of applications on the schedule, it should be tested as soon as possible so that any necessary line requirements can be initiated. 5. Mock disasters Although you schedule testing at the backup facility, your management should approve a surprise test using a mock disaster. With scheduled testing, your people know ahead of time which kind of testing is going to he accomplished and they can prepare for it, even though they may be told not to. Using a mock disaster, you can better evaluate the effectiveness of your Plan. 6. Testing program compilations Support at the backup facility for the programming staff is an essential part of the recovery plan. The programmers must be able to access their source statement library, recompile their programs, and then relink them. They also need access to all of their programming and debugging aids. Compilations should be tested for all languages used in your shop. Test systems should be available unless management has decided to eliminate the test systems because of a lack of sufficient disk space at the backup facility. XII. RECIPROCAL AGREEMENTS Reciprocal agreements are mutual aid agreements between two or more data centers that state that if Center A has a disaster, Center B will allow Center A to share time on Center B’s computer. This approach was popular years ago to comply with management directives to have a disaster recovery plan. At that time, the computer was not an essential part of the organization as it is today. Back then, the applications were primarily financial and were all batch. The computer was not being used 24 hours a day; at many centers, it was idle even during the prime shift. Hardware configurations were similar and so was software. This is not the environment we operate in today. The computer is the driving force that dictates what we build, when we build it, and how we build it. It generates shop orders, production schedules, purchase orders, shipping schedules, and tracts material, and tracks orders from the time they are entered into the system until the finished product is ready for shipment. The computer is on line 24 hours a day seven days a week. When is Center B going to have time to share its resources with Center A, which had a disaster? When each center is operating at 80 percent capacity on its own computer, how can they both operate on a single computer? How can one center supply all of the disk capacity for both centers? With reciprocal agreements, Center A’s disaster will create a disaster for Center B. Reciprocal agreements are still a viable alternative to subscribing to a fully equipped backup facility but only for small data centers that still fit the mold of centers of years ago. A reciprocal agreement normal” consists of the following: —A letter of understanding is exchanged specifying any costs, hours of availability testing arrangements, and a general agreement of response each company expects to offer and or receive during an outage. —A configuration questionnaire is filled out covering current hardware and software and a schedule is set for updating the questionnaire to ensure comparability. XIII. INSURANCE PROTECTION Information is a business asset and, as such, warrants the same protection as other assets. Management has the obligation to stockholders to ensure that all assets are protected. The computer center is as important to an organization as is electricity. A company could not survive if it lost its computer center because of a disaster. Insurance can cover the cost of replacing or repairing equipment or facilities damaged by some disaster. Optional business-interruption insurance can protect you from the extra expenses incurred during the recovery period. Some large organizations have umbrella-type insurance that has extensive coverage for multiple companies at multiple locations. Whatever type of insurance your organization has, it must be consistent with the provisions of the Plan. Having a well-documented and continually tested Plan can help reduce your insurance premiums. A. Data Processing Property Protection Coverage Include any insurance coverage or identify source B, Insurance on Computer Hardware Include any insurance coverage or identify source C. Insurance on Other Data Processing and Office Equipment Include any insurance coverage or identify source D. Business-lnterruption Insurance Include any insurance coverage or identify source XIV. POLICING THE PLAN Once the Plan has been established, there is a continual process of maintaining and policing the Plan to ensure that it is being kept up-to-date and that its recommendations and provisions are being adhered to. The maintenance responsibility belongs to the Disaster Planning Coordinator, but the policing is everyone’s responsibility. The MIS Director has total responsibility for the entire Plan, but his or her full staff should be aware of the contents of the Plan and should notify the supervisor if any action or lack of action as specified in the Plan is noticed. Staff members should not look upon notification as a form of “telling on someone,” but as a more positive method of ensuring that their jobs will be there tomorrow because they are following protective measures today. One of the so called watch dogs of the organization is the Internal Audit Department. The department’s responsibility is not to make policy but to make sure that all departments are following the policy that management has already established. In many cases, it is one of the recommendations of the Internal Audit Department that pushes the data processing management into formalizing the disaster recovery plan. Many times, people in the data processing department had some ideas about what they would do if a disaster occurred, but they never formalized it. Now that they have a plan, it is up to everyone to make sure that the Plan is being closely followed. XV. MAINTAINING THE CONTINGENCY PLAN A. Disaster Planning Coordinator’s Responsibility To help prevent the Plan from becoming out of date, establish a schedule for formal updates to the Plan (see form below). Scheduled updates are in addition to any other intermediate updates that are entered as they occur. Such as address changes, hardware updates, software purchases, etc. Six months has been determined as the time between formal updates. Assigned Date-------- Disaster Planning Coordinator--------------------- Completion Date
B. Team Captain’s Responsibility Each Team Captain also has the responsibility to review and update his section of the Plan at least every six months (see form below). Team Captains----------------------------------Date------------------------------------Initials
DISASTER PLANNING CHECKLIST Now that your organization has completed an impact/risk analysis and has realized that a disaster recovery plan is essential if you are to survive a major disaster, you need to take an inventory of your current environment and any future planned improvements. It is essential for you to have full knowledge of your present situation and your plans for the future so that you can organize and schedule your future tasks. The checklist below will help you note some of you; existing controls, which are already protecting your organization’s assets. Other items on the checklist may prove to be good business practices and easily incorporated into your present operation. The MIS Contingency Plan is a very comprehensive plan. It has been written in generic terms and Functions so that it can be tailored to any organization’s environment. It uses a standard framework that is effective in any phased-development approach to major projects. In order to prepare for the Plan, you must answer a multitude of questions. No single person in your organization will have the answers to all of these questions, but it will require the input from many areas in the data center. This will not only give you a more accurate picture of your status, but also it will generate a feeling of ownership by the entire staff. Any plan, good or bad, will have a better success rate if all parties are eagerly working for the good of the plan. The checklist has been broken down into major categories: General Overview, Data Center Facility, Data Entry, Data Control, Computer Room, Tape Library, Telecommunications, Systems and Programming, Technical Support, Database Administration, Internal Audit, Insurance, Backup Facility, and Reciprocal Agreements, Many of the categories relate directly to sections in the MIS Contingency Plan. After each question is space to answer “YES,” ‘’NO” or “Work in Progress (WIP).” There is also a place to enter a notation as to who is assigned to complete the item or what action is planned. The checklist is designed to be a working paper that can be updated as events occur.
![]() Webmaster--Robert Arnold Last Updated--Monday, January 27, 1997 |









