Full Scale BR Testing
By Paul Bergee
Business recovery planners agree: plan testing is essential to the planning process. Unfortunately, many companies limit business recovery plan (BRP) testing to their mainframe, LAN, and telecommunications systems.
At CUNA Mutual Group in Madison, Wisconsin, we recently executed a full-scale business recovery test of all the components essential for a full and successful recovery. Local emergency government, the American Red Cross, utility providers, fire and police departments, emergency medical units, hospitals, and other businesses at our test site joined CUNA Mutual Group in the test.
To our knowledge this was the first integrated disaster recovery exercise in our community that involved governmental agencies, local disaster response teams, and business participants in a full-scale disaster test scenario.
As the manager of Corporate Business Recovery, which reports to Corporate Risk Management at CUNA Mutual Group, I have been involved in the company’s business recovery planning for over 20 years.
Since the development of our first mainframe disaster recovery plan in 1976, we have expanded the scope of our planning to include all the company’s U.S. sites and are now focusing on international operations.
Our BRP includes a traditional mainframe hot site as well as network, LAN, voice, and data communications systems recovery. In 1988 we began writing detailed plans for each of our 320 business units, an effort that we completed in 1995.
The following 11 internal disaster response teams, comprised of appropriate company staff, have been developed to focus on specific aspects of the recovery process:
• Building and Grounds Security
• Customer and Media Communications
• Employee and Family Assistance
These teams drive the disaster response. The Employee and Family Assistance Team is also responsible for arranging after-hours child care.
How can we know that our plan will be adequate if disaster strikes? On April 2, 1996 we staged a full-scale test of our business recovery plan.
The Test Scenario
The ideal disaster test scenario uses a true-to-life model that draws participants into the exercise and allows them to test their procedures realistically.
In south central Wisconsin, the annual threat of tornados is something everyone understands. We developed our BRP test as a response to the following fictional scenario:
• On the morning of April 2, a tornado strikes without sufficient warning.
• The tornado destroys major portions of the high rise office building occupied by some departments.
• Twenty people are injured.
• Electric power and telephone service are lost.
We developed a full-scale simulated response, involving teams from local fire, police, and Emergency Medical Service (EMS) units, our local power company, a local hospital’s trauma center, and the American Red Cross. Nine hundred CUNA Mutual staff took part in the test along with employees of other companies that occupy the high rise building, bringing the total number of participants to 1,500.
Planning for the Test
A test of this size requires months of planning. External planning should begin early in the process. In January 1996, we contacted Dane County Emergency Management, a governmental agency that proved an excellent source of background information for this type of exercise and provided the external support structure we needed.
They assisted us in the early stages of test development by helping us establish the scope of the test and working with us to develop a list of agencies and public service providers that we invited to participate.
Internal planning depends upon full executive approval. At CUNA Mutual Group, testing is built directly into our BRP. This ensures management support from the beginning. In fact, our management saw the exercise as a public relations opportunity, extending the test to include our own communications systems and inviting the local news media to participate.
A disaster recovery exercise with this level of community involvement is very visible, and communication was critical to our planning activity. For example, we learned that the media might routinely monitor cellular phone calls. Our plan called for the use of cellular phones, and if the media had not been alerted in advance, they might have assumed and reported that a real disaster was in progress. Similarly, many people have scanners that pick up 911 calls in the area, so we chose not to include the 911 system in the test to avoid misinterpretation by anyone overhearing the call.
We also contacted neighbors at the test site, informing them about the test well in advance so they would not be alarmed when they saw the fire trucks, squad cars, and ambulances arrive and several hundred people evacuating the building.
Our Technical Response Team focuses on four primary recovery functions: mainframe, LAN, network, and telephone. The information they were given before the test was limited to the test location, date, and time and the number of participants. Other than preloading a back-up server from off-site storage (loaded in advance to save time and reduce costs during the test), they were expected to rely on their prepared BRPs.
We briefed the building owners and representatives of the other businesses that share the site about what kind of activity to expect the day of the test. One of the larger businesses took this opportunity to test its own evacuation procedure.
Our ability to respond to disaster relies on the actions of our 11 internal disaster response teams. These teams (Damage Assessment, Building and Grounds Security, Customer and Media Communications, etc.) are made up of appropriate individuals from within the company.
Because the teams are tested regularly, their preparation for the test was limited to updating the team member lists and making sure each team had current disaster procedures available. We informed team members of the test date, but avoided over-scripting to allow us to assess accurately our ability to respond in an emergency.
The purpose of the test was to determine the adequacy of our business unit recovery plans and to answer these questions: Are the priorities set properly? Are equipment and personnel lists up-to-date? Will the departmental and technical recovery plans work in a disaster? The business unit plans include descriptions of critical equipment needs, such as microcomputer specifications, telephone extension requirements, LAN and network needs, etc. As mentioned above, our only advance technical preparation was to pre-load data files from the test site on a backup LAN system.
Our exercise gave local EMS and hospital staff a chance to test their disaster response procedures. Twenty volunteer victims from CUNA Mutual Group were briefed in advance about the roles they would play and what they could expect when emergency crews arrived. The local hospital provided detailed descriptions of the injuries that the volunteer victims portrayed. EMS and hospital staff used these injury descriptions during triage to prioritize treatment.
In all our BRP tests, we ask our Internal Auditing Department to serve as independent observers. They are stationed at strategic locations and asked to record their unbiased observations. We ask them to record errors and omissions and to express their opinions about staff response, attitude, and apparent general knowledge. For this exercise we also asked members of the Business Recovery Planners Association of Wisconsin, a regional user group, to observe and provide feedback.
We have learned through actual disaster events that Command Center operations are essential to a quick recovery. A key component of our preparation is to have equipment and procedures ready for our three Command Centers:
• The Mobile Command Center is equipped with cellular phones, response procedures, call lists, floor layouts of all buildings, keys for all the buildings, temporary security passes, flashlights, tape recorders, cameras, damage assessment forms, pens, paper, and a number of other tools. When a disaster is reported, the Mobile Command Center is set up at the disaster site and operates as the focal point for first response communications. Local police and fire departments support the Mobile Command Center concept.
• After the fire trucks, police, and other emergency response personnel leave the scene, equipment from the Mobile Command Center is transferred to the Site Command Center. The Site Command Center is established at a location near the disaster site where we can complete the disaster assessment. Staff at the Site Command Center provide information updates to the Primary Command Center.
• The Primary Command Center provides a central point for all communications, response team deployment, and executive decision making activity. This Center has much of the same equipment as the Mobile Command Center. From the Primary Command Center we can make decisions on a broad scale and inform the response teams, management, and staff of the recovery situation. This Center remains in place until business units are operational.
In all of our business response plans we have predetermined alternate sites where business units will move in a disaster. The alternate site for this exercise was the home office, located a few miles from the test site.
Our recovery plans prioritize which departments are to be moved, what telephones must be switched to the alternate site, what equipment is needed, how many employees will need to be relocated, and a number of other key recovery elements.
We scheduled the test for April 2, 1996 at 9:00 a.m. With camera crews on site and hundreds of people involved as participants or observers, we realized the value of the months of pre-planning that went into the exercise and hoped that we had considered every contingency. The timetable below outlines the significant, and sometimes unanticipated, events of our disaster exercise:
8:00 - The weather forecast predicted clear but windy weather. We contacted all the key players, including the police and fire departments, and made sure that everyone was ready to begin. The EMS units were fully staffed and hospitals were prepared. All key internal personnel were on site. Everyone received the signal that the test was a “go”.
8:30 - We learned that the EMS units had been called out on an actual emergency and could not guarantee when they would be available to participate in the test. Some of the volunteer victims were late, but we had several “backup” volunteers willing to take their places.
8:35 - The Mobile Command Center contacted the Primary Command Center by cellular phone and determined that the home office was ready for the test.
8:36 - We assembled all the management participants and issued last minute instructions.
8:45 - We synchronized our watches. All disaster drill observers were in place.
8:52 - Due to an unexpected announcement, the test began eight minutes ahead of schedule. Employees started the tornado drill and everyone, including visitors, began taking shelter in the stairwells or restrooms at the test site.
8:53 - All internal and external participants responded to the early start.
9:01 - The call was made to the city police command center, giving them information about the disaster, advising them that several people had been injured, and requesting help to get people out of the severely damaged building.
9:02 - The “All Clear” was given.
9:02-9:05 - Over 1,000 people reported, as instructed, to predesignated locations in the parking lot for an employee count.
9:05 - The fire department, EMS units, American Red Cross, and police arrived at the scene with lights flashing. The EMS units had responded to their real emergency, a traffic accident, by this time and were available to complete the exercise.
9:06 - Several local television stations had reporters and cameras on site and began conducting interviews.
9:15 - The employee count was completed and all personnel were present.
9:20 - Employees returned to work and the EMS units continued evaluating the volunteer victims.
9:25 - From the Site Command Center, located at a safe distance from the damaged building, I informed the department managers that the fire department reported damage to several floors. The building owner allowed us a few minutes to assess the damage. Information from the fire department’s report was used to highlight damaged areas on the building floor layout that had been retrieved from the Mobile Command Center. Department managers were dispatched with the floor layout and Damage Assessment Forms to their floors for a preliminary damage assessment.
9:45 - Each department manager reported back to the Site Command Center with the damage assessment information. This information was transmitted by cellular phone to the Primary Command Center, which reported the disaster to the disaster response teams.
10:10 - The Mobile Command Center and Site Command Center were closed. Selected employees and managers from the damaged building moved to the home office, the alternate site for this building. Reports from the disaster site were posted on the walls of the Primary Command Center so they were available to edit and/or view. All the disaster response teams met at the Primary Command Center.
10:15 - Test observers were stationed at all locations, taking notes on the exercise. Phones were switched by the Technical Recovery Team from the damaged site to alternate locations selected by the Relocation Team. Interviews with the media were conducted in cooperation with the Customer and Media Communications Team. LAN and AS-400 Technical Teams were switching operations to the alternate site. Mainframe connections were being established. Voice messages regarding the disaster were recorded on our external 800 voice communications Disaster Hot Line. Network systems were converted to the alternate site. Staffing to replace injured employees was begun by the Employee and Family Assistance Team. The Transportation Team was arranging for transporting personnel to the alternate site. The Salvage Team contacted the salvage company, simulating a request for immediate response. Individual departments’ business recovery plans were tested for completeness and readiness in the event of an actual disaster.
11:30 - The Primary Command Center was closed. All systems returned to normal.
6:00 - Local television news programs broadcast the event.
What We Learned
On April 4 all the major participants met to review the exercise, identify what worked well, and discuss what went wrong. Of primary importance was the response of the participants.
Everyone, from the volunteer victims to the onlookers, took the test seriously, assumed their roles, and participated as though it had been an actual emergency.
The test provided valuable information to help us respond to an actual disaster. Listed below are some of the lessons we learned and how they helped us improve our ability to respond to an actual disaster.
• The announcement that started the test early came from the home office. We were not aware that announcements at the home office were broadcast simultaneously at all sites. We have changed the procedures for announcements so they can be directed to specific sites.
• The banners and flashing lights on the Mobile Command Center were not sufficient to attract the attention of some of the participants at the disaster site. We will need to use the bullhorn frequently during a disaster to provide instructions.
• It was difficult to identify disaster response team members. Although we had colored vests and hats available for the response team members to wear, distribution was overlooked during the test. This is now a high-priority element in our disaster plan.
• Some participants, including people taking shelter in the stairwells, did not hear the “All Clear” announcement. We have ordered portable two-way radios so we can communicate from floor to floor in a disaster.
• Cellular phone batteries did not last long under such heavy use. We have added extra batteries to the disaster response kits.
• One of the primary cellular phones was not activated during the exercise. Anyone who does not use a cellular phone frequently may not know how to operate one. This lesson points up the necessity of detailed training for anyone expected to use specialized equipment during a disaster.
• Some of the technical teams had prepared only one operating system for the LANs. However, the test revealed that our users require more than one operating system. Getting the appropriate operating systems installed delayed recovery. This is the kind of detail that will slow the recovery process. By running a test of this kind, we identified the fact that we need to be able to install multiple operating systems and have updated the Technical Team’s recovery plan.
One of the most significant lessons we learned from our April 2 exercise was about the test itself. We attempted to perform hundreds of tasks in a very short period of time, and we performed far too many key activities simultaneously.
In the future we must spread a test of this magnitude over several hours to simulate more closely the sequential pattern of a real disaster response.
Is testing valuable? Here are some insights we gained that we would not have without performing the test:
• Preparation for a test forced us to improve our plans. Knowing that we would be in the spotlight for even a short period of time made us reread the plans, rethink each individual step, and update the material as necessary.
• We learned that we need to keep the Primary Command Center informed of progress via cellular phone throughout the exercise. The Primary Command Center is the focal point for communications and is in contact with staff at all sites.
• Posting status reports at the Primary Command Center proved to be a very important activity that allows for quick response when in disaster mode. As team members arrive at the Primary Command Center, they can quickly determine the disaster extent, know who has been called and, most importantly, begin their own recovery responsibilities—all without having to get an individualized verbal debriefing of the event.
• In my role as Corporate Business Recovery manager, I spent over 200 hours preparing for this test, and over 1,500 people were involved. People learn what to do in a disaster far more quickly under the pressure of a test than they do reading about it in a manual, and they have an opportunity to test their ability to make decisions under stressful conditions.
• By including external resources in the exercise, we had an opportunity to develop many new professional relationships within the community. The fire fighters, EMS personnel, police, and other public disaster experts have a wealth of knowledge and experience.
During the exercise, we talked with the fire chief, police officers, EMS units, and emergency government officials. Their comments about the actions they took, who was in charge of a disaster site, and how jobs were assigned to fire fighters and rescue personnel were invaluable for fine-tuning our business recovery plans. Local government disaster agencies are willing to help and in our case were eager to participate in the test. Their vast experience in emergency situations contributed to our ability to prepare for disaster.
Paul J. Bergee, CDRP, is manager of corporate-wide business recovery planning at CUNA Mutual Group, a major international insurance corporation located in Madison, WI.
Disaster Recovery World© 1999, and Disaster Recovery Journal©
1999, are copyrighted by Systems Support, Inc. All rights reserved. Reproduction in whole or
part is prohibited without the express written permission from Systems Support, Inc.