
DISASTER 101: A Hands-On Recovery Lesson

Written by Ernie Moore

Thanks to a commitment to safety and preparedness, and a lot of luck, American Republic Insurance came through the recent Midwest flood ordeal much wiser and better prepared to face a future catastrophe.

Disaster survival is the hard way to test your contingency and recovery programs. It is also the only true test of the effectiveness of your plan. If you’re like me, a disaster plan is in the same category as the health insurance our company sells. You cannot risk being without it... but you hope to never use it. I don’t wish the “practical experience” of a disaster on anyone. But I do hope that what we learned firsthand will help others devise more pragmatic and functional recovery policies.

Luckily for American Republic, the summer floods turned out to be a positive learning experience. Flooding was not a major threat to our employees’ safety or to our office building. What the flood’s repercussions did jeopardize were our computer operations, and thus our ability to do business.

Safety

American Republic was safety conscious long before contingency or disaster plans became popular. Our corporate office building, a fire-resistant concrete superstructure, is a good example. Built in the mid-1960s, our eight-story, company-owned facility was constructed of, and furnished with, materials carrying the highest fire-retardant ratings available.

A bonus was that our building design and “clutter free” internal office policy put us in compliance with many of the temporary emergency restrictions imposed by the city on structures that were potential fire hazards. Although all businesses were running on reduced staffs, we were confident that our people were safe from the danger of fire as well as the flood at our facility.

Trust

Our employees trust us to ensure their safety, disaster or not. But this trust goes both ways. Company management trusted employees to get the company through the disaster, and our people did not disappoint us. This is the crux of our contingency and recovery program: trust the people to know their jobs and to be the best resource for restoring order and function to their individual areas.

Project planning, and involving employees in that planning, was an important factor in our successful and rapid resumption of business operations. We were in the middle of a major revision of our recovery plan when the flood hit. But by relying on the know-how and abilities of our employees, we were able to handle it. We had already begun implementing parts of the plan in stages rather than waiting until the entire program was finalized.

Early Preparedness

I like to define this phase as being able to put into place those recovery procedures which take relatively little time and meet little resistance. Before I was given the mission to develop a comprehensive, workable, and economical recovery plan, I was as naive as anyone about the risks, potential damage, and impact of a disaster.

My first task was to educate myself about the risks. What does recovery entail? What has to be done immediately? What are the typical aftereffects of a disaster? I must admit that once I became “disaster literate,” I had an entirely different mindset. When I understood all the possible consequences, I literally had nightmares about a disaster happening before I could finish the rest of the new plan. It would have been a tragedy if I didn’t get a viable plan in place, and quickly.

Well, my worst fears came true. Disaster! But as our Chairman Watson Powell points out, it was a positive “learning experience” for us. American Republic has always had a commitment to safety procedures and formal documentation of operations and functions. This, combined with the first of the contingency and recovery policies in place, saved the day... and my conscience.

The early preparedness procedures, which we put into effect during the disaster, focused primarily on the critical issues of safety, communications and organization.
1. Call-Out Lists - Lists of every employee, by department, with contact information and assignments.
2. Organization & Command - Names of employees (and alternates) designated for command positions during a disaster, with explicit definitions of roles and the chains of command.
3. Disaster Management Team - Appointment of the Disaster Recovery Coordinator (DRC), Disaster Assessment Team members, and alternates for all positions.
4. Command Center - Specifications for when, where, and how to establish a Command Center (onsite or offsite), including allocation of resources and personnel to support the DRC and other critical staff, facility, communications, and supply issues.
5. Safety & Security - Outside confirmation of safety and security measures: a top-to-bottom facility inspection by the Des Moines Fire Department, and a Police Department inspection and evaluation of security procedures and arrangements.
6. Business Impact Analysis (BIA) - Core function management’s assessment of the estimated costs of a Data Center shutdown at one-week intervals, for up to four weeks, plus an appraisal of our ability to restore service to current and minimum levels and of emergency equipment replacement services (a hypothetical illustration follows this list).
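
To make item 6 a bit more concrete, here is a minimal, hypothetical Python sketch of how a shutdown-cost estimate at one-week intervals might be tabulated. None of it comes from the article; the department names and dollar figures are invented purely for illustration.

    # Hypothetical sketch only: a simple BIA table estimating the cost of a
    # Data Center shutdown at one-week intervals, for up to four weeks.
    from dataclasses import dataclass

    @dataclass
    class ShutdownEstimate:
        department: str      # core function department (invented names below)
        weekly_cost: float   # estimated cost per week of lost Data Center service

    def cumulative_impact(estimates, weeks=4):
        """Return the total estimated cost for shutdowns lasting 1..weeks weeks."""
        weekly_total = sum(e.weekly_cost for e in estimates)
        return {week: weekly_total * week for week in range(1, weeks + 1)}

    core_functions = [
        ShutdownEstimate("Claims", 120_000.0),          # invented figure
        ShutdownEstimate("Policy Services", 90_000.0),  # invented figure
        ShutdownEstimate("Agent Support", 60_000.0),    # invented figure
    ]

    for week, cost in cumulative_impact(core_functions).items():
        print(f"Week {week}: estimated cumulative impact ${cost:,.0f}")

In practice, of course, the estimates came from core function management, not a script; the sketch only shows the one-week-interval structure the assessment used.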

The above phases of early preparedness and BIA were completed before the disaster. Fortunately, this put us in an excellent position to respond to the flood and utility disruptions and minimize the impact on operations. Although our complete Disaster Recovery Plan was not in place, the phases most critical to this particular crisis were in effect and implemented.

Plan Evaluation and Update

One note about what could have been a weak link in the recovery: our Data Center assessment had recommended that we establish a hot site, and in fact a formal hot-site presentation was made the week prior to the disaster, but we did not have a chance to subscribe before the flood hit. After our recovery, we did contract for a hot site as backup in case our data center is ever damaged. As it turned out, we did not need the hot site in this instance.

Our assessment had pointed out that we did not have firm assurance from vendors that we could get our own data center up and running quickly. When we started, all we could get were vendor assurances that they would do their best to get us replacement equipment.

About that time, I was contacted about a service that guaranteed to replace and deliver our configuration within five working days of notice. We signed up, and that critical piece of “insurance” was a lifesaver for this insurance company. We had a safe, secure building, and although our equipment wasn’t damaged, we could not run our water-cooled unit. The vendor came through with the configuration, even though this situation was not covered in our contract specifications. You can bet this “insurance” service has become an essential element of our plan.

With the initial phases of our plan active, we successfully recovered from the Des Moines flood without disrupting our business and service to our policyholders and our agent network. We are proud that we maintained our high level of service without interruption.

Once the danger and the waters had receded, I immediately got to work on modifying and refocusing the overall contingency and recovery plan based on our experience. We found that our tri-level team involvement concept worked well and will retain this structure:

  • Level I - Executive and senior management - Coordinate initial assessment, response alternatives and decisions for action.
  • Level II - Middle management, team leaders and alternates, and the full recovery team - Assess effects on operations, physical and services impact, and immediate actions to restore operations.
  • Level III - All employees - Continuous communication and assignments to assist Levels I and II.

We met our top priority objective of recovering the core function departments, those which provide primary services to our policyholders and agents. The core function departments are responsible for implementing the sections of the plan that apply to their areas and for overseeing the general response procedures outlined in the plan. [See Core Function Department Recovery Checklist.] For support, the ancillary departments provide manpower and resources to assist core function department recovery efforts.

Learning From Our Experience

As a result of our experience with the flood, we decided to keep procedures at a general level. Early in the planning stage, I had envisioned writing detailed requisites and recovery instructions for assigned teams and alternates to simply read and follow. After our “live” test, I am confident of our employees’ capabilities under pressure.

The potential number of variables in a disaster is unlimited; it is impossible to detail the appropriate responses and options for every circumstance or possibility. Therefore, I have built flexibility into our plan. It is more of a broad framework outlining the priority recovery objectives, requirements checklists, feasible alternatives and timetables. A major disaster does not present a pat, predictable scenario adaptable to some standard solution. Every company, every disaster, and every recovery will be unique.

Therefore, I have concentrated on developing an infrastructure to support the recovery teams and to provide them with the resources and communications needed to restore operations to minimum, if not normal, levels in the shortest time frame. By delegating to and relying on departmental-level staffs, I am free to concentrate on the big picture and to coordinate with government and emergency groups. The people in each department assume general responsibility for recovery of their respective operations.

This disaster taught me to “think flexibility.” Our plan serves more as a guide, a framework from which to determine first priorities (especially as to safety) and then move on to securing the facilities and equipment.

Recovery and keeping open for business hinge on the knowledge and efforts of our people. We found that what a disaster recovery plan really means is an overall plan to bring together the tools people need to do their jobs. Although we have a centralized general plan, our success during the flood was a result of decentralized responsibility and action. We trusted each group to know best what they had to do and to do it.

We learned to be cautious at each step of the recovery. Don’t create a second disaster through overzealousness or by overlooking the small but important stuff. Consider fire safety and the health and welfare of the people at every step during a recovery. Another important lesson we learned was to have dependable suppliers to call on, vendors that guarantee in writing that they’ll be there when you need them, even at 3 a.m. on a Sunday.

It is essential, and after our experience considered mandatory, to have guaranteed contracted services. “Best effort” promises from a vendor are just not good enough when your business is on the line. I, for one, would not care to disappoint the 21,000 agents and 450 employees depending on us for their livelihood -- people who sell and believe in insurance -- by explaining that we did not take the precaution to “insure” the guarantee of the equipment that runs our business.

You have to know your building services people and include them in the planning process. Ours actively participated in the decision-making and creative problem solving processes during our successful recovery.

Perhaps the biggest lesson we learned was not to try to anticipate every option and every problem, because the variables are endless. Because our plan was not complete, we didn’t have the chance to do that. Now we know we don’t want to, because we proved that by being informed we could react quickly and take immediate action.

The moral of this lesson is that all the planning in the world would have been useless had we not been prepared to act... and to trust in our employees.

SIDEBAR

American Republic Insurance Company of Des Moines, Iowa was born to survive. For openers, the company was but a few months old when the stock market crashed in 1929. Since that first catastrophe almost 65 years ago, American Republic has survived other human-made and natural disasters--most recently The Great Flood of 1993.

Ironically, American Republic was in the midst of revamping and expanding its contingency and disaster recovery plans when the Des Moines and Raccoon Rivers overflowed on Sunday, July 11, a little after midnight. The American Republic headquarters, located a few blocks inland from the juncture of the rivers, escaped water damage. However, access streets were flooded, the central city power was knocked out and the municipal water supply was contaminated.

Because the flood broke on a Sunday morning, American Republic was not confronted with employee evacuation and safety, its first priority. The building itself sits well above water level and was not in imminent danger. The impending shut-off of the water supply, in a city literally drowning in water, posed the biggest problem for American Republic’s operations. At risk was the Data Center, which supports 450 employees and 21,000 national agents and houses the corporate data processing system.

The initial power outage and subsequent interruptions were not an immediate threat. The Data Center UPS (uninterruptible power supply) equipment kicked in immediately to provide power to the Data Center equipment. These backup units protected the IBM 3090-200S mainframe computer, three IBM AS/400 midrange computers, imaging systems, peripherals, and network resources from potential damage to the hardware circuitry and loss of data stored on the system.

The IBM mainframe central processing unit (CPU) was the machine in greatest peril. This CPU model uses a water-cooled system (similar to a car radiator), and the Des Moines Water Works provides the water source. It was imperative to shut down the machine before contaminated water reached the cooling system or the city shut off the water mains, to avoid thermal damage to the equipment from heat buildup. Luckily, telephone service had not yet been affected, and the weekend security officer on duty was walked through the mainframe shutdown procedure over the phone.

Once the power and water situations were under control, the American Republic recovery team began to alert vendors and suppliers to the potential disaster situation. American Republic notified their vendor that they might require an air-cooled IBM 9121 as a temporary substitute for their water-cooled CPU until the water contamination problem was resolved. Within the hour, options for CPU replacements were reviewed to verify that a replacement could be delivered to Des Moines within two to four days.

American Republic management and recovery personnel met that afternoon to review the assessment team reports and determine further action in response to the disaster. One of the first orders of business was the decision to install an air-cooled IBM 9121-320 mainframe. By the time American Republic confirmed the order for the replacement mainframe that evening, a 9121 air-cooled CPU was already in transit from Chicago to Des Moines by truck.

The delivery truck, despite torrential rains and road closures, rolled into Des Moines the next day, Monday afternoon, barely 24 hours after American Republic ordered the equipment. IBM was standing by to install the unit and run diagnostics when the computer arrived. American Republic reopened their Data Center Monday night, running a diesel generator as backup protection against power outages.

American Republic Insurance was back in business without interrupting essential services.


Ernie Moore is Associate Vice President-Business Planning at American Republic Insurance Company in Des Moines, IA.
This article adapted from V7#1.
