Jumping into Action: Taking Steps Toward Recovery
Although it may sound odd, some crisis situations are easier to plan for than others. When an impending major storm is the culprit, for instance, its path can be tracked and its landfall predicted. This advance notice gives an organization’s crisis management team a little extra time to prepare and to evaluate processing and networking requirements, time frames, customer personnel notification procedures, and other details for companies in the storm’s path.
In the case of an earthquake or flood, however, there is more urgency and no time to prepare, so advance planning is crucial. The crisis management teams, at both the hotsite and customer locations, must be well rehearsed and prepared to jump into action at a moment’s notice.
The first step in managing the effects of the Chicago flood (and this holds true for virtually all crises that involve data centers) was to pinpoint and contact those customers who were likely to be affected. In this case, SunGard’s Chicago team was able to quickly pull up a geographic listing of all potentially affected subscribers using a proprietary software program.
With subscriber names, numbers, and configuration information at their fingertips, the customer support staff was on the phone within minutes, making calls to the deluged and potentially deluged to assess their respective situations.
While the affected subscribers were being identified and contacted, key members of SunGard’s crisis management team, which included operations, customer support, network engineering, and account management personnel at the Chicago, Philadelphia, and San Diego centers, conversed via a conference call. This group remained in continuous contact throughout the duration of the flood.
During this initial conversation, contingency plans were reviewed and assignments were made. Key among the early tasks were evaluating damage at subscriber locations, assessing the current status of customer configurations, ascertaining their respective requirements, and determining the recovery time frames required.
Taking Action Based on Planning
Also among the early directives was to determine requirements for additional equipment. As part of this equipment acquisition effort, pre-identified key vendors, such as IBM, Computerm, and Codex, were notified and put on standby. Pre-identified vendors of other key supplies and services, such as electricians, office furniture dealers, caterers, and helicopter rental agencies, were also informed. Hotels near the recovery facilities were contacted and blocks of rooms were reserved.
These advance vendor warnings have always proved valuable, and this recovery effort was no exception. For instance, one subscriber was experiencing a major printing bottleneck that was interfering with its operations. The problem was easily solved by bringing aboard Computerm equipment to drive the laser printers—all within hours. Another subscriber needed an additional IBM 3745 front-end processor, which was acquired and installed overnight.
Situations such as these were easy to handle because of contingency planning. Meanwhile, at the customer locations, subscribers were also taking the steps necessary for recovery—checking their business resumption plans, gathering documentation, having backup tapes delivered to the hotsite, reviewing staffing requirements, and locating critical resources such as forms, supplies and facilities.
One financial services customer took the added precaution of chartering airplanes and helicopters to ensure it would receive some two tons of papers that would eventually be processed at the Chicago MegaCenter.
A financial services organization was forced to evacuate its building. Thanks to good planning rather than luck, it had chosen an alternate operating site in advance.
The company quickly and efficiently relocated its people. Via a special 800 number the company had designated for disseminating information, each department was able to obtain relocation data and instructions. Employees simply dialed in and received the needed information.
A number of hotsite customers had work-group recovery built into their planning process. Two subscribers required end-user recovery facilities. Within hours of being notified by these banking customers, the Chicago MegaCenter had acquired 140 workstations complete with controllers, terminals, and handsets to accommodate the subscribers. Tables, chairs, and office equipment were also obtained.
In one instance, a number of modems were also provided to meet a subscriber’s needs.
Within eight hours of receipt of the equipment and supplies, two separate work-group areas within the recovery facility were functioning as an office away from home for the displaced employees.
During a regional disaster like the Chicago flood, many safeguards need to be taken. For instance, because of power stability problems, several of the 10 customers who put SunGard on alert status remained that way for most of the first week, and some continued on alert status through the full two weeks.
While many companies were unprepared for such a disaster, quite a few organizations, with tested recovery plans in place and crisis management teams ready to act, managed to keep their heads above water during the industry’s biggest crisis to date.
Although these subscribers had no need to declare disasters, it was reassuring for them to have their situations monitored and the recovery team ready to go—just in case.
Maintaining Communications Is Key
Establishing and maintaining communications during the course of the recovery effort is crucial, and the Chicago flood was no exception.
At the various recovery centers, operations personnel had the task of confirming various telephone numbers for key subscriber contacts, including office, alternate location, beeper and home numbers. In this case, because many subscribers were evacuated from their buildings and switchboards were shut down, having access to a variety of alternate numbers was critical. SunGard also supplied cellular telephones to subscribers, for use in the event of a central office outage.
The crisis management team, as part of its communications effort, also needed to ensure end-to-end connectivity between the MegaCenters and affected subscribers’ networks.
For the network systems engineers, this meant double-checking communications links, evaluating backup network solutions, reviewing network configuration changes, and the like. The team concerned itself not only with maintaining the appropriate communications links, but also with finding ways to reach key end-user locations. These tasks were performed for all customers who declared disasters, as well as for those who were on alert status, to ensure ongoing network connectivity. Meanwhile, the crisis management team remained in telephone contact with subscribers who had declared disasters and those who were on alert to assess their situations on an hour-by-hour basis.
This continuous contact is always crucial, especially during the early hours of a large regional disaster, when rumors and half-truths abound and the situation is likely to be unstable. Regular phone calls, hourly during the first days of the disaster, were made, and detailed logs of conversations were kept to ensure that no directive or request slipped through the cracks.
The Final Stretch
As the recovery efforts progressed and the flooding was controlled, the situation stabilized across the board. Many subscribers made the transition back to their home data centers, moving from disaster to alert status.
Because power instability was a continuing concern throughout the crisis, however, many subscribers remained on alert for almost two weeks. Finally, on Monday, April 27, at midnight, the last customer left the Chicago MegaCenter for its home base. The industry’s largest, most comprehensive business resumption effort was winding down.
Successfully enduring a regional disaster, whether a flood, hurricane, or earthquake, is by no means easy. Loss of power and communications, personnel deployment issues, scarce resources, and inaccessibility of the affected area are all additional problems that exacerbate the disaster situation.
While you cannot predict the outcome of these situations, you can mitigate their effect with the right mixture of preparation, coordination, and cooperation. When you can’t depend on luck, you have to depend on planning.
Robert Winkler is the Director of Operations at SunGard Recovery Services’ Chicago MegaCenter. Gene La Valle is Manager of Customer Support with SunGard Recovery Services Inc.
This article was adapted from Vol. 5 #3.