Worst Single-Site Disaster Tamed by Crisis Management Approach
- Published on October 25, 2007
Shortly after noon on Friday, February 26, an explosion rocked New York City and was heard around the world. The bombing of the World Trade Center took the lives of six people working in the huge complex and reportedly caused more than 1,500 injuries. It also took the world’s business and financial communities by surprise and, unfortunately, many companies with data centers or remote printing operations in the complex were caught without a disaster recovery plan in place.
As hundreds of firefighters and rescue workers searched the many floors of the affected buildings and worked to free commuters trapped in the PATH tunnels below, a rescue effort of a different nature was already underway for customers of SunGard Recovery Services.
Almost immediately, SunGard’s Crisis Management Team swung into action. Before the first customer called, the team had already identified every potentially affected subscriber and their exact locations in the World Trade Center complex. Using SunGard’s proprietary Resource Management System (RMS), the team identified the required configurations and developed an action plan for recovery efforts.
The Crisis Management Approach enabled SunGard to respond swiftly and efficiently as the disaster unfolded, analyzing recovery requirements and anticipating subscribers’ needs in the event of a worst-case scenario. The Crisis Management Team comprises SunGard’s senior management and recovery center operations staff. Once immediate needs were identified and addressed, the operations staff contacted key suppliers to ensure that any additional equipment that might be required would be available for immediate shipment to either a MegaCenter or remote recovery location. Operations and security teams were mobilized, vacations were canceled, and 12-hour shifts were scheduled.
Subscribers scheduled for weekend testing were put on notice of possible declarations while the Crisis Management Team quickly initiated strategies to minimize disruptions of the testing schedule. All emergency scheduling and logistical plans were in place by 8:00 p.m. Friday, 15 hours before the first disaster declaration by a SunGard subscriber.
By 11:00 a.m. Saturday, the first declaration was called in to SunGard by a DEC subscriber, Yasuda Bank & Trust (USA). Four other subscribers followed, including a commodities trader, a commodities processing service, a major reinsurance provider and a transportation company. The commodities trader, a subscriber to IBM 4381 service, was the last to declare and did not do so until Tuesday, March 2.
In fact, the configurations involved were spread across multiple platforms - IBM, DEC, and Tandem - due to SunGard’s commitment never to assign more than one data center in any one building to the same hotsite. Throughout, SunGard maintained ample recovery capacity for additional disruptions.
On Saturday at 1:00 p.m., the affected commodities processing service (a Tandem Cyclone subscriber) declared; within 30 minutes, the Operations Support Team at the recovery site had the subscriber's system operating, hours before the subscriber arrived on site. By 2:00 p.m., the team had a large part of the subscriber's network configuration ready.
The operations team attributes this smooth process to having worked closely with the subscriber during their frequent recovery plan testing, a key component in many successful recoveries.
The biggest service-level subscriber affected by the bombing, a transportation company using an IBM 3090-400J configuration, declared Saturday at 3:30 p.m. Half an hour later, the reinsurance provider (IBM 3090-400E) also declared.
As subscribers arrived on site, senior operations managers worked with each one of them to ensure coordination of all necessary logistics - special equipment needs, hotel accommodations, meals and other details.
The Crisis Management Team remained activated, and early Sunday it completed an updated configuration plan for subscriber testing and support for the ongoing recovery efforts.
By 11:00 a.m. Monday, the commodities processing service had the system at their home site up and running while SunGard kept their Tandem Cyclone configuration at the recovery site running parallel in the event of the home system’s failure.
An important point: within 24 hours of arriving on site, each subscriber had its systems restored and operational and was preparing to implement the production processing phase. The first phase of the crisis had passed.
At this writing, all five subscribers have made rapid, successful recoveries. Three subscribers have migrated to their homesites or other coldsite locations until they can return to their corporate data centers; two are continuing to operate from the recovery facility.
Mr. Onorato joined SunGard in January 1989 as senior director, operations, bringing with him 15 years of experience in data processing and telecommunications. In his current position as Vice President, Operations, Onorato oversees the day-to-day operations of SunGard’s Philadelphia MegaCenter.
This article was adapted from Vol. 6 #2.