We Knew What We Needed To Do....
- Published on Monday, 29 October 2007 02:59
Disaster recovery has become an essential part of our business. The ability to recover our mechanized systems and computing environments is vital to our customers and shareholders.
Rehearsing disaster recovery plans in advance really paid off during a recent disaster impacting one of U S WEST’s largest data center facilities in Denver.
On October 29, 1991, near-blizzard conditions required the Denver Data Center to switch from commercial to generator power.
During the night of the 29th and into the early morning of the 30th, blizzard conditions continued. On October 30th at approximately 2:30 a.m., an eight-inch fire suppression system water main ruptured, causing extensive flooding outside and inside the Denver Data Center building.
The building operations personnel detected water infiltrating their command center and immediately invoked their emergency response procedures, which included notifying the Fire Department and key personnel from Computer Operations and the other departments within the building.
The Fire Department arrived within minutes to find thousands of gallons of water accumulating in a patio area that serves as a roof to portions of the building operations command center. This command center houses all environmental equipment for the Denver Data Center and provides monitoring capabilities for fifty Central Offices.
Efforts began immediately to control the water by shutting off the water main. By now the water was more than three feet deep and posed a serious threat of flooding and structural damage to the roof.
Once the water main had been shut off, a massive cleanup effort began, using the fire pumper trucks already on site and other mobile pumping devices. Calls were placed throughout the Denver area to obtain additional water pumps.
Because the ground had become completely saturated, the water once again began to fill the area. Due to the extensive water accumulation and the limited capacity of the storm drains, the water had now poured into the basement and sub-basement areas of the Data Center. The basement and sub-basement contain multiple Uninterruptible Power Supply (UPS) systems, as well as electrical and mechanical equipment for the entire building. Because the water level was rising faster than it could be pumped out and was nearing the top of the concrete platforms that serve as a base for the UPS and electrical power equipment, the decision was made to power down all computer systems as well as the entire eight-story building.
The water clean-up efforts continued through 3 p.m. At this time the Fire Department detected yet another potential catastrophe: a gas leak outside the data center in a below-ground equipment vault. The vault had filled with three feet of mud and water, which caused the gas meters and regulators to become dislodged from the wall. The Fire Department assessed the situation and recommended that the building be evacuated, but stated that this decision needed to be made by U S WEST.
The safety of employees is a paramount concern for U S WEST, so it was immediately decided to evacuate the entire building. The gas company was quickly dispatched to turn off the gas and make repairs.
In the meantime, water clean-up efforts continued. Because the UPS and power buses had been exposed to water, all components had to be dried and tested.
By midnight, after the Fire Department and contractors had conducted a safety inspection, power was restored to the building. To verify the power grid, a systematic, floor-by-floor approach was used to power up the computer systems. By 6:30 a.m. the following day, October 31, all CPUs and supporting peripherals were operational and system connectivity was provided to the clients. By 9:30 a.m. all online systems and applications were restored.
Over the past two years, we have pursued an aggressive campaign for disaster preparedness, recovery, and business resumption. The commitment to dedicate resources and to develop methods, standards, procedures, and component plans has been beneficial.
In the last 18 months, a comprehensive test strategy and schedule was deployed throughout the five mainframe data centers and seven minicomputer centers supported by Information Technology Services for U S WEST Communications.
Multiple structured walk-throughs as well as parallel tests have been conducted. In August, we developed a scenario which required full participation of approximately 150 people to practice their emergency responsiveness and recovery procedures.
This disaster confirmed our ability to recover in an expeditious and controlled manner, due largely to the preparation achieved through testing.
Sue McDermott, CDRP, is a Disaster Preparedness Manager in the Information Technologies department at U S WEST Communications.
Kirk Lowery, CDRP, is a Computer Operations Manager in the same organization. Kirk was involved with the recovery efforts for the Denver Data Center.
This article was adapted from Vol. 5 #2.