Comdisco Disaster Recovery Services (CDRS) has always been proud of its industry leadership position. In that role, we continually strive to raise corporate awareness of the potential for catastrophic events to occur.
We have learned that these situations, usually unforeseen, occur in the most unlikely of circumstances, often crippling an organization’s ability to function effectively. We ourselves are not immune to these unpredictable events, as we found out on October 31, 1991.
CDRS has led the industry in supporting customers who have had disasters which required invocation of recovery plans and operations.
We have learned a great deal from supporting the recoveries of more than 66 actual disasters and more than 13,000 subscriber tests. One of the benefits we provide our customers is the sharing of experiences gained in supporting these recoveries.
In the mid-1980s, CDRS began sponsoring seminars and subscriber briefings to share the experiences learned in supporting the most publicized disaster to date: the Montreal fire at Steinberg Corporation’s headquarters. When the industry experienced its first multiple disaster, as a result of the Chicago floods in 1987, CDRS continued sharing information through a similar set of seminars and briefings.
In 1989 and 1990, the San Francisco earthquake and New York power outage produced unprecedented concurrent declarations for CDRS: eight and twelve, respectively.
Once again, CDRS provided speaker platforms at our user conference and supported industry conferences on the subject of these experiences. We have learned, and will continue to learn, along with our customers during these situations.
Each event, or set of events, continues to mature the industry. The Chicago floods demonstrated that a commercial vendor could support multiple, concurrent declarations.
The Hinsdale fire at the AT&T switching station provided insight into the critical role the communications industry plays and the impact an external disaster can have. San Francisco and New York dealt with user area recoveries in conjunction with, or separate from, data processing outages.
CDRS recoveries in Paris, London and Singapore highlighted the global nature of our industry. A 203-day customer occupancy of our Cypress, California facility demonstrated the viability of outfitting and using a cold site for an extended period.
Throughout these events, CDRS has continued to focus our subscribers on the fact that disasters do in fact occur and that everyone is vulnerable.
During the last several months, the industry has learned that “everyone” includes vendors. This situation also confirms that a vendor strategy of multiple centers and transportability for networking is a critical selection criterion.
October 31, 1991 (Halloween), was an especially frightening night for CDRS’ Carlstadt, New Jersey complex.
The Carlstadt complex is made up of two separate buildings, interconnected for channel connectivity. Building A, located at 430 Gotham Parkway, houses our Continuous Availability Services (CAS) products. Building B, located at 480 Gotham Parkway, contains our traditional computer Recovery Centers (hot sites) for IBM and Tandem users. Let’s take a look at the events that occurred.
At 5:00 a.m. Thursday, October 31, 1991, extremely high tides, as a result of a tropical storm off the Eastern Seaboard, caused flooding in the Carlstadt area, including the parking lot adjacent to CDRS’ Carlstadt, New Jersey complex. These tides were the highest level recorded in the past 30 years.
The flooding was a direct result of a failure in a water control system. Essentially, an earthen dike used to contain and divert tidal waters had been damaged prior to the storm, allowing flooding to occur. Subsequent repairs, tested against comparable tide levels, confirm that the repair action has eliminated the problem.
As a result of the flooding, the dual commercial power feeds into the CDRS Carlstadt complex were interrupted. At no time did water enter or threaten to enter either CDRS building.
However, on the outside of the 430 Gotham Parkway building, externally mounted electrical switch gear was damaged by water.
Within five hours, commercial power was restored to CDRS’ 480 Gotham building, which houses our dual IBM 3090-600 backup offering, obviating the need to utilize diesel generators as originally planned. The 430 building was restored with diesel power by 11:00 p.m. Thursday and converted back to commercial power on Sunday, November 3, 1991.
Immediately upon identification of the problem, CDRS implemented its own recovery plan. Such plans exist for a potential interruption to any CDRS facility as an acknowledgment that no facility, whether a customer’s or a vendor’s, is exempt from a problem.
As part of the plan, CDRS’ CDRS NET architecture initiated instantaneous, automatic rerouting of the backbone network around the affected facility, ensuring integrity of the any-to-any facility linkage.
We also rerouted the CDRS disaster declaration hot line number and other Carlstadt phone lines to our North Bergen, New Jersey facility for uninterrupted phone coverage.
Also, CDRS’ immediate steps included the declaration of a disaster for our CCSC business unit, which provides the CAS services, into one of the four recovery centers in the 480 building. This action, as with the 66 previous customer disaster declarations, necessitated rescheduling five customer tests. This was in full accordance with long-standing CDRS policy.
Separately, our remaining recovery facilities were continuing to support testing and stood ready for customer disaster needs. CCSC officially removed itself from disaster status on Sunday, November 10, 1991, returning the utilized hot site to full testing and recovery availability.
Following an in-depth review, CDRS has committed to undertake significant steps to guard against a potential recurrence. We believe the steps we will undertake to be at a level of redundancy and protection commensurate with the criticality of our services.
First, a documented plan to construct a retaining wall with sump pumps, completely isolating the property, has been initiated. This will be a fail-safe backup to the earthen dike.
Second, we will institute our own program to ensure that proper monitoring and maintenance procedures for the tidal control system are documented and followed by the appropriate owners. Discussions are underway and 100 percent cooperation has been assured. This will allow us to ensure we are in control of the situation at all times.
Third, a secured, waterproof enclosure for the power system is being constructed as an additional fail-safe measure. This enclosure, including dual sump pumps on battery-backed power, will ensure protection of the switch gear. In addition, a water detection system will be added as an early warning and protection capability.
Finally, a solution for adding diesel power generation will be implemented. Engineering studies and preliminary local zoning approvals have taken place, paving the way for an expeditious completion.
These steps, we believe, will make a recurrence a virtual impossibility. We have already begun construction of these fail-safe measures. Due to varying construction schedules, completion will take place over the next few months.
It is important, we believe, to step back and consider the experiences we have gained from supporting ourselves and our customers during these disaster recovery activities.
First, we have learned that clear, concise communication with an organization’s customers and the press is of paramount importance. As is often the case, several trade journals quickly picked up on our disaster recovery story. In spite of our attempts to portray the situation accurately to the journalists, many contradictions and inaccuracies were reported. This even resulted in one of our subscribers being grossly misquoted in terms of their opinions and plans following the event. That subscriber has been extremely helpful in assisting us with other customers who were concerned about the out-of-context comments printed in the trade press.
Second, we were gratified to see the value of our CDRS NET Any-to-Any architecture. We believe this strategy enhances our ability to sustain an outage in the system with minimal subscriber impact. It should be noted that the Carlstadt recovery center utilized to support the disaster represents only 15 percent of our North American supply of capability, leaving more than adequate protection for our remaining subscribers. Our philosophy has been, and always will be, that numerous, geographically dispersed facilities ensure one outage could never cripple our business and/or unduly expose our customer base.
Third, we have reconfirmed the value of having a documented, tested plan for recovery. The ability to relocate and restart our CAS products was directly related to these preparatory steps.
Fourth, all organizations should investigate the option of having cellular phones available to facilitate communications. With no power, and the commensurate loss of our PBX, cellular phones provided a valuable communications lifeline.
Lastly, CDRS, in conjunction with its consulting team, has developed and implemented a set of procedures to evaluate all of the physical components of each recovery facility.
These procedures, included in our Prevent! product, are believed to be the most exhaustive in the industry, incorporating more than a decade of firsthand experience and in-depth collaboration with clients and vendors on areas of uniformity and compliance.
The objective is to provide an extra level of protection for each facility and to produce a risk analysis and a set of physical specifications, down to CAD-based floor plans and schematics for all hardware and power components. This will enable the development of a preventive program that minimizes the likelihood of a disruption and reduces the impact of a disaster.
In summary, CDRS believes the interruption to our Carlstadt Complex was an unfortunate circumstance. We believe we responded with swift and effective actions and minimized CDRS customer exposure. We also believe we, and the industry, learned a valuable lesson: No one is immune to an unplanned outage.
John A. Jackson is executive vice president of Comdisco Disaster Recovery Services, Inc.
This article adapted from Vol. 5 #1.