Disaster Recovery for Businesses
- Published on October 29, 2007
When AT&T and the Commonwealth of Pennsylvania kicked off "Operation Restore Thunder" in late April, our challenge was to demonstrate that we could replace our Harrisburg central office if it were destroyed by a massive explosion and fire.
The weeklong exercise opened with a simulated emergency: the Commonwealth Emergency Management Agency's Emergency Operations Center reported severe thunderstorms inundating Central Pennsylvania and causing flash flooding just north of downtown Harrisburg. In the scenario, the flooding then caused a truck loaded with gasoline to crash into AT&T's Harrisburg central office and explode into flames, interrupting long-distance calling into and out of the state capital.
As part of the exercise, AT&T immediately alerted its National Disaster Recovery team, and tractor-trailers hauling high-tech telecommunications equipment were dispatched to the Pennsylvania capital. While AT&T crews simulated restoring long-distance service to south-central Pennsylvania within 72 hours, Commonwealth agencies conducted their own "tabletop" exercises, which let state information-services experts predict how critical state services might be affected and plan alternative ways to keep delivering public services.
Although this disaster was only simulated, it could easily happen. In the wake of earthquakes, floods, explosions and hurricanes, the need for reliable communications systems is at its most critical.
Operation Restore Thunder allowed AT&T to work closely over an extended period with Pennsylvania's state government agencies to develop communications contingency plans.
"Today's information networks are becoming the backbone of an efficient and responsive state government," said Governor Tom Ridge. "They allow us to deliver public services to Pennsylvania more quickly and effectively while keeping our costs down. Disaster recovery drills like 'Operation Restore Thunder' help us safeguard our investment in these technologies so they'll be up and running when we need them most - during times of emergency."
According to a University of Wisconsin study, of businesses that experience a local disaster, more than 43 percent never reopen and almost 29 percent close within two years. The frequency of nationally reported disasters, together with modern businesses' dependence on information and communications systems, has led managers to assess their ability to respond to a major disaster. Today, every systems manager should know the potential impact of a disaster on his or her business, how long it would take to recover essential applications if they failed, and what must be done to restore operations fully.
The longer a disaster disrupts communications, the greater the impact. In the first hour alone, more than 80 percent of surveyed financial institutions are estimated to lose nearly $1,000 per hour, and an additional 10 percent reported losses of more than $100,000 per hour. A University of Texas study found that 85 percent of businesses are totally or heavily dependent on information systems to stay in business, and that losing those systems would cost companies up to 40 percent of their daily revenues. AT&T estimates that the loss of information systems would be felt quickly: nearly 60 percent of financial companies, 50 percent of service firms and more than 40 percent of retail organizations would be seriously affected in less than eight hours.
Meanwhile, senior MIS executives expect their companies to become even more dependent on data communications networks. Many companies now use telecommunications networks to automate critical business operations, deliver information faster, stay competitive and improve the quality of service. These applications bring more on-line systems into operation, raising both the risk to the organization and the impact if a system fails.
Companies face not only the direct costs of a disaster but also many indirect costs. For example, if a car rental company loses its toll-free service and a customer cannot get through, that customer will probably call a competitor. If the customer is satisfied with the competitor's service, the business may be lost permanently.
Another example is an office-products firm that lets customers access its order-processing system directly to check inventory, shipping schedules and pricing. Customers who cannot reach their vendor can lose confidence in the firm. Other intangible costs include:
- Cash flow interruptions
- Loss of customers
- Loss of competitive edge
- Erosion of business image
- Loss of market share
- Legal or regulatory violations
- Loss of investor confidence
In the most disruptive scenarios, business-restoration plans should focus first on recovering from the overall disaster and from application failures, rather than on network failures. Power disruptions can also pose a significant threat to most businesses. The impact of a disaster depends heavily on how well prepared a business is to handle it, but preparation is not the only factor: disasters of the same magnitude affect businesses differently even when they have the same recovery plans in place. What counts as a "tolerable" outage depends on industry, geographic dispersion, recovery plans and other factors.
Consider the following steps as part of a comprehensive disaster-preparedness plan. Set aside time each year to review and update the plan, and learn how your telecommunications provider(s) handle disasters.
- Plan for all possible contingencies from a temporary or short-term disruption to a total communications failure.
- Consider the everyday functions performed by your facility and the communications, both voice and data, used to support them.
- Prioritize all facility communications. Determine which should be restored first in an emergency.
- Establish procedures for restoring communications systems.
- Talk to your communications vendors about their emergency-response capabilities, and agree with them on procedures for restoring service at your facility.
- Determine the backup communications needs of each business function. Options include telephones (if service is not disrupted by the disaster), messengers, portable microwave, amateur radio, point-to-point private lines, satellite and high-frequency radio.
- Meet with your telecommunications provider(s) to discuss setting up an emergency backup communications system for your facility.
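The prioritization and backup-planning steps can be kept honest with a simple inventory: record, for each business function, which services it uses, how long it can tolerate an outage, and which backup channels it could fall back on, then sort by that tolerance. The sketch below is a minimal illustration under assumed data; the `BusinessFunction` type, the helper names and the outage-tolerance hours are hypothetical placeholders, not figures from this article.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessFunction:
    name: str
    services: str                  # "voice", "data", or "voice+data"
    tolerable_outage_hours: float  # hypothetical figure each business sets for itself
    backups: list = field(default_factory=list)  # e.g. "satellite", "messenger"


def restoration_order(functions):
    """Least outage-tolerant functions first; ties broken by name for a stable plan."""
    return sorted(functions, key=lambda f: (f.tolerable_outage_hours, f.name))


def functions_without_backup(functions):
    """Flag functions that have no backup communications option recorded yet."""
    return [f.name for f in functions if not f.backups]


# Hypothetical example plan, loosely modeled on the car-rental and
# office-products examples earlier in the article.
plan = [
    BusinessFunction("Internal e-mail", "data", 24),
    BusinessFunction("Customer order-processing access", "data", 4,
                     ["point-to-point private line"]),
    BusinessFunction("Toll-free reservations line", "voice", 1,
                     ["satellite", "high-frequency radio"]),
]

if __name__ == "__main__":
    for rank, f in enumerate(restoration_order(plan), start=1):
        print(rank, f.name, f"({f.services}, tolerates {f.tolerable_outage_hours} h)")
    print("No backup yet:", functions_without_backup(plan))
```

Sorting on tolerable outage rather than on revenue alone reflects the point that a "tolerable" outage differs by business; a real plan would likely weigh several factors, and this single key is a deliberate simplification.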
We have addressed the need for disaster recovery with a comprehensive program. Initiated in 1991, it is unique in the telecommunications industry.
The nationwide disaster recovery program has three goals:
1. To route non-involved calls around the affected area (calls that begin and end outside the disaster zone but would normally pass through it).
2. To provide the affected area with access to the outside world.
3. To restore normal long-distance service as soon as possible.
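The first goal, routing non-involved calls around the disaster zone, is at heart a path-finding problem: take the failed office out of the network graph and look for another route between the two endpoints. The sketch below illustrates the idea with a breadth-first search (fewest hops) over a small, entirely hypothetical network; the `NETWORK` links, city names and `route` function are assumptions for illustration, and a real long-distance network uses its own routing systems rather than anything this simple.

```python
from collections import deque

# Hypothetical long-distance network: each office lists the offices it links to.
NETWORK = {
    "Pittsburgh": ["Harrisburg", "Cleveland"],
    "Harrisburg": ["Pittsburgh", "Philadelphia", "Baltimore"],
    "Philadelphia": ["Harrisburg", "New York", "Baltimore"],
    "Cleveland": ["Pittsburgh", "Buffalo"],
    "Buffalo": ["Cleveland", "New York"],
    "New York": ["Philadelphia", "Buffalo"],
    "Baltimore": ["Harrisburg", "Philadelphia"],
}


def route(network, src, dst, failed=frozenset()):
    """Fewest-hop path from src to dst that avoids every office in `failed`.

    Returns the path as a list of office names, or None if no path exists.
    """
    if src in failed or dst in failed:
        return None  # the call itself starts or ends inside the disaster zone
    parent = {src: None}  # records how each office was first reached
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in network.get(node, []):
            if nxt not in parent and nxt not in failed:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

For example, `route(NETWORK, "Pittsburgh", "New York", failed={"Harrisburg"})` returns `['Pittsburgh', 'Cleveland', 'Buffalo', 'New York']`, a path that never touches the destroyed office. A call that begins in Harrisburg itself returns `None`: it is not a non-involved call, and restoring it falls under the second goal instead.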
To accomplish these goals, AT&T maintains specialized communications equipment at strategic locations across the country. Following an official alert of a problem at a central office, specially trained disaster recovery teams and their equipment begin arriving at the scene within 24 hours. Over the next 48 hours, the National Disaster Recovery team assembles the restoration equipment and splices it into the existing access vendor's cables, restoring the functions of the affected central office. In addition to replacing telecommunications switching offices, the team is equipped to restore service by:
- Erecting a temporary microwave tower.
- Installing a temporary satellite earth station.
- Establishing temporary calling centers to give customers direct access to the network.
Since its establishment, the National Disaster Recovery team has helped restore telecommunications service after Hurricane Andrew (1992), the Midwest floods (1993), the Northridge, Calif., earthquake (1994), a major mudslide and tornado in Kentucky (1994), a tornado in Texas (1995), Hurricane Marilyn (1995) and the recent floods in North Dakota. Initial reports indicate that our Pennsylvania drill was a complete success. We will continue to conduct disaster recovery drills in different locations at least four times a year to ensure that our network services remain reliable when they are needed most: when disaster strikes.
Frank Ianna is executive vice president of Network and Computing Services, an organization of 35,000 employees responsible for AT&T's worldwide network operations and information technology systems.