Another Day, Another Crash
Commentary by Paul F. Kirvan
Ever notice how celebrities frequently pass away in threes? If we agree on that statement, then
we can say that AT&T's network has indeed achieved star status. It recently died for the third time in
just under two years. Amazing how such a vast, powerful, diversified network can be brought to its
knees so easily. Perhaps that is what happens when one is so big and powerful: it becomes more
difficult to manage the beast.
AT&T is certainly not alone in network crashes. Bell Atlantic and Pacific Bell had major failures a few months back. Millions of users lost service for several hours. Illinois Bell's legendary Hinsdale central office fire blew the notion of unsinkable central offices sky-high. New York Telephone has had its share of central office fires and power outages in the past ten years.
What all these events now state, in absolute terms, is that today's networks are failure-prone. Maybe not everyone's; some will always manage to stay one step ahead of the grim network reaper. But can we be assured these networks will provide truly uninterrupted service? (By the way, carrier tariffs don't guarantee uninterrupted service; they usually guarantee universal service.)
This author casts a strong vote for tin cans and string. Telecommunications professionals who have been in the industry for at least 10-15 years will remember that the network was built on far less sophisticated equipment. Today's digital networks, with their powerful switches and common channel signaling networks, handle massive amounts of voice and data traffic.
AT&T's most recent outage could have been prevented. It was largely a failure to follow established company procedures. Human flaws. So it's not just technology we have to fear. Rather, it's just as Pogo so astutely observed. He said, "We have met the enemy, and he is us."
But this is not intended to be an AT&T-bashing session, although New York air traffic controllers might wish otherwise. AT&T's network is the biggest and most complex in the world. It has extensive recovery and self-healing properties. To get an idea of just how vast AT&T's network is, visit the carrier's National Network Operations Center (NNOC) located in Bedminster, New Jersey. The company really knows how to run big networks.
But how many more network outages will users, of this carrier and all others, be forced to endure? Advanced technology brings with it both benefits and curses. Unfortunately, we quickly forget all the benefits when a weakness appears.
Here are some very significant concerns. Public switched networks in the U.S. are generally managed by common channel signaling (CCS) networks. Both local and long distance networks use CCS technology. However, most local and long distance CCS networks do not currently interconnect. But that's going to change over the next few years. Service quality is definitely going to improve by interconnecting CCS networks. However, CCS network interconnection also implies that signaling network failures (e.g., AT&T, Bell Atlantic, Pacific Bell) could spread to long distance networks, creating outages of massive proportions. Are we ready as a nation to deal with that?
Both telephone companies and long distance carriers maintain that their networks have safeguards in place to deal with these and other scenarios. But can we be totally certain they will work? If we cannot, telecom managers will need network contingency plans more than ever. We must ask ourselves: is technology bringing progress, or do we now have loaded guns at our heads?
Carrier service assurance programs are evidence of the growing concern over network survivability. Bell operating companies, major interexchange carriers, and a growing number of independent telcos have customer service assurance programs. These are in addition to their existing network protection programs. These are major ongoing activities.
An example is Southwestern Bell's MegaLink III service. This is a non-switched, dedicated point-to-point digital service for simultaneous two-way signal transmission at 1.544 Mbps. The tariff guarantees that if MegaLink III service fails due to telco-provided equipment or facilities, the customer receives a full month's free MegaLink service. This is contingent on the telco being unable to restore service within four hours after the outage was reported.
The Federal Communications Commission (FCC) has called for meetings with Bell Atlantic and Pacific Bell to determine what it can do to prevent these events from recurring. It all confirms a growing fear among many in the U.S. telecom community: that America's public switched networks are increasingly vulnerable.
This latest event will force us all to face the reality of the '90s. We must assume that carriers' networks will fail. Not if, but when. That's a threatening state of affairs. You no longer have a choice but to be prepared. Ask yourself, "Can I survive the next time?"
Paul F. Kirvan is a principal in Paul F. Kirvan & Associates, an international telecommunications consulting firm.
This article is adapted from Vol. 4 #4.
Disaster Recovery World© 1999, and Disaster Recovery Journal©
1999, are copyrighted by Systems Support, Inc. All rights reserved. Reproduction
in whole or part is prohibited without the express written permission from
Systems Support, Inc.