
Another Day, Another Crash
Commentary by Paul F. Kirvan
Ever notice how famous celebrities frequently pass away in threes? If we agree on that statement, then
we can say that AT&T's network has indeed achieved star status. It recently died for the third time in
just under two years. Amazing how such a vast, powerful, diversified network can be brought to its
knees so easily. Perhaps that is what happens when one is so big and powerful: it becomes more
difficult to manage the beast.
AT&T is certainly not alone in network crashes. Bell Atlantic and Pacific Bell had major failures a few
months back. Millions of users lost service for several hours. Illinois Bell's legendary Hinsdale central
office fire blew the notion of unsinkable central offices sky-high. New York Telephone has had its share
of central office fires and power outages in the past ten years.
What all these events now state, in absolute terms, is that today's networks are failure-prone. Maybe
not everyone's; some will always manage to stay one step ahead of the grim network reaper. But can we
be assured these networks will provide truly uninterrupted service? (By the way, carrier tariffs don't
guarantee uninterrupted service, but usually universal service.)
This author casts a strong vote for tin cans and string. Telecommunications professionals who have
been in the industry for at least 10-15 years will remember that the network was built on far less
sophisticated equipment. Today's digital networks, with their powerful switches and common channel
signaling networks, handle massive amounts of voice and data traffic.
AT&T's most recent outage could have been prevented. It was largely a failure to follow established
company procedures. Human flaws. So it's not just technology we have to fear. Rather, it's just as
Pogo so astutely observed. He said, "We have met the enemy, and he is us."
But this is not intended to be an AT&T-bashing session, although New York air traffic controllers might
wish otherwise. AT&T's network is the biggest and most complex in the world. It has extensive
recovery and self-healing properties. To get an idea just how vast AT&T's network is, visit the carrier's
National Network Operations Center (NNOC) located in Bedminster, New Jersey. The company really
knows how to run big networks.
But how many more network outages will users, of this carrier and all others, be forced to endure?
Advanced technology brings with it both benefits and curses. Unfortunately, we quickly forget all the
benefits when a weakness appears.
Here are some very significant concerns. Public switched networks in the U.S. are generally managed by
common channel signaling (CCS) networks. Both local and long distance networks use CCS
technology. However, most local and long distance CCS networks do not currently interconnect. But
that's going to change over the next few years. Service quality is definitely going to improve by
interconnecting CCS networks. However, CCS network interconnection also implies that signaling
network failures (e.g. AT&T, Bell Atlantic, Pacific Bell) could spread to long distance networks,
creating outages of massive proportions. Are we ready as a nation to deal with that?
Both telephone companies and long distance carriers maintain that their networks have safeguards in
place to deal with these and other scenarios. But can we be totally certain they will work? If we
cannot, telecom managers will need network contingency plans more than ever. We must ask
ourselves: is technology bringing progress, or do we now have loaded guns at our heads?
Carrier service assurance programs are evidence of the growing concern over network survivability.
Bell operating companies, major interexchange carriers, and a growing number of independent telcos
have customer service assurance programs. These are in addition to their existing network protection
programs. These are major ongoing activities.
An example is Southwestern Bell's MegaLink III service. This is a non-switched, dedicated
point-to-point digital service for simultaneous two-way signal transmission at 1.544 Mbps. The tariff
guarantees that if MegaLink III service fails due to telco-provided equipment or facilities, the customer
receives a full month's free MegaLink service. This is contingent on the telco being unable to restore
service within four hours after the outage was reported.
The Federal Communications Commission (FCC) has called for meetings with Bell Atlantic and Pacific
Bell to determine what it can do to prevent these events from recurring. It all confirms a growing fear
among many in the U.S. telecom community: that America's public switched networks are increasingly
vulnerable.
This latest event will force us all to face the reality of the '90s. We must assume that carriers' networks
will fail. Not if, but when. That's a threatening state of affairs. You no longer have a choice but to
be prepared. Ask yourself, "Can I survive the next time?"
Paul F. Kirvan is a principal in Paul F. Kirvan & Associates, an international telecommunications consulting firm.
This article is adapted from Vol. 4 #4.
Disaster Recovery World © 1999, and Disaster Recovery Journal ©
1999, are copyrighted by Systems Support, Inc. All rights reserved. Reproduction
in whole or part is prohibited without the express written permission from
Systems Support, Inc.