Another Day, Another Crash
- Published on October 26, 2007
Ever notice how famous celebrities frequently pass away in threes? If we accept that notion, then AT&T’s network has indeed achieved star status: it recently died for the third time in just under two years. It is amazing how such a vast, powerful, diversified network can be brought to its knees so easily. Perhaps that is what happens when something grows so big and powerful: the beast becomes harder to manage.
AT&T is certainly not alone in network crashes. Bell Atlantic and Pacific Bell had major failures a few months back. Millions of users lost service for several hours. Illinois Bell’s legendary Hinsdale central office fire blew the notion of unsinkable central offices sky-high. New York Telephone has had its share of central office fires and power outages in the past ten years.
What all these events make clear is that today’s networks are failure-prone. Perhaps not every network; some will always manage to stay one step ahead of the grim network reaper. But can we be assured that these networks will provide truly uninterrupted service? (By the way, carrier tariffs don’t guarantee “uninterrupted” service; they usually promise “universal” service.)
This author casts a strong vote for tin cans and string. Telecommunications professionals who have been in the industry for at least 10 to 15 years will remember when “the network” was built on far less sophisticated equipment. Today’s digital networks, with their powerful switches and common channel signaling networks, handle massive amounts of voice and data traffic, so a single failure can now disrupt far more of it.
AT&T’s most recent outage could have been prevented. It resulted largely from a failure to follow established company procedures: human error. So it’s not just technology we have to fear. As Pogo so astutely observed, “We have met the enemy, and he is us.”
But this is not intended to be an AT&T-bashing session, although New York air traffic controllers might wish otherwise. AT&T’s network is the biggest and most complex in the world, with extensive recovery and self-healing capabilities. To get an idea of just how vast AT&T’s network is, visit the carrier’s National Network Operations Center (NNOC) in Bedminster, New Jersey. The company really knows how to run big networks.
But how many more network outages will users—of this carrier and all others—be forced to endure? Advanced technology brings with it both benefits and curses. Unfortunately, we quickly forget all the benefits when a weakness appears.
Here are some very significant concerns. Public switched networks in the U.S. are generally managed by common channel signaling (CCS) networks. Both local and long distance networks use CCS technology, but most local and long distance CCS networks do not currently interconnect. That is going to change over the next few years. Interconnecting CCS networks will definitely improve service quality. However, interconnection also means that a signaling network failure (such as those at AT&T, Bell Atlantic, and Pacific Bell) could spread across local and long distance networks, creating outages of massive proportions. Are we ready as a nation to deal with that?
Both telephone companies and long distance carriers maintain that their networks have safeguards in place to deal with these and other scenarios. But can we be totally certain those safeguards will work? If we cannot, telecom managers will need network contingency plans more than ever. We must ask ourselves: is technology bringing progress, or do we now have loaded guns at our heads?
Carrier “service assurance” programs are evidence of the growing concern over network survivability. Bell operating companies, major interexchange carriers, and a growing number of independent telcos have customer service assurance programs. These programs supplement existing network protection programs and represent major ongoing efforts.
An example is Southwestern Bell’s MegaLink III service, a non-switched, dedicated point-to-point digital service for simultaneous two-way signal transmission at 1.544 Mbps. The tariff guarantees that if MegaLink III service fails because of telco-provided equipment or facilities, and the telco cannot restore service within four hours after the outage is reported, the customer receives a full month of MegaLink service free.
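The MegaLink III credit depends on two conditions holding at once: the failure must be the telco’s, and restoration must miss the four-hour window. A minimal Python sketch of that rule follows. The `Outage` record and `credit_months` function are hypothetical names, not part of the tariff, and the sketch assumes that restoration at exactly four hours still counts as “within four hours” (no credit).

```python
from dataclasses import dataclass

# Assumed reading of the tariff: restoring at exactly 4.0 hours is on time.
RESTORE_WINDOW_HOURS = 4.0


@dataclass
class Outage:
    caused_by_telco: bool    # failure of telco-provided equipment or facilities
    hours_to_restore: float  # measured from when the outage was reported


def credit_months(outage: Outage) -> int:
    """Return months of free service owed (0 or 1) under the assumed rule."""
    # Both conditions must hold: telco at fault AND restore window missed.
    if outage.caused_by_telco and outage.hours_to_restore > RESTORE_WINDOW_HOURS:
        return 1
    return 0


if __name__ == "__main__":
    print(credit_months(Outage(caused_by_telco=True, hours_to_restore=5.5)))
    print(credit_months(Outage(caused_by_telco=False, hours_to_restore=12.0)))
```

A customer-caused failure earns no credit no matter how long it lasts, and a telco failure fixed within the window earns none either; only the combination triggers the free month.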
The Federal Communications Commission (FCC) has called for meetings with Bell Atlantic and Pacific Bell to determine what can be done to prevent such events from recurring. These events confirm a growing fear among many in the U.S. telecom community: that America’s public switched networks are increasingly vulnerable.
This latest event will force us all to face the reality of the ’90s. We must assume that carriers’ networks will fail. Not “if,” but “when.” That’s a threatening state of affairs. You no longer have any choice but to be prepared. Ask yourself, “Can I survive the next time?”
Paul F. Kirvan is a principal in Paul F. Kirvan & Associates, an international telecommunications consulting firm.
This article is adapted from Vol. 4, #4.