Thursday, 15 November 2007 20:36

Hidden Hazard: Single Point of Failure Can Be Catastrophic

Written by Steve Birge

Exposures leading to AT&T network outage common to many organizations

Despite sophisticated protection systems and a tremendous amount of redundancy, one of the world's largest networks crashed, suddenly and completely. AT&T's InterSpan frame relay network failed late on Monday, April 13. A software problem in two frame relay switches apparently propagated to the other 145 nodes in the AT&T network. Analysts estimate that AT&T's network serves 6,600 corporate customers, including many in the financial services, electronic commerce and credit card authorization businesses. Following a massive effort by AT&T, service was restored in about 24 hours.

Lessons from the outage are relevant not only to AT&T and its customers, but also to the disaster recovery (DR) industry as a whole. The key mistakes leading up to the crash are common to organizations of all sizes, and they are avoidable.