You probably have heard the story told by the old poem, but let me retell it as accurately as I can recall it. It goes like this:
For the want of a nail, a shoe was lost.
For the want of a shoe, the horse was lost.
For the want of a horse, the rider was lost.
For the want of a rider, the battle was lost.
For the want of a battle, the kingdom was lost.
The story describes how a seemingly inconsequential detail can lead to a disaster.
Consolidating responsibilities to save money, laying off technical and management staff during cutbacks, not taking enough time to train people in their new positions and omitting routine maintenance and testing can lead, and have led, to disastrous outcomes.
We don’t know what really happened to cause the failure of the horseshoe, but we have seen the result. Who was to blame? The generals? The blacksmith? Did the rider knowingly take a poorly equipped steed into battle? We just don’t know.
Are we doomed to repeat the mistakes of the past? If history doesn’t repeat itself, circumstances with a propensity towards disaster certainly do!
On Tuesday, September 17, 1991, another “shoe was lost for want of a nail,” and a set of circumstances was created with a risk exposure of frightening proportions. It’s another case of how missing “nails” could have contributed to an incredible loss of service.
Actually there were at least three (3) nails improperly installed in this particular “shoe.” The first was the apparent absence of proper maintenance and testing of the backup power system. Then there was a bulb, yes, a bulb, in the visual alarm system which had not been replaced when it burned out. Then there was the audio alarm which reportedly “malfunctioned,” whatever that means. These “nails” contributed to the system failure.
In this case, circular blame will be generously spread, and the accepted truth of what happened will be whatever account is repeated the most times.
What we do know is there is a trend across the nation to cut costs, increase productivity, decrease personnel and in general do-more-with-less in telecommunications. Budgets are being cut, technical and managerial positions are being eliminated, reporting relationships are being changed and responsibilities are being reassigned.
All this is being done while we are increasing our reliance on telecommunications for virtually all our business, government and personal functions.
Our increasing dependence on telecommunications and on telecommunications-based services will result in additional and devastating disasters. Undoubtedly some of these events will be due to missing nails in the shoes. So what can we do to minimize system failures?
Management must effectively integrate telecommunications into the disaster recovery process. Telecommunications must have the same reporting level as facilities management, security and data processing. All telecommunications should report to one responsible person, including telephones, data communications, local area networks, hard-wired data cables, intra- and inter-building cables and communication paths, remote location and long distance networks and all telecommunications supporting computer systems in the report.
Management must insist on the development of a strategic plan for disaster recovery. This plan should contain input from all parts of the organization and should have the objective of mitigating damage in a disaster. The plan should appear on the agenda of the top management meeting at least quarterly.
Finally, management must review the “nails” regularly. Are the visual and audible alarms working properly? Do the rectifiers work? When were the backup power systems tested, and how long were they run? Five or six hours or only ten minutes? How many minor failures were reported, by whom and when? Are there any patterns?
What is staffing based on for telecommunications? Is it based on budget-cut objectives set by an inexperienced manager under pressure, or is there some rationale to staffing? How about using a standard such as the number of ports, number of miles of cables, number of locations, distance between sites, number of additions, modifications and deletions of terminal equipment, the degree of system management computerization, the relative difficulty of managing different systems, the number of shifts worked at a site and the experience base of the staff?
Let’s not wait for an occasion to place blame. Instead let’s plan effectively, using every glitch in a system or discovered loose nail as an opportunity to learn and plan better. Let’s use the recent event as the impetus for reexamining our policies, procedures, job descriptions, staffing guidelines, organization charts and systems. Let our objective be to ensure the “nails” are able to support the shoes, riders and battles.
Benjamin W. Tartaglia, MBA, CSP, is President of BWT Associates, Independent Consultants to Management. The firm specializes in loss prevention, mitigation and disaster recovery relative to telecommunications.
This article is adapted from Vol. 4 #4.