In early 1990, the administrative headquarters of Digital’s customer services division, located in southern England, burned to the ground, and six VAX 6000 systems and approximately three hundred terminals suffered severe smoke damage. Despite high-profile incidents such as the DEC fire, it is estimated that over three quarters of medium-to-large computer users in the United Kingdom have yet to formulate a workable disaster recovery plan, even though they recognize that their systems are essential to the running of their businesses. DEC was fortunate, as a computer manufacturer, to have access to spare machines onto which archived data could be reloaded; even so, it was admitted that the loss of the building would cause weeks of disruption.
Serious as the DEC fire was, computer users often appear reluctant to address the vulnerability of their systems. This is often due not to a lack of awareness, but to a lack of will to take positive action.
However, disaster recovery means different things to different people. The provision of third-party services has increased substantially over recent years. In days gone by, when batch systems predominated, loose arrangements often existed between users of comparable systems; these were increasingly seen as risky, inadequate and unenforceable. Cold facilities were then offered, on the argument (which had a certain degree of truth) that the computer facility itself and its supporting environmental systems were difficult to reinstate, whereas computer equipment was easily available and timescales were not so stringent, so that some delay was acceptable. To overcome the shortcomings of these approaches, companies offered access to a compatible machine in return for an annual subscription fee. Quite often, these companies were processing bureaus seeking alternative business to offset declining bureau activity and, incorrectly, seeing disaster recovery as an easy and lucrative way out. Needless to say, companies with this attitude fell by the wayside as the true requirements and commitments of disaster recovery were realized. Even so, a large number of offerings remain available, possibly out of proportion to the number of installations.
In the United Kingdom, there are currently in excess of 35 organizations offering recovery services of some description. An industry survey took place in 1987 to project the growth of disaster recovery services within Europe. Ninety-five percent of the market was found to be in the hands of independent suppliers.
Figure 1 illustrates the expected growth of the market by sub-sector over the five-year period to 1992. Detailed analysis of the results indicated that mobile recovery facilities represent the highest growth segment.
Figure 2 shows the expected growth of the market broken down across those countries which are considered to be the major market for these services in Western Europe. Nearly all suppliers of recovery services operate within the confines of their own national markets.
From a purely data processing point of view, a large number of installations have signed up with these suppliers for emergency processing. Too often, a fixed provision is made within the data processing budget for a recovery service to replace equipment in the event of an adverse occurrence. At Alkemi, we have found that a properly prepared, tested and updated plan is a rarity. On many occasions, the whole problem has been laid at the door of data processing management. Under these circumstances, recovery has wrongly become a purely data processing issue: the recovery strategy consists entirely of resurrecting all systems, without identifying those whose non-availability would inflict serious damage on one or more key areas of operation.
Alkemi recently carried out a survey to investigate this apparent lack of formal planning. Representatives from 270 companies known to have had an active involvement with, or interest in, disaster recovery planning were interviewed. The object of the survey was to establish how successful they had been in implementing disaster recovery plans within their organizations.
Only 88 companies had a plan in place or in progress, although 58 of the companies spoken to were unable or unwilling to comment on the presence or absence of a plan.
Of the 124 with no plans, 68 stated that this was due to lack of time and resources; such resources as were available were usually diverted to activities considered to have a greater priority. Lack of management support was the reason given by 50 of the 124; six blamed lack of funds.
Furthermore, 102 of the 124 companies with no contingency plan did have an individual designated as “responsible for security and contingency planning,” but that individual received no management support. This indicates that senior management had shown little interest in, or was ignorant of, the risks, or that established operational recovery procedures probably involved the immediate computer facility only, without regard to the prime functions of the organization.
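The survey figures quoted above can be cross-checked and expressed as percentages with a few lines of arithmetic. The following is only an illustrative sketch: the variable and function names are hypothetical, and the only data used are the counts stated in the text.

```python
# Counts as quoted in the survey discussion; all names here are illustrative.
TOTAL_INTERVIEWED = 270
PLAN_IN_PLACE_OR_IN_PROGRESS = 88
NO_COMMENT = 58
NO_PLAN = 124

REASONS_FOR_NO_PLAN = {
    "lack of time and resources": 68,
    "lack of management support": 50,
    "lack of funds": 6,
}

# Companies with no plan that had nevertheless designated a responsible individual.
DESIGNATED_INDIVIDUAL = 102


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def summarize():
    """Check that the quoted counts add up, then return percentage shares."""
    # The three groups should account for every company interviewed.
    assert PLAN_IN_PLACE_OR_IN_PROGRESS + NO_COMMENT + NO_PLAN == TOTAL_INTERVIEWED
    # The stated reasons should account for every company without a plan.
    assert sum(REASONS_FOR_NO_PLAN.values()) == NO_PLAN

    summary = {
        "plan in place or in progress": share(PLAN_IN_PLACE_OR_IN_PROGRESS, TOTAL_INTERVIEWED),
        "no comment": share(NO_COMMENT, TOTAL_INTERVIEWED),
        "no plan": share(NO_PLAN, TOTAL_INTERVIEWED),
        "no plan, but individual designated": share(DESIGNATED_INDIVIDUAL, NO_PLAN),
    }
    for reason, count in REASONS_FOR_NO_PLAN.items():
        summary["no plan: " + reason] = share(count, NO_PLAN)
    return summary


if __name__ == "__main__":
    for label, pct in summarize().items():
        print(f"{label}: {pct}%")
```

On these figures, roughly a third of the companies interviewed had a plan in place or in progress, while more than four fifths of those without a plan had nonetheless named someone as responsible for it.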
At Alkemi, we have a case study concerning an organization which, although it had subscribed to a third-party recovery service, did not have a full plan in place. The main points are as follows:
Because there was neither fire protection nor fire detection equipment, a fire in an air-conditioning unit caused severe smoke damage to all installed computer equipment. By the time the day shift arrived, the fire had fortunately burned itself out. It was not until the Marketing Director arrived that positive action was taken. A service engineer was called, who attempted to clean the machine while at the same time relocating it to the conference room. An unsuccessful attempt was made to power up the machine. It was then decided to invoke the disaster recovery contract; replacement equipment arrived and was installed in the conference room, which had been converted into a makeshift computer room. Backup data was loaded and the machine was reconfigured to emulate the original as closely as possible. Following re-ordering, it was anticipated that replacement hardware, in the form of either repaired or completely new equipment, would replace the recovery machine within three to four weeks. The organization was fortunate that the fire was both self-contained and self-extinguishing, which allowed the existing premises to be adapted and normal operational housekeeping procedures to be used for the recovery. However, had the floor above the computer room, which contained a research laboratory full of flammable substances, been ignited, the entire building would have been lost and the final outcome would have been substantially different.
Europe has generally lagged behind the United States as far as statutory protection requirements are concerned. Within the United Kingdom, there is currently no statutory obligation for companies to have proper disaster recovery plans in place, although peripheral legislation requires certain, but somewhat vague, action to have been taken.
For example, under the 1987 Banking Act, auditors are required to report to the Bank of England on the adequacy of internal controls, and one of the areas of concern is that of business interruption. The Bank of England guidelines state that “there should be adequate recovery procedures or standby arrangements in place and tested to call on when events occur which cause computer systems to fail.”
Also in the United Kingdom, Building Societies (similar to U.S. savings and loan associations) are governed by the Building Societies Act, under which auditors are required to report to the Building Societies Commission on the adequacy of recovery arrangements. Under the Financial Services Act, an application for authorization by the Securities Association includes the question, “are backup facilities available should the deal recording, reporting, settlement and accounting systems fail?”
The Data Protection Act requires that computerized systems are both accurate and properly protected against damage, accidental or otherwise. As an aside, the Data Protection Act does not extend to data held in paper format. None of the existing legislation is absolutely specific, nor does it state how protection is to be achieved.
However, the Computer Abuse Act was recently introduced by Emma Nicholson, who, before becoming a Member of Parliament, had a career within the computing industry. The Computer Abuse Act is the first UK computer security legislation specifically designed to combat hackers and the abusers of computer systems. It is now planned to introduce a more comprehensive computer usage bill which would include disaster recovery provisions. Users would be required to comply, by law, with certain minimum standards for maintenance, support and upgrades. The move is supported by the Computer Services Association, which has been pressing for European laws to reflect current United States legislation.
A further spur to this will be the requirement for European companies to communicate more effectively with the introduction of the European Single Market in 1992. Comprehensive disaster recovery plans and facilities will have to extend throughout these organizations and throughout the continent. The way should be paved for collaborative efforts between countries to ensure that mutual systems are effectively protected against disruption.
At Alkemi, in addition to our consulting and training activities, we have also attempted to address the awareness problem. The Survival Pack consists of two items: a book, “The Survivors Guide to Standby Services”; and a video, “The Survival Game.” The book, as its title suggests, provides those responsible for this area with a guide to the current market offerings and also provides an insight into the need for proper contingency planning.
The video is used as an awareness tool to ensure the essential support and proper funding by senior management. The Survival Game graphically illustrates the need for disaster recovery planning as a business requirement without resorting to lurid accounts and horror stories.
While the Survival Pack is not intended to be the definitive guide to disaster recovery planning (this is covered by our major work, Computer Risk Manager), it does address one of the key areas and should help the overall project receive the level of support it needs to succeed.
Steve Watt is a consultant with Alkemi Limited in Berkshire.
This article adapted from Vol. 4 No. 2, p. 42.