Prospects for Utilizing Expert Systems to Evaluate Disaster Recovery Plans
The application of the techniques of artificial intelligence to the task of disaster recovery plan auditing is discussed. A prototype expert system for DRP auditing is described and evaluated. We conclude that automated DRP auditing is feasible.
If you have ever been involved in an audit of your organization's disaster recovery plan (DRP), you probably will find the following dialogues familiar, if not pedestrian.
AUDITOR: Are backup copies of the operating system and needed operating software packages stored at an off-site location?
AUDITOR: Is access to the vendor's facility controlled by card control gate or guard booth?
CLIENT: Guard booth.
AUDITOR: Does the vendor provide specially constructed containers for safely transporting data and/or documents from your site and back?
AUDITOR: Is the transportation vehicle environmentally protected against shifts in temperature and humidity?
What is perhaps neither so familiar nor so pedestrian is that in this case the Auditor is a piece of software, called an expert system, running on an average personal computer (a 486 processor with 2 megabytes of RAM).
Expert systems are the most widely used application of the computer science field of artificial intelligence (AI). In this article we discuss the design and development of a prototype expert system that could be used to audit DRPs or to aid in the design of DRPs by simulating an audit. We begin with a discussion of expert systems technology. We then discuss the application of that technology to auditing DRPs, including a description of a prototype DRP auditing expert system. We conclude with some projections and caveats.
'Artificial intelligence' is an umbrella term for a wide variety of software, all of it designed to mimic some aspect of human intelligence. Artificial intelligence (AI) software systems include routines that translate from one language into another (German to English, for example), programs for playing chess, systems that recognize handwriting, systems that speak and understand human languages, robot brains, and expert systems.
An expert system is an AI software system designed to perform like an expert in a specific area, called a domain. Expert systems are currently employed in many domains, including NASA space shuttle operating systems, medical diagnosis, and hearing testing. Expert systems have three main components: the knowledge base, the inference engine, and the user interface.
The knowledge base contains the knowledge that the system uses to solve problems and answer questions. The knowledge may be in the form of facts like 'Bill Clinton was elected President in 1992,' if-then rules like 'If the vendor's facility is free standing then it is less susceptible to collateral damage,' and heuristic rules (or rules of thumb) like 'sprinkler systems are generally more reliable than fire extinguishers.' The expert system discussed in this article has a knowledge base that consists entirely of if-then rules containing factual and heuristic knowledge.
The inference engine is the software that actually does the reasoning in the expert system. The inference engine is responsible for deriving conclusions and generalizations from the knowledge contained within the knowledge base and the information given to it by the user.
Most of the inference engines in current expert systems are rule-based chaining systems. The knowledge that the expert system is to have and use is codified as a network of if-then rules.
The system will report and/or use the fact that the employee and vendor list condition is met if all three of the conditions preceding the THEN clause are met.
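A rule of this kind can be represented as a set of conditions joined by a logical AND, plus a conclusion. The minimal Python sketch below assumes that representation; the rule name `R1` and the three condition names are hypothetical placeholders, not the wording of any rule in AUDIT.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An if-then rule: IF every condition holds THEN the conclusion holds."""
    name: str
    conditions: frozenset[str]  # all must be satisfied (logical AND)
    conclusion: str


def fires(rule: Rule, facts: set[str]) -> bool:
    """True when every condition of the rule is among the known facts."""
    return rule.conditions <= facts


# Hypothetical three-condition rule; the condition names are illustrative only.
R1 = Rule(
    name="R1",
    conditions=frozenset({"condition_1", "condition_2", "condition_3"}),
    conclusion="employee_and_vendor_list_condition_met",
)

known = {"condition_1", "condition_2", "condition_3"}
if fires(R1, known):
    known.add(R1.conclusion)  # report and/or reuse the derived fact
```

If any one of the three conditions is missing from the known facts, `fires` returns False and the conclusion is not derived.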
Expert systems may approach decision making from one of two points of view. The system may have a specific goal in mind and then attempt to meet that goal. This strategy is called goal driven or backward chaining. On the other hand, the system may proceed towards a decision that is unspecified but determined by the input data. This second strategy is called data-driven or forward chaining.
If a goal-driven system had only the rule, it would ask the user whether or not the three conditions were satisfied in an attempt to conclude the THEN clause. A data-driven system operating with this single rule would wait for the user to provide the three conditions.
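The difference between the two strategies can be sketched in a few lines of Python. This is a simplified illustration, not a description of how Level 5 or any other shell works internally: the rules are assumed to form an acyclic network, all condition and conclusion names are hypothetical, and the `ask` callback stands in for the question put to the user.

```python
from __future__ import annotations

from typing import Callable

# Each rule: (conditions that must all hold, conclusion). Names are hypothetical.
RULES: list[tuple[frozenset[str], str]] = [
    (frozenset({"condition_1", "condition_2", "condition_3"}), "list_condition_met"),
    (frozenset({"list_condition_met", "backups_stored_offsite"}), "offsite_plan_adequate"),
]


def forward_chain(facts: set[str], rules=RULES) -> set[str]:
    """Data-driven: keep firing rules whose conditions are met until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in derived and conditions <= derived:
                derived.add(conclusion)
                changed = True
    return derived


def backward_chain(goal: str, ask: Callable[[str], bool], rules=RULES,
                   known: dict[str, bool] | None = None) -> bool:
    """Goal-driven: try to prove `goal`, asking the user only about conditions no rule concludes."""
    if known is None:
        known = {}
    if goal in known:
        return known[goal]
    bodies = [conds for conds, concl in rules if concl == goal]
    if bodies:
        # The goal holds if every condition of at least one concluding rule can be proved.
        result = any(all(backward_chain(c, ask, rules, known) for c in sorted(conds))
                     for conds in bodies)
    else:
        result = ask(goal)  # a primitive condition: only the user can answer it
    known[goal] = result
    return result
```

The forward run waits for facts and derives whatever follows from them; the backward run starts from the goal and asks only the questions it needs, stopping as soon as a required condition fails.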
The third component of an expert system, the user interface, is the software that allows the user to query the expert system and receive the system's deductions and recommendations. User interfaces may be closed or open language systems. Closed systems respond only to fixed single-character or single-word inputs; the system prompts the user to choose the appropriate response. Open systems allow the user to communicate with the expert system in natural language. Natural language processors are expensive and consume considerable processing time and memory, whereas closed language systems are inexpensive. The expert system discussed here uses a closed language user interface.
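A closed language interface can be as simple as a prompt that accepts only a fixed set of single-character replies and repeats the question otherwise. The following Python sketch assumes that behavior; the `read` parameter (defaulting to the built-in `input`) is a hypothetical hook added so the prompt can also be driven by a script.

```python
from typing import Callable, Dict


def ask_closed(question: str, choices: Dict[str, str],
               read: Callable[[str], str] = input) -> str:
    """Closed-language prompt: return the label of the chosen key, rejecting anything else.

    Assumes the keys in `choices` are lowercase; replies are lowercased before matching.
    """
    menu = "  ".join(f"{key}) {label}" for key, label in choices.items())
    while True:
        reply = read(f"{question}\n{menu}\n> ").strip().lower()
        if reply in choices:
            return choices[reply]
        # Free text, typos, or unlisted keys are not parsed; the question is simply repeated.


# Example using the first question from the AUDIT transcript:
# answer = ask_closed("Is the vendor's facility free standing?", {"y": "yes", "n": "no"})
```

Because nothing outside the fixed menu is ever interpreted, no natural language processing is needed, which is the cost advantage described above.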
Today, anyone can design an expert system. All that is required is an expert in the domain for the system (called a domain expert), a person with expertise in the design of expert systems (called a knowledge engineer), and a commercial expert system shell.
An expert system shell incorporates the algorithms for goal and/or data-driven reasoning, a framework for if-then rule construction, and a variety of diagnostic and report features. There are dozens of expert system shells on the market ranging in price from a few hundred to several thousand dollars. Expert system shells allow the designer to concentrate on the organization of the knowledge domain. The fundamental AI algorithms are already in place. However, not every shell is appropriate for every expert system. For the DRP auditing system, we decided that our best strategy would be goal-driven because we wanted the system to tell us whether or not a DRP was adequate. Therefore, we selected the expert system shell Level 5 (Information Builders, Inc.), an inexpensive shell with a closed language interface that emphasizes backward-chaining.
Almost 90% of all corporate service organizations expect to be using expert systems within the next three years (survey of 3,500 service managers conducted by Service Ware, Inc., Verona, PA in a report titled 'We're Off to Size the Wizard: The Revolution in Service Automation').
The use of artificial intelligence is expected to grow at a compound annual rate of 223 percent between 1992 and 1995 ('Banks Wise Up to the Expertise of Artificial Intelligence Systems,' Bank Technology News, Sept. 1992, survey by Ernst & Young).
Knowledge-based systems (KBSs), a subset of artificial intelligence (AI) that, loosely speaking, includes expert systems and neural networks, provide one means by which auditors can apply the latest emerging technologies to some common audit objectives or goals. KBSs thus allow the auditor to examine the audit area under review more fully and thoroughly, rather than settling for simple practical audit sampling.
Expert systems were originally developed to solve both ill-defined and well-defined problems that could not be solved efficiently using traditional algorithms. Auditing, as both a discipline and a science, includes ill-defined tasks as well as tasks that take place in rapidly changing environments.
Regardless of which term you use, disaster contingency planning (DCP) or disaster recovery planning (DRP), using expert systems to help audit your firm's ability to recover its data processing and information technology operations in the event of a disaster (however 'disaster' may be defined) has direct operational benefits for the audit function, internal or external. The following discussion outlines many of the primary operational benefits of using expert systems in the audit function as they apply specifically to DCP/DRP.
Auditing is a business of knowledge, and expert systems are the technology for harnessing and leveraging that knowledge.
Because of the great increase in service and information jobs, users are demanding more access to information that is timely and accurate. Auditors are therefore being asked to review 'types' of information that may not have existed three or four years earlier. The audit function itself is under review in an attempt to make it more cost-effective and in step with the technological changes affecting the company and the end-user community as a whole. Since the traditional audit market has reached a natural saturation point, auditors must introduce new services, reach beyond traditional methods, become more flexible in the services they provide, and specialize more.
Sociologist Emile Durkheim has shown that as societies increase in size, density, and urbanization, the division of labor increases rapidly. To some extent, man undoubtedly becomes more proficient as he confines himself to a given field or activity. Today, the sheer volume of knowledge and the complexity of civilization mean that the intellectual is almost always limited to a single discipline, and perhaps to only one of its major segments.
Expert systems can assist the audit function in achieving its goals in the following ways:
- An expert system used for staff training will help auditors increase their flexibility as special audit situations arise and auditors must (or are forced to) change assignments.
- An expert system can process, evaluate, and deliver only the relevant data needed by the auditor.
- Since auditing is primarily labor intensive, expert systems can free auditors from many of the mechanical tasks and provide the labor substitute and productivity boost needed to audit new information technology (IT) areas.
- With an expert system, 'institutional experience' is retained, and the training process can be accelerated, thus reducing training costs.
Computerized tools (e.g., expert systems) can help focus thinking, model disaster scenarios, evaluate network reliability, and build disaster recovery plans. Expert systems that model disaster scenarios can become important to the audit process, since 'staging' or 'conducting' a real disaster is neither easy nor cost-efficient, and real disasters cannot be scheduled to fit anyone's timetable. By creating an expert system that helps analyze the elements of a DCP, the auditor can identify weaknesses and exposures in the plan that might otherwise go unnoticed.
The basic concept is that the expert system will enable the auditor to analyze more options, more scenarios, and more combinations of potential situations than a purely manual review would allow. The bottom line is that an auditor who uses an expert system to help evaluate a DCP/DRP should be able to determine, with a greater degree of confidence, the firm's ability to withstand and/or recover from a disaster using the recovery plan currently in place. By following the rules for expert system design outlined in the first half of this article and by codifying known audit questions relating to the audit of DCP, the expert system will not only direct the auditor through a logical, orderly analysis of the DCP, but will also provide a final evaluation of the plan's worthiness and acceptability and of its ability to achieve the stated objectives. Additionally, this analysis should give the auditor a basis for assessing the firm's overall exposure to risk. Using this information, the auditor can then determine the degree and level of testing that may be required. With the expert system, the auditor is also able to report more accurately to management on the potential for success or failure of the DCP, should conditions warrant its implementation. The success or failure of the DCP could be a harbinger of the success or failure of the organization as a viable entity.
The analysis performed and the data generated by the expert system, in addition to the auditor's own field work, would equip the auditor with the ammunition necessary to convince management of the critical need to continue the development, testing, and maintenance of the DCP.
Of key interest would be the interpretation of the results provided by the expert system. For example, if the expert system's final analysis showed the DCP to have an 89% chance of success, the auditor might feel reasonably comfortable and confident in the plan's design and executability. Such a result might even justify reducing the level of audit testing to be performed, while still leading to recommendations that management 'shore up' weak spots where appropriate. Depending on where the shortfalls occurred, management might be willing to accept the risk of not being 100% covered in the event of loss, and might determine that an 89% recovery capability is justified and does not warrant any additional expenditure.
However, if a final analysis indicated only a 23% chance of recoverability, the auditor may have an easier time motivating management to take specific steps to redesign, enhance, and expand upon the current DCP.
It is interesting to note that internal and external auditors might not be the only professionals to benefit from an expert-system-based DCP analysis. The authors can identify one other profession for which access to the analysis provided by the expert system would be invaluable: insurance underwriting.
A case could easily be made for a substantial increase in a firm's insurance premiums if the underwriter were to determine that the firm stood only a 23% chance of recoverability versus an 89% chance. In fact, if higher-than-average premiums are not imposed, a carrier might well refuse coverage because of the liability.
During the first nine months of 1992 alone, six major disasters (natural and/or man-made), ranging from floods to earthquakes to hurricanes to natural gas explosions, brought the importance of pro-active disaster contingency planning straight to the attention of senior executive management. What is unfortunate, and frightening as well, is that as a result of these disasters many companies learned the hard way that they were unprotected. They did not have a viable, effective, working disaster contingency plan (DCP) in place to reconstruct and/or adequately recover information technology (IT) operations. Nor has this unfortunate state of affairs been limited to small or medium-sized operations.
Additionally, as business-critical, real-time transaction processing applications (e.g., electronic data interchange) become increasingly prevalent, 'availability' becomes a key information processing issue. The ability to recover key information technology (IT) systems may be essential to the continued financial (and thus overall) success of the business.
Disaster recovery planning has today become almost a corporate buzzword. For many organizations whose very existence depends upon information technology, gambling that a disaster will not strike or affect them is a decision that cannot be justified on any level. The stakes are simply beyond the acceptable limits of prudent management.
It is not the objective of this article to discuss the components, design, or implementation strategies of a DCP/DRP; accordingly, we present only a very limited overview of its components. Our focus, instead, is on the design of an expert system that supports the audit of a DCP/DRP. We intend to show the breadth and complexity of a DCP/DRP and to recommend that the design of an expert system for DCP be approached in a modular fashion, automating the audit logic in a series of steps rather than designing a complete expert system from the outset.
In general terms, then, the major sections of a generic disaster recovery plan may include the following:
I. Inventory of current application and systems programs, including telecommunications programs and network hardware/software.
II. Analysis of individual application systems, with a view toward the criticality of each application to the organization and the impact of its loss on the organization.
III. Determination of the hierarchy of the organization's application systems.
IV. Selection of disaster recovery backup method dependent upon how long the organization can survive without IT processing, management's backup philosophy, and overall cost of available backup methodologies.
V. Formalization of backup agreements.
VI. Identification, involvement, and commitment of application owners (or designates).
VII. Definition of application requirements, including personnel, hardware, system support programs, telecommunications, data, special forms, etc.
VIII. Documentation of the plan.
IX. Off-site backup and retrieval of critical data, applications software and documentation, system support software and documentation, and special forms, etc.
X. Testing procedures.
XI. Plan maintenance.
With this basic format in hand, we will now examine the development of an expert system which addresses one of these components.
In deciding which component to automate first, we examined the work required for each component, the complexity of the concepts embodied within the components, how easily the component could be converted to an expert system format, the amount of information available for each component (which we actually defined as a domain), and finally which components required the use of risk analysis or other ranking methodologies in order to compile the required data.
After careful review, we decided to design an expert system to assist in the audit of section IX (off-site backup and retrieval of critical data).
One developmental rule of expert system design is to narrow the focus to a finite development area. With this in mind, we decided to refine our development domain even further. This resulted in the final project objective -- design an expert system which will assist in the audit of off-site backup plans, procedures and facilities, as part of a functioning DCP/DRP.
As discussed earlier, expert systems are driven by and are logically based upon rules. The execution of rules along a specifically defined logic path results in the expert system's ability to reach a conclusion.
The very nature of the rule-based processing of expert systems makes this technology directly compatible with the rule-based (or at least rule-implied) philosophies and methodologies of internal and external auditing. In completing audit programs, auditors seek answers to rules and questions and determine the auditee's level of compliance with those rules.
For example, how is access to the operations center within IT controlled? The implied rule here is that access to the IT operations area should be restricted.
By re-wording the audit program question into a rule-based format, the auditor is at the beginning stages of expert system design -- the design of the knowledge base. It is this knowledge base that will be queried by the inference engine of the expert system to determine which logical path to follow and which conclusions will be reached.
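The rewording step can be captured in a small data structure that keeps the original audit question next to the rule it implies. The Python sketch below is illustrative only: the yes/no prompt wording, the field names, and the 'exception' label are assumptions, not part of AUDIT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRule:
    question: str           # audit program question as originally worded
    implied_rule: str       # the control the question implicitly tests
    prompt: str             # yes/no re-wording put to the auditee (hypothetical)
    compliant_answer: bool  # the answer that satisfies the implied rule

    def evaluate(self, answer: bool) -> str:
        """Compare the auditee's answer with the rule and report compliance or an exception."""
        if answer == self.compliant_answer:
            return "compliant"
        return f"exception: {self.implied_rule}"


OPS_ACCESS = AuditRule(
    question="How is access to the operations center within IT controlled?",
    implied_rule="Access to the IT operations area should be restricted.",
    prompt="Is access to the IT operations area restricted?",
    compliant_answer=True,
)
```

Each such record would become one entry in the knowledge base; deciding which prompts to put to the user, and in what order, is left to the inference engine.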
AUDIT, our expert system for DRP auditing, was constructed, as previously stated, with the Level 5 expert system shell. The knowledge base consists entirely of if-then rules. The rules encode the heuristic knowledge of a professional auditor (Marcella) concerning the protection of off-site backup and retrieval of critical data, applications software and documentation, and system support software. The 42 rules that make up this sub-module of AUDIT are structured so that, in any given session, the system guides the user to an appropriate conclusion regarding the adequacy of the DCP with respect to off-site storage facilities and capabilities. Although most of the rules are deterministic, and thus ask for a yes-no or multiple-choice response, several rules are probabilistic. These rules ask the user for a percentage estimate of the validity of his or her response. The inference engine then uses Bayesian techniques to produce a probability estimate of the adequacy of the user's plan.
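The article does not say how Level 5 combines these percentage estimates, so the following is only a generic sketch of one Bayesian approach, not a description of AUDIT. It assumes each yes/no question has known likelihoods of a 'yes' when the plan is adequate and when it is not (the numbers used are hypothetical), applies Bayes' rule to each possible answer, and then weights the two posteriors by the user's stated confidence in the answer, which is Jeffrey's rule of conditioning. Treating successive questions as conditionally independent is a further simplifying assumption.

```python
from __future__ import annotations


def bayes(prior: float, p_yes_if_adequate: float, p_yes_if_not: float, yes: bool) -> float:
    """P(plan adequate | answer) by Bayes' rule."""
    like_h = p_yes_if_adequate if yes else 1 - p_yes_if_adequate
    like_not_h = p_yes_if_not if yes else 1 - p_yes_if_not
    return like_h * prior / (like_h * prior + like_not_h * (1 - prior))


def update(prior: float, p_yes_if_adequate: float, p_yes_if_not: float,
           answered_yes: bool, confidence: float) -> float:
    """Jeffrey's rule: weight the 'yes' and 'no' posteriors by the user's confidence in the answer."""
    q_yes = confidence if answered_yes else 1 - confidence
    return (q_yes * bayes(prior, p_yes_if_adequate, p_yes_if_not, True)
            + (1 - q_yes) * bayes(prior, p_yes_if_adequate, p_yes_if_not, False))


def assess(prior: float, answers: list[tuple[float, float, bool, float]]) -> float:
    """Fold (p_yes_if_adequate, p_yes_if_not, answered_yes, confidence) answers into one estimate,
    assuming the questions are conditionally independent given the plan's adequacy."""
    p = prior
    for p_yes_h, p_yes_not_h, answered_yes, confidence in answers:
        p = update(p, p_yes_h, p_yes_not_h, answered_yes, confidence)
    return p
```

With a 50% prior and the illustrative likelihoods 0.9 and 0.3, a 'yes' given with 80% confidence moves the estimate to 62.5%, compared with 75% for a fully confident 'yes'.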
The AUDIT program was written in about 100 person-hours. The development loop had four phases:
1. Interview with auditor on rules of thumb for auditing off-site backup and recovery plans.
2. Discussion with knowledge engineer on the structure and interrelationships of the rules.
3. Encoding of the structured rules of AUDIT using the expert system shell.
4. Test of the system.
The following partial transcript of a session between the AUDIT system and a user is offered to give you a feel for the level of inquiry of AUDIT and to illustrate how the system operates. The transcript begins in the middle of a series of questions about the remote storage facility. AUDIT is posing a question about the facility:
AUDIT: Is the vendor's facility free standing?
AUDIT: Are all adjacent and attached structures adequately protected from unauthorized access?
AUDIT: Are all adjacent and attached structures adequately protected against the outbreak of fire?
In the actual system, the rules bear code numbers rather than text. We have inserted the text here for clarity. Rule D8 Branch is a semantically empty node in the search tree.
In its attempt to reach the D8 Branch conclusion, AUDIT first asks the question 'Is the vendor's facility free standing?' If the user answers affirmatively, the D8 Branch conclusion is reached with a 90% confidence factor (CF). If, on the other hand, the user answers negatively, as shown in the transcript, AUDIT follows up with the two questions pertaining to adjacent and attached structures. If both of these questions are answered affirmatively, D8 Branch is reached with a 60% CF. A negative response to the 'free standing' question suggests a potential deficiency in the facility; as a result, even a well-protected attached storage facility is viewed as less secure than a free-standing one. AUDIT will use the confidence levels reached during this exchange in its further assessment of the organization's DRP and in determining what additional questions (logic paths), if any, it will pursue.
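The D8 Branch logic just described can be expressed as a short Python function. The three questions and the 90% and 60% confidence factors come from the text; returning `None` when neither path succeeds is an assumption, since the article does not say what AUDIT concludes in that case, and `ask` is a hypothetical stand-in for the closed yes/no prompt.

```python
from typing import Callable, Optional

FREE_STANDING = "Is the vendor's facility free standing?"
ACCESS = ("Are all adjacent and attached structures adequately protected "
          "from unauthorized access?")
FIRE = ("Are all adjacent and attached structures adequately protected "
        "against the outbreak of fire?")


def d8_branch(ask: Callable[[str], bool]) -> Optional[int]:
    """Return the confidence factor (in percent) with which D8 Branch is reached, or None."""
    if ask(FREE_STANDING):
        return 90  # free-standing facility: highest confidence
    # Not free standing: ask both follow-up questions, as in the transcript.
    access_ok = ask(ACCESS)
    fire_ok = ask(FIRE)
    if access_ok and fire_ok:
        return 60  # well protected but attached: still viewed as less secure
    return None  # assumption: branch not reached; AUDIT's actual handling is not described
```

The returned value would then feed into the later rules that decide which further questions to pursue.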
The results of our brief exploration into the possibility of using expert systems for auditing DRPs are encouraging. We believe that automated auditing utilizing expert systems is feasible and cost effective in at least three contexts. First, expert system software along the lines of AUDIT could be quite effective as an aid in developing a DRP. Second, an auditing expert system could serve as an inexpensive pre-auditing tool for an organization. It would provide a means by which a firm could get the 'bugs' out of its DRP before incurring the expense of a professional audit. Finally, an expert system for DRP auditing can provide for almost continuous investigation of DRPs. Once purchased, or developed internally, a well designed expert system could be used monthly or quarterly to evaluate and update the organization's DRP.
Notes and Works Cited
1. The prototype was programmed by Kimberly P. Vahle (a senior Millikin University undergraduate student and James Millikin Scholar) and designed by Ms. Vahle and the authors.
2. A class of expert systems based upon neural networks does not include any if-then rules at all. In a neural network expert system, the system is trained in the domain area, and in effect, establishes its own criteria for decision making.
3. The authors wish to clarify our interchangeable use of the terms Disaster Contingency Planning (DCP) and Disaster Recovery Planning (DRP). DCP typically refers to the establishment of policies, practices, and procedures that are followed in a pro-active attempt to protect against and prevent events (disasters) that could seriously impair normal business operations. In contrast, DRP typically addresses the actions that would be taken to recover from a disaster. We feel that the use of expert systems in auditing is applicable to either contingency or recovery planning.
It is also important to note that whether you use contingency planning or recovery planning, the plan itself should be broader in scope and control than the information technology function. The plan should encompass the entire organization and should address all critical functions necessary for continued corporate survival.
4. Iskandar, Mai and Paul McMann. 'Expert Systems in Auditing: Advantages and Applications,' The EDP Auditor Journal, Vol. IV, 1989, p. 41.
5. McKee, Thomas E. 'Expert Systems: The Final Frontier?' The CPA Journal, July 1986, p. 42.
6. Lamond, Bruce J. 'An Auditing Approach to Disaster Recovery,' Internal Auditor, October 1990, p. 38.
Albert J. Marcella, Jr., CDP, CISA is the president of Business Automation Consultants. James V. Rauff, PhD, co-authored the article.