Innovative Enterprise Risk Management Tools
- Published on Thursday, January 06, 2011
- Written by C. ARIEL PINTO, Ph.D. & THERESA A. KIRCHNER, Ph.D., MBCP
|Risk||Potential for exposure to loss which can be determined by using either qualitative or quantitative measures|
|Risk Assessment||Process of identifying the risks to an organization, assessing the critical functions necessary for an organization to continue business operations, defining the controls in place to reduce organization exposure and evaluating the cost for such controls. Risk analysis often involves an evaluation of the probabilities of a particular event.|
|Risk Management||Culture, processes and structures that are put in place to effectively manage potential negative events. As it is not possible or desirable to eliminate all risk, the objective is to reduce risks to an acceptable level.|
Table 1: Risk-Related Definitions – DRJ / DRI International Glossary
Risk assessment has been a key component of business continuity/disaster recovery planning since its widespread emergence in the 1980s. However, risk identification, evaluation, and management are used in much broader contexts as well and cross multiple disciplines, ranging from business-centric to military to community-at-large spheres. Emerging risk management-related tools, such as the risk-based return on investment (RROI) approach; the failure mode, effects, and criticality analysis (FMECA) methodology; and modeling and simulation (M&S), offer organizations increasingly sophisticated and affordable options. Our goal is to give you a high-level overview of what is new and important in the risk analysis arena that has applicability and value in the business continuity planning/resiliency environment.
The Business Continuity Glossary of Terms, jointly maintained by the Disaster Recovery Journal (DRJ) and Disaster Recovery Institute International (DRI International/DRII), offers definitions of several risk-related terms, as outlined in Table 1. A related, alternative definition of “risk,” from the Society for Risk Analysis, is “the potential for realization of unwanted, adverse consequences to human life, health, property, or the environment.”
At a simplistic level, risk can be estimated as a function of consequences and probabilities. Consequences can be assessed in terms of magnitude, e.g. loss of life/vital assets (catastrophic effect), loss of mission, loss of capability with compromise of some mission, loss of some capability with no effect on mission, and minor or no effect. Probabilities, on the other hand, can be assessed in terms of likelihood, e.g. unlikely or frequent.
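As a rough illustration of risk as a function of consequences and probabilities, the sketch below scores risk as the product of ordinal likelihood and consequence ratings. The consequence categories follow the paragraph above; the intermediate likelihood labels ("seldom," "occasional," "likely"), the 1-5 scores, and the band thresholds are all assumptions made for the example, not part of any published scale.

```python
# Hypothetical ordinal scales. Consequence names follow the text; the numeric
# scores and the intermediate likelihood labels are illustrative assumptions.
CONSEQUENCE = {
    "minor or no effect": 1,
    "loss of some capability, no effect on mission": 2,
    "loss of capability, some mission compromise": 3,
    "loss of mission": 4,
    "loss of life/vital assets": 5,
}
LIKELIHOOD = {"unlikely": 1, "seldom": 2, "occasional": 3, "likely": 4, "frequent": 5}


def risk_score(likelihood: str, consequence: str) -> int:
    """Estimate risk as likelihood score times consequence score (1-25)."""
    return LIKELIHOOD[likelihood] * CONSEQUENCE[consequence]


def risk_level(score: int) -> str:
    """Map a 1-25 score to a band; these thresholds are assumptions."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


print(risk_level(risk_score("frequent", "loss of mission")))  # high
```

A multiplicative score is only one choice; many organizations instead read the level directly off a lookup matrix so that, for example, any catastrophic consequence is rated high regardless of likelihood.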
Risks can also be classified in terms of their consequences. These classifications vary by industry and may reflect goals or assets valued by organizations such as reputation, strategy, financial, investments, operational infrastructure, business, regulatory compliance, outsourcing, people, technology and knowledge, and others.
Another related term, vulnerability, often refers to susceptibility and resiliency to risks and can be quantified and displayed in a vulnerability matrix. It is notable that, unlike risk, vulnerability is not directly gauged by probabilities or magnitude of consequences of particular events, but rather by a system’s susceptibility and resiliency to certain types of risk events or consequences. (See Figure 1 for an example of such a comparison.)
Yacov Haimes, director of the Center for Risk Management of Engineering Systems at the University of Virginia and proponent of a systems approach to risk management, summarizes the concept of risk management as an approach that qualitatively and quantitatively answers the following questions:
- What can go wrong? What is the chance of occurrence? Why?
- What are the consequences?
- What are the alternatives?
- What are the tradeoffs among alternatives?
- How will these alternatives affect future decisions?
What Can Go Wrong, Chance of Occurrence, and Consequences
Building a risk management framework begins with identifying what can go wrong (i.e. risk events) and assessing the chance of occurrence and consequences. Aside from traditional statistical analyses, particular risk events can be assessed based on risk management processes already in place in an organization, in terms of:
- Multiple paths to failure, which refers to the number of ways a risk event can occur
- Detectability, which refers to how effectively the occurrence of risk events can be detected
- Controllability, which refers to how much the risk event can be controlled after it has been detected
- Reversibility, which refers to the extent to which the consequences of risk events can be reversed
- Duration of effects, which refers to the time until the consequences can be stopped or reversed
- Cascading effects, which refers to the capability of some risk events to trigger other related risk events
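One way to record these six attributes consistently across risk events is a simple profile structure, sketched below. The 1-5 rating scale, the convention that 5 is always least favorable, and the attention threshold are hypothetical choices for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RiskEventProfile:
    """Ratings on a hypothetical 1-5 scale, where 5 is always least favorable."""

    name: str
    paths_to_failure: int  # 5 = many ways the event can occur
    detectability: int     # 5 = occurrence is hard to detect
    controllability: int   # 5 = hard to control once detected
    reversibility: int     # 5 = consequences are hard to reverse
    duration: int          # 5 = long time until effects are stopped or reversed
    cascading: int         # 5 = likely to trigger other risk events

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 scale.
        for attribute, rating in self.ratings().items():
            if not 1 <= rating <= 5:
                raise ValueError(f"{attribute} must be 1-5, got {rating}")

    def ratings(self) -> dict[str, int]:
        # Every field except the event name, in declaration order.
        return {k: v for k, v in vars(self).items() if k != "name"}

    def weak_points(self, threshold: int = 4) -> list[str]:
        # Attributes rated at or above the threshold deserve attention first.
        return [k for k, v in self.ratings().items() if v >= threshold]


power_loss = RiskEventProfile("data center power loss", 3, 1, 2, 2, 4, 5)
print(power_loss.weak_points())  # ['duration', 'cascading']
```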
Event-Tree Analysis (ETA) – an inductive reliability analysis tool resulting in a graphical representation of precedence relationships among initiating and succeeding events. ETA graphical representations resemble, and may be referred to as, event sequence diagrams, master logic diagrams, and reliability block diagrams.
Probabilistic Risk Assessment (PRA) – a methodology that describes risks in a complex system using probabilities and magnitude of consequences. Event-tree analysis and event sequence diagrams are some of many tools used in PRA to describe the sequence of events leading to a risk event.
Cost of Poor Quality (COPQ) – a cost estimation method popularized at IBM which defines the costs of imperfect processes, products, and services.
Scenario Analysis – a problem analysis process which explores alternative possible outcomes (or scenarios) of future events.
Failure Mode and Effects Analysis/Failure Mode, Effects, and Criticality Analysis (FMEA/FMECA) – inductive analytical methods; FMECA extends the older FMEA methodology by adding a criticality analysis that charts the probability of failure modes against the severity of their consequences, highlighting failure modes with relatively high probability and severity of consequences.
Preliminary Hazard Analysis (PrHA) Process – a system design process used to identify possible hazards, with the objective of eliminating or reducing the risks of these hazards by designing safeguards into the system.
Stress Testing – a process to test the stability of various financial institutions beyond their normal operating capacity and environment in order to observe results and identify potential failure scenarios.
Dynamic Financial Analysis (DFA) – an approach which looks at the dependencies among hazards in a financial system using various simulation and economic analysis tools rather than simple traditional actuarial analysis.
Modeling and Simulation
Petri Net Modeling for Complex Analysis – a mathematical and graphical language used to represent distributed systems, in which places (nodes representing states or conditions) and transitions (nodes representing events that change those states) are connected by directed arcs.
Visual Risk Cluster – the visual representation and analysis of hierarchical risks, with categories and subcategories.
Complexity Induced Vulnerability Decision Support Systems – an approach to modeling vulnerability of systems by analyzing the interdependencies among sub-systems and the vulnerabilities they induce.
Functional Dependency Network Analysis (FDNA) – a systems analysis methodology utilizing graphs and directed arrows representing functional dependencies among systems to describe risk of failure due to those dependencies.
Committee of Sponsoring Organizations (COSO) of the Treadway Commission Enterprise Risk Management Framework – a matrix structure of four organizational objectives categories (strategic, operations, reporting, and compliance) and eight enterprise risk management components that can be analyzed at either the organizational or business unit level.
Strategic Objectives At Risk (SOAR) Enterprise Risk Management Methodology – examines operational risk management in the context of the organization’s strategic objectives.
Balanced Scorecard – a strategic planning and management system that can incorporate identification, measurement, management and reporting of key risks to help organizational managers compare strategic and tactical risk management program expectations with actual progress toward objectives.
Cause-Consequence Analysis (CCA) – uses a synthesis of deductive cause (fault tree) analysis and inductive consequence (event tree) analysis to identify chains of events that can result in undesirable consequences and allow calculation of probabilities of those consequences to determine risk levels.
As Low As Reasonably Achievable (ALARA) Risk Matrix – a graphical tool that assesses risk as the combination of the chance that an event will occur and the consequences if the event occurs; that information can then be used to determine criteria for optimal risk decision-making.
Hazard and Operability (HAZOP) Study – a qualitative structured assessment of an existing or potential operational process to identify and evaluate potential risks.
Table 2: Next Generation Enterprise Risk Management Body of Knowledge/Tools
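To make event-tree analysis (ETA) from Table 2 more concrete, the sketch below enumerates every success/failure path through a sequence of safeguards and multiplies branch probabilities along each path. The initiating-event frequency, the two safeguards, and their success probabilities are hypothetical, and the safeguards are assumed independent. A practical event tree would usually prune branches that no longer matter; this version enumerates all of them for simplicity.

```python
from itertools import product


def event_tree(initiating_frequency, barriers):
    """Enumerate every success/failure path through a sequence of barriers.

    barriers: list of (name, probability_of_success) in the order they act.
    Returns (path, frequency) pairs, where path is a tuple of
    (barrier_name, succeeded) entries. Barriers are assumed independent.
    """
    outcomes = []
    for states in product([True, False], repeat=len(barriers)):
        frequency = initiating_frequency
        path = []
        for (name, p_success), succeeded in zip(barriers, states):
            # Multiply along the branch taken at this barrier.
            frequency *= p_success if succeeded else 1.0 - p_success
            path.append((name, succeeded))
        outcomes.append((tuple(path), frequency))
    return outcomes


# Hypothetical example: utility power fails about twice a year.
tree = event_tree(2.0, [("UPS carries load", 0.95), ("generator starts", 0.90)])
for path, frequency in tree:
    print(path, round(frequency, 3))
```

The path frequencies always sum to the initiating frequency, which is a useful sanity check; here the path in which both safeguards fail occurs about 0.01 times per year.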
After risk identification, the organization must evaluate alternatives. Three fundamental approaches for dealing with risk are available. First, risk avoidance and control involves all methods of reducing the frequency and/or severity of losses, including exposure avoidance, loss prevention, loss reduction, and segregation of exposure units. Second, an organization may opt to transfer risk, e.g. through insurance or other methods. Finally, the organization may decide to accept risk, in which case it effectively self-insures. From a practical standpoint, an organization often opts for a mix of multiple alternatives to address and/or mitigate risk.
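A first-pass comparison of the three approaches can be made on expected annual cost, as sketched below. Every figure (loss size, frequency, premium, deductible, control cost, and the percentage reductions) is hypothetical, and the comparison deliberately ignores risk appetite and loss variability, which in practice often justify paying more than the expected loss to transfer a rare, severe risk.

```python
def accept(frequency, loss):
    # Self-insure: bear the full expected loss.
    return frequency * loss


def transfer(frequency, loss, premium, deductible):
    # Insure: pay the premium plus the retained (deductible) part of each loss.
    return premium + frequency * min(loss, deductible)


def control(frequency, loss, annual_cost, frequency_cut, severity_cut):
    # Reduce frequency and/or severity, then bear the residual expected loss.
    return annual_cost + frequency * (1 - frequency_cut) * loss * (1 - severity_cut)


# Hypothetical exposure: a $1,000,000 loss expected once every ten years.
freq, loss = 0.1, 1_000_000
costs = {
    "accept": accept(freq, loss),
    "transfer": transfer(freq, loss, premium=60_000, deductible=50_000),
    "control": control(freq, loss, annual_cost=30_000, frequency_cut=0.5, severity_cut=0.2),
}
best = min(costs, key=costs.get)
print({k: round(v) for k, v in costs.items()}, "->", best)
```

A mix of approaches, such as loss control combined with insurance on the residual exposure, can be evaluated the same way by composing the functions.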
Risk-Based Measures and Tools
Table 2 outlines a broad range of risk management and assessment options, with a brief description of each. All involve methodologies that are designed to identify the scope and formulate a credible estimate of baseline risk. We have selected three areas of focus and related tools to explore in more depth, since they are increasingly applied in the business continuity planning process: (1) modified financial tools, (2) modified engineering tools, and (3) modeling and simulation.
Modified Financial Tools
In the arena of modified financial tools, one example is risk-based return on investment (RROI), which measures how effectively the use of resources translates into risk reduction or avoidance. As described in 2004 by a team at Lawrence Berkeley National Laboratory and Carnegie Mellon University, RROI is calculated as the ratio between the net benefit of implementing a risk mitigation solution and the implementation cost of that solution and can be succinctly expressed as:
$$\text{RROI} = \frac{(\text{baseline risk} - \text{residual risk}) - \text{implementation cost}}{\text{implementation cost}}$$
RROI, as is the case with other modified financial tools, uses a traditional and well-accepted standard measure (in this case, ROI), but is further enhanced by incorporating the concepts of baseline and residual risks. It is important to recognize that risk reduction efforts typically do not result in revenue generation, per se. A positive RROI indicates that each dollar of implementation cost results in more than $1 of risk avoidance or reduction, and RROI is thus useful in establishing the point at which risk acceptance (e.g. self-insurance and investing resources elsewhere) makes more sense than investing in additional risk avoidance/reduction.
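A minimal sketch of the RROI calculation, following the ratio described in the text (net benefit of the mitigation divided by its implementation cost). The dollar figures are hypothetical, and the sketch assumes baseline risk, residual risk, and cost are expressed in the same units, such as expected annual loss in dollars.

```python
def rroi(baseline_risk, residual_risk, implementation_cost):
    """Risk-based ROI: net risk reduction per dollar of implementation cost.

    All inputs must share units, e.g. expected annual loss in dollars.
    """
    if implementation_cost <= 0:
        raise ValueError("implementation_cost must be positive")
    net_benefit = (baseline_risk - residual_risk) - implementation_cost
    return net_benefit / implementation_cost


# Hypothetical mitigation: cuts expected annual loss from $500,000 to $200,000.
print(rroi(500_000, 200_000, 100_000))  # 2.0: each $1 spent avoids $3 of risk
print(rroi(500_000, 200_000, 400_000))  # -0.25: costs more than it avoids
```

The sign carries the decision signal discussed above: a positive value means the risk reduction exceeds the cost, while a negative value suggests that accepting the risk and investing elsewhere may make more sense.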
Modified Engineering Tools
In the arena of modified engineering tools, one promising new technique that is not yet commonly used is modified failure mode, effects, and criticality analysis (FMECA). FMECA is a bottom-up, inductive analytical method that extends an older methodology, failure mode and effects analysis (FMEA), by adding a criticality analysis that charts the probability of failure modes against the severity of their consequences, highlighting failure modes with relatively high probability and severity of consequences.
This concept, recently discussed by Sim Segal, a fellow of the Society of Actuaries, is an example of how modified engineering tools can help organizations assess their capability to detect, affect, and eventually manage various risk events.
Steps in the FMECA process include:
- Define system in terms of objectives and performance parameters (e.g. produce a product within stated specifications)
- Identify and analyze failure modes (e.g. when one of the production machines is out of calibration)
- Classify failure modes in terms of their effects by criticality (e.g. non-reworkable errors may be expensive but not critical)
- Rank failure mode criticality, and determine critical failures that demand immediate attention
- Identify means of failure detection, isolation, and compensation (e.g. identify a schedule of preventive machine maintenance)
- Perform maintainability analysis (e.g. identify potential improvements)
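The classification and ranking steps above can be sketched as follows. The criticality score used here, probability multiplied by a severity rank, is a deliberate simplification (formal FMECA standards define more detailed criticality numbers), and the failure modes, probabilities, 1-4 severity scale, and attention threshold are all hypothetical. Consistent with the example in the steps, the non-reworkable error is rated as costly but not critical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    probability: float  # estimated chance of occurring per period (assumed)
    severity: int       # hypothetical scale: 1 = negligible ... 4 = catastrophic

    @property
    def criticality(self) -> float:
        # Simplified criticality: probability weighted by severity rank.
        return self.probability * self.severity


def rank_failure_modes(modes, threshold):
    """Rank modes by criticality; flag those at or above the threshold."""
    ranked = sorted(modes, key=lambda m: m.criticality, reverse=True)
    return [(m.description, m.criticality, m.criticality >= threshold) for m in ranked]


modes = [
    FailureMode("non-reworkable production error", 0.10, 2),
    FailureMode("machine out of calibration", 0.20, 2),
    FailureMode("line-wide power loss", 0.02, 4),
]
for description, score, urgent in rank_failure_modes(modes, threshold=0.3):
    print(f"{description}: {score:.2f}{'  <- immediate attention' if urgent else ''}")
```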
Modeling and Simulation
One of the most promising risk assessment tools is modeling and simulation (M&S). M&S, which uses computing capabilities and knowledge about a particular system to analyze and further improve its design, is a common tool in manufacturing, aerospace, transportation, and other fields. Researchers and practitioners in M&S and in risk are now working together to apply this approach to managing risks. At a high level, the first step in a modeling and simulation process is to gather, map, and document everything known about a risk event and how the enterprise may be affected. The key word here is “enterprise” – it is important to break out of product and departmental silos.
Information used as the basis for modeling can be depicted visually in a chart. As an example, to assess potential impact on throughput in a process, time is typically shown on the horizontal axis, and the system attribute for which risk is being assessed (in this case, process throughput) is graphed on the vertical axis. The desired attribute level for throughput (before the event, and after return to normal) is established and plotted, as are the phases of the disaster/outage event: onset, detection, control, and termination.
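The throughput chart described above can be turned into a single number, the total throughput lost, by measuring the area between the desired level and the actual curve. The sketch below assumes a hypothetical piecewise-linear outage profile: throughput falls from onset to detection, holds steady until control, and recovers by termination. The phase times, throughput levels, and units are illustrative only.

```python
def lost_throughput(points, desired_level):
    """Area between the desired level and a piecewise-linear throughput curve.

    points: (time, throughput) pairs sorted by time. Assumes throughput never
    exceeds the desired level, so every segment is a shortfall or zero.
    """
    lost = 0.0
    for (t0, q0), (t1, q1) in zip(points, points[1:]):
        # Trapezoid rule on the shortfall over this segment.
        lost += ((desired_level - q0) + (desired_level - q1)) / 2 * (t1 - t0)
    return lost


# Hypothetical outage, in hours and units per hour.
phases = {"onset": 2, "detection": 3, "control": 5, "termination": 8}
curve = [(0, 100), (phases["onset"], 100), (phases["detection"], 40),
         (phases["control"], 40), (phases["termination"], 100), (10, 100)]
print(lost_throughput(curve, desired_level=100))  # 240.0 units lost
```

Shortening any phase, for example by detecting the event sooner, shrinks the area directly, which makes this a convenient measure for comparing mitigation options.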
A different scenario may involve assessing risk related to a geographic range of effects. In this case, information translated into diagrammatic form includes the point of incident, initial inoperable space (e.g. radius), and secondary inoperable space, all of which can be dependent on factors such as system operation, type of incident, incident management, and the emergency response plan.
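A minimal sketch of the geographic case, assuming fixed circular zones around the point of incident and straight-line distance on a flat map. In the scenario described above, the radii would depend on system operation, incident type, incident management, and the emergency response plan; here they are hypothetical constants, as are the site names and coordinates.

```python
import math


def classify_site(site, incident, initial_radius, secondary_radius):
    """Assign a site to an inoperable zone by straight-line distance."""
    distance = math.dist(site, incident)
    if distance <= initial_radius:
        return "initial inoperable"
    if distance <= secondary_radius:
        return "secondary inoperable"
    return "operable"


# Hypothetical planar coordinates in kilometers, incident at the origin.
sites = {"headquarters": (0.5, 0.5), "warehouse": (3.0, 4.0), "backup site": (30.0, 40.0)}
for name, location in sites.items():
    print(name, classify_site(location, (0.0, 0.0), initial_radius=1.0, secondary_radius=10.0))
```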
The second step in the modeling and simulation process is to develop and refine a model, leveraging technology, which allows running a range of simplistic to robust “what if” scenarios and enables complex situation analysis.
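A small Monte Carlo model illustrates the kind of "what if" analysis this step enables. The sketch treats outage duration as detection time plus control time plus restoration time and compares a baseline against a faster-detection scenario. The exponential and uniform distributions, their parameters, and the four-hour recovery time objective (RTO) are assumptions chosen for illustration.

```python
import random


def what_if(runs, detect_mean, control_mean, restore_range, rto, seed=0):
    """Estimate P(outage > RTO) and mean outage duration by Monte Carlo.

    Outage duration = detection + control + restoration (hours). Detection
    and control are exponential with the given means (an assumption);
    restoration is uniform over restore_range (also an assumption).
    """
    rng = random.Random(seed)
    exceed, total = 0, 0.0
    for _ in range(runs):
        duration = (rng.expovariate(1 / detect_mean)
                    + rng.expovariate(1 / control_mean)
                    + rng.uniform(*restore_range))
        total += duration
        if duration > rto:
            exceed += 1
    return exceed / runs, total / runs


# Compare a baseline against a hypothetical faster-detection scenario.
baseline = what_if(20_000, detect_mean=0.5, control_mean=2.0, restore_range=(1.0, 3.0), rto=4.0)
faster = what_if(20_000, detect_mean=0.1, control_mean=2.0, restore_range=(1.0, 3.0), rto=4.0)
print(f"baseline: P(>RTO)={baseline[0]:.2f}, mean={baseline[1]:.2f} h")
print(f"faster detection: P(>RTO)={faster[0]:.2f}, mean={faster[1]:.2f} h")
```

Using the same seed for both scenarios means each simulated run differs only in the detection time, so the comparison isolates the effect of the change being tested.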
The final step is to assess the results of modeling and simulation to develop insights and to translate those results and insights into visualized decision aids for risk analysts and managers, including senior management and decision-makers.
Risk Management Tradeoffs and Effects on Organizational Decision-Making
From a business management perspective, there are important, widely recognized challenges associated with enterprise-wide risk-based business continuity analyses which use modified financial and engineering tools or M&S. First, obtaining accurate cost and benefit estimates can be difficult. Risk assessment is about much more than cost. It also involves estimating the benefits that can be realized by reducing expected risk. As is the case with other business investments, the incremental benefit of every dollar invested is an important criterion in enterprise-wide, risk-based business continuity analyses.
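The incremental-benefit criterion can be sketched by comparing successive investment levels: for each step up, divide the additional risk reduction by the additional cost, and stop once an extra dollar buys less than a dollar of risk reduction. The investment levels, costs, and residual risks below are hypothetical, and stopping at the first shortfall assumes returns diminish steadily as spending increases.

```python
def marginal_returns(baseline_risk, levels):
    """Risk reduction per extra dollar at each successive investment level.

    levels: (label, cumulative_cost, residual_risk) tuples, ordered by cost.
    """
    results = []
    prev_cost, prev_risk = 0.0, baseline_risk
    for label, cost, residual in levels:
        extra_cost = cost - prev_cost
        extra_reduction = prev_risk - residual
        results.append((label, extra_reduction / extra_cost))
        prev_cost, prev_risk = cost, residual
    return results


def recommended_level(baseline_risk, levels):
    """Deepest level reached before an extra dollar buys less than $1 of reduction."""
    chosen = None
    for label, per_dollar in marginal_returns(baseline_risk, levels):
        if per_dollar < 1.0:
            break
        chosen = label
    return chosen


# Hypothetical options for a $1,000,000 expected annual loss.
levels = [("basic backups", 50_000, 600_000),
          ("warm site", 250_000, 300_000),
          ("hot site", 650_000, 150_000)]
print(marginal_returns(1_000_000, levels))
print(recommended_level(1_000_000, levels))  # warm site
```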
Second, the process of developing optimal solutions is critically important, requiring out-of-the-box brainstorming and accurate assessment of viable alternatives. Clear identification of scope and credible estimates of actual risk and exposure are required in order to avoid “garbage in/garbage out” results. The potential effectiveness of solutions, particularly complex, integrated solutions, is difficult to assess, even with the support of powerful tools like modeling and simulation. It is important to recognize that the “How big is the iceberg?” principle applies to in-depth risk analysis, which tends to bring to the surface more critical risks and potential failures than appear with a superficial assessment. Those additional complexities must be identified and then integrated into the assessment.
Finally, the fast-changing environments of the typical organization and its broad, complex, and dynamic network of stakeholders make risk assessment an effort that attempts to hit multiple moving targets.
On a positive note, risk management benefits organizations in ways that transcend risk recognition and resulting mitigation or acceptance. The resulting insights are as important as (or even more important than) the risk-related results. Over and above the obvious rationales for risk assessment – achievement of business continuity/resiliency, insurance, regulatory, and stakeholder-benefit objectives – there are side benefits. Risk assessment shines a bright light on current systems and functions, exposing opportunities for corrections, cost savings, redefinition/update/modernization of operational processes, and realignment with organizational goals and objectives.
Today, more than ever before, there are significant opportunities for organizations to customize and leverage existing and emerging tools to optimize enterprise-wide risk management. Research, education, and outreach from universities, governments, and industry/professional organizations related to next generation tools and techniques provide opportunities for synergistic collaboration that supports organizational risk managers. For practitioners and researchers alike, an important initial question to ask is, “Which tools and techniques, with which I am familiar, can be leveraged to address my organization’s risk management needs?”
C. Ariel Pinto, Ph.D. is an assistant professor of engineering management and systems engineering with Old Dominion University and a former research fellow at the Software Industry Center at Carnegie Mellon University and the Center for Risk Management of Engineering Systems at the University of Virginia. His research interests include engineered systems of project risk management, risk valuation, risk communication, analysis of extreme/rare events, and decision-making under uncertainty.
Theresa A. Kirchner, Ph.D., MBCP is an assistant professor of management with Hampton University and a former senior vice president with Bank of America and principal consultant with Keane, Inc. She is a DRI International certification commissioner, heading the policies and operating procedures committee, and a former member of the Disaster Recovery Journal Editorial Advisory Board. Her research interests include business continuity and strategic management topics.