Parts is Parts, Right?
Organizations, particularly large ones, are arranged in groups with various parts assigned specific tasks, not unlike body parts. The studies of physiology and anatomy have taken centuries to reach a fair understanding of what some parts do, how they do it and how they depend on one another. Though Aesop didn’t mention the brain, it can easily be seen as analogous to the executive function in an organization. But rather than assign departments or business units body part names (who wants to be the spleen?), suffice it to say that in a large, complex organization, departments must not merely coexist but work together to achieve the mission of the enterprise. And our fable reminds us that the behavior of each business unit is ultimately essential.
Why would there be a question about this? Because experience, observation and the many stories told by fellow practitioners of business continuity show how companies arbitrarily limit their business continuity program to “key IT systems” or “mission critical” functions. To be sure, the rationalization that planning to recover everything would cost too much is readily accepted by many executives. This is how the enterprise starts down the path of the Members of the Body. No, executives are not rebelling against some departments; rather, they are simply dismissing them as expendable. That’s pretty much like a cruise ship with 3,000 passengers that hands out life jackets to only the 300 with the most expensive cabins. At some point, the other 2,700 passengers are going to start getting nervous about their safety. Why wouldn’t the “non-critical” department managers and staff feel the same way? They develop rationalizations of their own:
- “We don’t have budget for it.”
- “With all that plan writing, exercising and maintenance, it’s too time-consuming.”
- “We’re not that critical.”
- “If we needed a recovery plan, management would tell us.”
- “When they make it part of our performance reviews, we’ll do it.”
Whaddya Mean, ‘Critical?’
“Critical” needs to be understood as a relative categorization: some operations are more vital to the enterprise’s survival than others. That means that the budget expended to protect each operation should be allocated accordingly: everyone gets a life jacket, but some also get preferred seating in the lifeboats… so to speak. To make such choices requires a clear and rational assessment of every business unit’s processes, what their role is in meeting the enterprise mission, and how the interruption of each process would affect accomplishing that mission. In other words, a business impact analysis (BIA). From the postings on the DRJ.com discussion boards, one could deduce that doing BIAs is right up there with un-anesthetized root canal surgery. Newbie posters often write “My boss wants me to do a BIA; can anyone share their BIA reports for [my] industry, please?” Aside from the highly confidential nature of a document listing a firm’s vulnerabilities, there are more fundamental reasons why that’s a bad idea:
- Even within the same industry, competitors’ organization structure and operational process flows are different, making dependencies and relative criticalities equally different.
- Companies differ in strategy: how they choose to attain specific goals will impact the relative criticality of many operations.
- Some business processes are better designed for resilience (resistant to interruptions) than others, even before a BIA is done.
- Some companies skimp on the BIA to save money, and the outcome is likely to be poor strategy choices, such as over-spending to protect a less critical operation and/or under-spending on a truly critical one.
The bottom line is that BIAs simply need to be done, and well done at that.
BIA questionnaires are intended to determine three key attributes of business operations:
- The maximum acceptable outage (MAO) of each business operation (process), along with the specific resources essential to recovering the process after a disaster event has damaged those resources or prevented access to or use of them.
- The recovery time objective (RTO) is how quickly essential resources must be made available after the event. For IT systems supporting the process, the RTO is the basis for developing the recovery solution architecture.
- The recovery point objective (RPO) is how much data loss from the process can be accepted, whether the data is held in paper records or stored electronically. For IT systems, this also influences the recovery solution design.
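As a rough illustration of how these three attributes might be captured per process, here is a minimal Python sketch. The `BiaRecord` layout, its field names, the choice to express RPO as a span of time, and the rule that the RTO should not exceed the MAO are all assumptions made for this example, not something a particular questionnaire prescribes.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class BiaRecord:
    """One process's BIA answers (hypothetical layout for illustration)."""
    process: str
    mao: timedelta  # maximum acceptable outage of the process
    rto: timedelta  # how quickly essential resources must be available
    rpo: timedelta  # acceptable data loss, expressed here as a span of time (assumption)
    essential_resources: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return consistency problems found in this record; empty if none."""
        found = []
        # Assumed sanity rule: resources must be back before the outage limit.
        if self.rto > self.mao:
            found.append("RTO exceeds MAO")
        # A record with no resources gives planners nothing to build a strategy on.
        if not self.essential_resources:
            found.append("no essential resources listed")
        return found


payroll = BiaRecord(
    process="payroll",
    mao=timedelta(days=3),
    rto=timedelta(days=5),
    rpo=timedelta(hours=24),
)
print(payroll.problems())  # ['RTO exceeds MAO', 'no essential resources listed']
```

Even a simple check like this can flag a questionnaire that was filled in without thinking about how the answers relate to one another.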
In reviewing commonly available BIA questionnaires, there is often little or no mention of any other resources vital to the operation. For example, in a manufacturing operation where component parts are made and then assembled, soldering irons, torque wrenches, jigs, fixtures and other specialized equipment are vital to the production effort. Not accounting for the loss of use or access to those resources would cripple the development of any meaningful resumption strategy, since planners would have no input about what special equipment, key staff and vendors, replacement facilities, et cetera, would be required. Staff availability alone is now a hot-button topic in the ominous shadow of an influenza pandemic. It’s as though these parts of the body aren’t important enough to be included. Yet our fable makes it clear that, sooner or later, any absent or non-functioning member can bring the rest down. This kind of oversight is understandable, given that the preponderance of BC practitioners work in or come from IT services organizations. But this must be overcome.
Poor Brother Aesop
We must forgive our ancient story-telling slave his limited knowledge of physiology and anatomy. The point is not to criticize the analogy in this fable, but to note that the Brain isn’t part of the rebellion, likely due to lack of a clear understanding of the role that grey mass plays in managing bodily affairs. Whatever medical knowledge was available over 2,500 years ago was certainly not widely dispersed, so the brain’s “executive” capacity was easy to omit out of ignorance. Yet, are business continuity practitioners any better off? When the lion’s share (but that’s another fable) of practitioners are part of the IT department or an IT services consultancy, the oversight of BC program creation will be limited by how the corporate culture shapes it at that level. If the IT group is powerful, it will guide the process according to its perception of what (or who) is critical. When Operations (the revenue engine) is strongest, its view deems many “less useful” functions as having little or no need for resumption planning at all. It’s as though the Hands, the Mouth or the Legs have been handed control of the entire body. We’ve seen how well that works.
As is commonly agreed, the greatest challenge to establishing an effective program is gaining executive support, meaning management at the CEO/CFO level making an enterprise-wide announcement endorsing the development of a corporate BC program, augmented by policies for the entire organization mandating active involvement in program development and implementation. This means engaging “the brain” in a leadership role; yes, the CEO must be known by all to be fully involved in the program. Yet, why is gaining his/her visible support such a challenge? Perhaps it’s because the strategic implications of forging a resilient operation have never been suitably explained. The enterprise doesn’t operate in a vacuum; rather, it is much like a person who must cooperate with some for mutual support while competing with others to achieve whatever is deemed to be success. While this notion may seem easy to grasp, connecting enterprise resilience to competitiveness often proves … difficult. It needn’t be.
Within any given industry, a group of firms compete for market share, sometimes in local or regional markets, sometimes globally. But all are exposed to the risks inherent in doing business and operating in specific physical locations:
- Natural disasters – wind storm, flood, earthquake, et cetera;
- Man-made events – labor action, workplace violence, arson, fraud, ethical misconduct, et cetera;
- Business risk – opportunities to gain or lose market share or enter new markets, regulatory compliance, product liability issues, et cetera.
Some of these are addressed by preventive and avoidance measures: choosing new sites away from known threats, proper screening and training of staff regarding legal and policy compliance, maintaining a positive, agreeable environment for the workforce. Others simply require that plans be developed to enable timely recovery of operations after unavoidable and/or unpreventable interruptions. Failure to establish, test and maintain such plans leaves a door open for competitors to exploit such events. At the least, the unprepared firm loses market share; at worst, it ends up as a wholly owned subsidiary of the more resilient competitor.
Connecting the Headbone to the Neckbone, etc...
Could it be any clearer? Does any CEO not want to be the “last man (or woman) standing,” so to speak? Enterprise survival isn’t just a matter of market segmentation schemes, investor coddling and ad campaigns. Using a warship as an analogy, a hole of any size, when below the waterline, can eventually sink the ship, unless appropriate action is taken. But that’s not the only scenario: all that need happen is to take on enough water to hinder seaworthiness – the ability to make way and do battle. The risks to enterprise viability are all around: the captain cannot simply declare that only the engine and munitions rooms are critical, as though the rest of the ship could sink without taking them down too. Sound ridiculous? It shouldn’t; a large percentage of firms still only have plans for recovering their IT infrastructure. Others buy business interruption policies to stem the loss of revenue from an interruption of operations. Problem is, even if a claim were paid within a month or two of the event, those policies aren’t going to manufacture and deliver a single product or service offering (where the revenue comes from).
All Risks, All Threats, Critical Impacts, All the Time
Taking care of the enterprise to ensure resilience is a lot like the practice of modern medicine: controlling one’s diet, doing suitable exercise, avoiding dangerous behavior and potential threats, all without becoming paranoid. But the beginning of such a program is much the same: a comprehensive physical. For the enterprise, that means a thorough risk evaluation and an analysis of the impact to the enterprise resulting from interruption of any process.
One might posit: “Well, we can certainly do without HR indefinitely…”
Really? If key staff members are lost or unavailable after a disaster, who would be called upon to quickly identify replacement candidates? Skimping on which processes are deemed critical without adequate impact analysis conflicts with a basic tenet of management: if it’s worth staffing and funding, it’s worth planning to recover… eventually. This means separating the below-the-waterline operations from those which aren’t, with more aggressive recovery plans for the former and deferred recovery for the latter. This brings up a practical issue every continuity plan must address: who goes first?
Triage for the Enterprise
Many plans nail down a list of priorities, most notably data center recovery plans, but operations BC plans, too. That’s nice, but plans are written around a chosen strategy based upon the “worst recoverable case” scenario: losing (access to) everything at the worst possible time. But operations have different criticality drivers: some are calendar-driven (e.g., payroll, financial reporting, tax filings) while others are event-driven (e.g., new product launch, rollout of new manufacturing technology). Another way of stating it is that plans are theoretical… until the disaster happens. Up to that point, all operations must be considered equal. Only after the event will the recovery management team leaders know what the relative criticality of operations really is. So the damage assessment process must provide them with actual impact data for establishing the recovery action plan, i.e., the priorities based on what is no longer theory, but reality. If this is difficult to imagine, consider: when is a better time for a disaster, Friday evening, when nearly everyone is out of the building, or 9 a.m. Monday? In the first scenario, there’s a whole weekend to sort things out, decide on priorities and get the hot items up and running by Monday. In the second, staff may well be injured or worse, not to mention traumatized, and every minute is operating hours lost (think immediate sales, revenue and productivity impact). With guidelines for the recovery leaders, this process enables an effective and measured response, scaled to suit the scope of the event without under- or over-reacting.
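To make the idea of post-event triage concrete, the following Python sketch orders the processes actually hit by an event according to how little slack each has before its next calendar- or event-driven deadline. The `Process` fields, the slack formula (time to deadline minus RTO) and the sample processes and dates are hypothetical choices for illustration; a real recovery team would weigh many more factors.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Process:
    name: str
    rto: timedelta                     # planned recovery time objective
    impacted: bool                     # result of the post-event damage assessment
    next_deadline: Optional[datetime]  # payroll run, product launch...; None if no fixed date


def recovery_order(processes: list[Process], event_time: datetime) -> list[str]:
    """Order impacted processes by slack: time to deadline minus time needed to recover."""
    def slack(p: Process) -> timedelta:
        if p.next_deadline is None:
            # No fixed date: treated as least urgent in this sketch.
            return timedelta.max
        return p.next_deadline - event_time - p.rto

    impacted = [p for p in processes if p.impacted]
    # Ties on slack fall back to the shorter RTO first.
    return [p.name for p in sorted(impacted, key=lambda p: (slack(p), p.rto))]


event = datetime(2008, 1, 7, 9, 0)  # a Monday, 9 a.m.
processes = [
    Process("payroll", timedelta(days=1), True, datetime(2008, 1, 9, 17, 0)),
    Process("product launch", timedelta(days=2), True, datetime(2008, 1, 10, 9, 0)),
    Process("recruiting", timedelta(days=5), True, None),
    Process("tax filing", timedelta(days=1), False, datetime(2008, 1, 8, 9, 0)),
]
print(recovery_order(processes, event))  # ['product launch', 'payroll', 'recruiting']
```

The same list of processes would produce a different order had the event struck a week later or spared a different department, which is the sense in which priorities stay theoretical until the damage assessment is in.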
"Appeared in DRJ's Winter 2008 Issue"