I have talked to many organizations about their recovery programs, and one common thread is that responsibility for recovery lies in several different (and often uncoordinated) departments. That raises the question: who is responsible for business recovery within a company? I’ve seen the Business Continuity group report to the CFO, the Risk Officer, the COO and the IT department. But structure and reporting do not equal recovery.
True business recovery can only take place when it becomes part of the culture of the company. Basically everybody has a part and everyone knows their part.
Like a championship team, a good business continuity plan takes a group of people who each know their role and can work together. That does not mean that everyone is equal to the next; leadership and accountability are important parts of the successful equation.
As a starting point let’s agree upon some basic concepts:
Business Continuity is the overarching structure, process and discipline of assuring that a business has a plan to resume operations when a loss of resources results in an unacceptable slowdown or loss of business operations.
Disaster Recovery is the process and structure for recovering IT assets and services to the business.
Crisis Management is the process and structure invoked to execute a structured plan that assists in managing a recovery process.
Risk Management is the process, structure, and responsibility for identifying and mitigating risks. In most cases this covers more than the risks associated with a business disruption due to a loss of resources.
Audit and Compliance are the processes and structures that are used to validate that certain business processes and functions are being followed.
One way of looking at the relationship of these groups within an organization is depicted in the following graphic:
Disaster Recovery and Crisis Management are disciplines within Business Continuity Management. Risk Management overlaps the responsibility of Continuity Management, and Audit and Compliance has an overlapping responsibility across all risk areas within the company, including Risk and Business Continuity.
Coordinate the efforts of the Business Continuity group, Information Technology, and the business units
An architect or a designer would start a project with a vision of the end state. In business we seem to follow a process or framework and focus more on the steps than on the end game. My suggestion is to take the best of both of these methods as your implementation approach. Whether you are at the beginning of your program or well into it and looking for places to improve, following a structured process with a specific end state as the goal provides a more positive outcome.
Each of the different groups in your company, like the members of a team, will need to work together having both individual assignments and a team goal.
The Business Continuity team creates and coordinates the plan, the business units participate with their individual recovery solutions, the IT Department has the responsibility for technology, and the Audit and Compliance group helps to identify gaps and set direction.
The Business Continuity Group needs to create the plan. This is not to say that it is their responsibility to write all of the recovery plans. They should help establish:
The overall plan with the long and the short term goals - The project plan detailing the steps that will be taken to build or expand the program. Include the resources required, an estimate of the time commitment of those resources, and any additional budget impacts that need to be considered. The program plan will be a key component of the justification or business case that may need to be developed to assure continued executive support.
Establish the framework for the business units to use for building their own continuity plans - The Business Continuity group should not take on writing the resumption plans for each of the business units, departments or facilities. However, they should provide the structure and the framework for the plans and supporting documentation.
While each plan does not need to be exactly the same as another, there should be similarities in the context and framework from one plan to another. The Business Continuity group should set guidelines for the content and an appropriate level of response based upon which resource is lost, and the severity of the loss. They could also make sure that there is continuity in the plan development, communications protocols, and exercise objectives.
The Business Continuity Group will also define the communications plans that go along with the crisis management plans (this may require an interface with corporate communications and corporate legal). Some events are obvious and will have a prescriptive response, while others will be more subtle and will rely on the processes established to assure that the right response is being executed.
The Business Continuity team needs to assure that the communications plan is put into place to declare a disaster, communicate that a plan has been put into place, provide ongoing communications, and coordinate activities.
The Crisis Management plan developed by the Business Continuity team coordinates the activities of several groups. An example of this is the communications between senior and local management and the employees. It also establishes the activities for event lifecycle management (declaration, assessment, response, return to home, etc.), recovery team coordination and activity prioritization. The plan itself is embodied in two specific parts: a strategic document describing the interaction between groups, and a tactical set of instructions much like a project plan.
Define the recovery scenarios and severity levels to establish the response protocols - Business Continuity planning means having an alternate resource plan in the event that your primary resource becomes unavailable. It focuses on five types of resources (people, facilities, technology, machinery, and transportation), each of which will have an integrated plan that takes into consideration the severity of the loss and the expected length of the outage.
As an example, planning for a flu outbreak is really all about understanding what to do when you have a ‘people resource shortage’. You have to define your critical people resources, have a plan to cross-train others to do their jobs, and know how long you can operate without their specific service to your company. In doing so you have defined the parameters of the outage: who and/or how many are out, and how long the outage will last. You have in fact identified and stratified the outage possibility and determined your recovery plan based upon the severity.
Creating the disaster definitions and severities gives you two outcomes: it sets the expectations of what you will do when an outage occurs, and it also implies what you won’t do. It helps determine when an outage is really a disaster and, by the nature of the definition, outlines an appropriate response.
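The idea of stratifying an outage by how much of a resource is lost and how long the loss is expected to last can be sketched in a few lines of Python. This is only an illustration: the severity names, the thresholds, and the response wording are hypothetical and would need to come from your own disaster definitions. Only the five resource types are taken from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Resource(Enum):
    # The five resource types named in the article.
    PEOPLE = "people"
    FACILITIES = "facilities"
    TECHNOLOGY = "technology"
    MACHINERY = "machinery"
    TRANSPORTATION = "transportation"


class Severity(Enum):
    # Hypothetical severity levels, for illustration only.
    MINOR = 1
    MAJOR = 2
    DISASTER = 3


@dataclass(frozen=True)
class Outage:
    resource: Resource
    fraction_lost: float   # share of the resource unavailable, 0.0 to 1.0
    expected_days: float   # expected length of the outage


def classify(outage: Outage) -> Severity:
    """Map an outage to a severity level using assumed thresholds."""
    # Assumption: a disaster is a large loss that also lasts a long time.
    if outage.fraction_lost >= 0.5 and outage.expected_days >= 5:
        return Severity.DISASTER
    # Assumption: either a sizeable loss or a multi-day outage is major.
    if outage.fraction_lost >= 0.2 or outage.expected_days >= 2:
        return Severity.MAJOR
    return Severity.MINOR


# Hypothetical response protocols keyed by severity.
RESPONSES = {
    Severity.MINOR: "Handle within normal operations; no plan activation.",
    Severity.MAJOR: "Activate the affected business unit's continuity plan.",
    Severity.DISASTER: "Declare a disaster and activate crisis management.",
}

# Flu outbreak example: 30% of critical staff out for about ten days.
flu = Outage(Resource.PEOPLE, fraction_lost=0.3, expected_days=10)
severity = classify(flu)
print(severity.name, "->", RESPONSES[severity])
```

Writing the definitions down this explicitly makes the second outcome visible as well: anything classified below the disaster threshold is, by definition, something you have decided not to treat as a disaster.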
Define the measurement and control points of the program - with the program underway, an important communications point is to articulate the progress that the program is making. While some common indicators are counts of mechanical outcomes (completed BIAs, completed plans, etc.), other dimensions could be the amount of risk mitigated, the preparedness of the staff at a location, the success of the last exercise, or a self-evaluation by the senior manager in charge of the facility. While these types of measurements are less binary, and certainly more subjective, they help to define the existing risk of not recovering due to a lack of understanding or a lack of preparedness by the people in a given location.
Be consultative to the business units - the Business Continuity group generally has three main responsibilities when it comes to developing or refining the business continuity program: 1) Define the framework and governance of the program; 2) Validate and measure the results of the program; and 3) Be consultative to the user community, building their awareness level, their adoption of the program and their maturity within the program.
Establish short and long term goals
Building and maintaining a business continuity program is not a project; it is a change in corporate culture. Or, another way of saying this is ... “It’s not a sprint, it’s a journey”. Establish your goals on both a long and short term scale; six-month to one-year increments work in most organizations. Make sure the goals are measurable, attainable, and easily communicated. If you are implementing a new program, developing the charter and the framework of the program are probably good short term goals, and implementing communication and awareness programs are good long term goals. In an existing Business Continuity program, identifying the maturity of the program and comparing it to the risk tolerance of the company is a good place to start. Generally this type of assessment reveals gaps that point to measurable changes to the program.
Make sure it works
As I have looked at many of the publications, instructions, processes, and methodologies for implementing a business continuity program, I have noticed that you could pay a lot of attention to the activities in that program and not produce a result that changes the recovery posture of your organization.
One Business Continuity manager that I talked to had spent over a year working on their risk analysis. They had identified a pretty impressive list of risks and potential threats, researched the historical occurrences of each of the threats, gone into the root cause of several of the occurrences, and then assigned a probability of occurrence to each of them based upon the likelihood of their company experiencing the same type of event. At the end of the year a pretty significant document had been created. It was rich in facts and details; however, it could not draw a specific conclusion or support any of the program recommendations. Many disasters are unpredicted events that cause an unacceptable loss of resources to your business.
Put into simple terms, if you rely on your car for transportation you would know that keeping up with the normal maintenance schedule is a best practice. But could you predict when you would get a flat tire? I suppose the next question would be: when was the last time you checked to see if the spare was roadworthy? Flat tires are a threat to your transportation resource, and the spare tire mitigates the risk of losing that resource for that reason. But the event is not predictable.
Methodologies, frameworks, and standards all provide a set of guidelines to best practices in building out your business continuity program. However, a common mistake is focusing on the process and never looking to see if the activity you have embarked upon has made a difference for your company. There is a time for processes, and a time for practicality, but most of all make sure your time is productive.
Having a tool to automate a repetitive process could ultimately save time and money, provided the acquisition and implementation cost of the tool doesn’t outweigh the financial benefit it provides. I have looked at many of the tools in the industry. Some automate a process that is already automated, and others take a methodology, automate it, and then require you to adopt "their way of doing things". Tools don't need to be complex in order to be useful; they just need to be able to economically assist you in getting a job done.
Software tools are generally sold on feature and function with some razzle dazzle demo that goes along with it. That is not going to change. The question to ask when selecting a tool is: what are you expecting it to do? Evaluate what you want the tool to do, then look at alternate methods for doing it, then at the level of customization that the tool requires, and finally at the benefits that the tool brings. Have you ever wondered why there are so many different kinds of hammers in the tool aisle at the hardware store? And just how many different ones do you need to pound in a nail?
I have seen companies start with buying a tool and building their program around it, and I have seen companies build their program and then buy a tool to support it. Unfortunately both approaches have had the same number of successes and failures. Success stems from understanding what you need to automate, what compromises you are willing to make with the tool that you select, and ultimately the benefit that the tool brings to your program. In your evaluation of any tool set, ask to talk to the customers who are having difficulties implementing the tools as well, not just the reference customers who seem to be the poster children for the company.
Installation vs. Implementation
Business Continuity tools can be complex, especially for larger organizations. Having a tool installed is simply setting up the computing platform, configuring the software for access, and making a few modifications so that the users can access the tool. Unfortunately the end result here doesn’t get you to the razzle dazzle demo that sold you on this tool in the first place.
An implementation plan for a tool includes understanding what you want automated, what you are expecting as a result, working with the tool vendor on the requirements, and understanding what happens after the tool is installed. There will be configuration and data that are required to make the tool useful, there may be some customization, and there will certainly be a need for understanding the processes of the tool and for training you and the end user community. All of these elements make up the implementation process and bring you closer to a useful and productive tool in your program.
The final question to ask the tool vendor is whether or not the tool links to the other parts of the business, such as Risk, Audit and Compliance Management. Without those linkages it is difficult to truly understand the business impacts and to audit them over time. It is also important to ensure the tool does more than just business continuity. It should also bring in other elements such as crisis management and disaster recovery planning, and integrate with other solutions such as calling tree products.
Focus on the results
One of the requirements that should be considered in any tool set is the ability to identify the control and audit points in your program and understand the level to which each location or department has executed upon each control. As an example, if there is a requirement to review the business impacts of an outage on a biannual basis, then departments that have completed that task are compliant and those that have not are out of compliance. If this requirement is dictated by a standard framework that the organization must comply with, understanding the exposure that exists because of non-compliance is an important and measurable risk. If each location and department all over the United States were required to meet this requirement, collecting the data on compliance could be a daunting task, or it could be a function of an automated control point in your business continuity management tool.
While the cultures of many companies have separate organizations responsible for different tasks, such as disaster recovery, crisis management, risk management, audit and compliance and business continuity, there are working management structures, processes and tools that can help a company have a coordinated approach to changing the recovery posture of their organization.
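To make the idea of an automated control point concrete, here is a minimal Python sketch of the business impact review example. It is not taken from any particular tool: the department names, the dates, and the 183-day review window are all hypothetical, and the window should be set to whatever interval your framework actually requires.

```python
from datetime import date, timedelta


def out_of_compliance(last_reviews, as_of, max_age):
    """Return the sorted names of departments whose last review is
    missing or older than max_age as of the given date."""
    return sorted(
        name
        for name, reviewed in last_reviews.items()
        if reviewed is None or as_of - reviewed > max_age
    )


# Hypothetical data: date of each department's last business impact review.
last_reviews = {
    "Finance": date(2024, 1, 15),
    "Logistics": date(2023, 3, 1),
    "Call Center": None,  # no review on record
}

# Assumption: reviews must be no older than roughly six months.
review_window = timedelta(days=183)

late = out_of_compliance(last_reviews, as_of=date(2024, 6, 1), max_age=review_window)
print(late)
```

With this data, Finance falls inside the window while Logistics and the Call Center are reported as out of compliance. The same check, run across every location and department, is the kind of roll-up a continuity management tool can produce without manual data collection.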
John Linse leads the Global Competency for Data Protection Service for EMC Consulting. He leads a practice that focuses on providing recoverability and resiliency to EMC’s customers across many verticals and has assisted EMC’s customers in developing their recovery strategies focusing both on their business and technology recovery initiatives.