The recovery time objective (RTO) is a dynamic number, one with several implications. It not only tells us when we have to be back in business, but it also implies how we should develop and write our plans.
To understand the RTO, we should first understand what happens during a disaster (from a financial point of view). A disaster is an event that takes one or more productive resources offline. These may be staff, computers, communications, facilities, etc. So, what is the rush to get the organization back in production? From the financial perspective, when a disaster occurs, income stops, many of the ordinary expenses continue (facilities expenses, salaries, etc.), and at the same time many extraordinary expenses are incurred (temporary housing, hot-site fees, replacement equipment, etc.). The effect of these three factors is a decline in equity position (net asset position for non-profit organizations).
I lived in Miami during Hurricane Andrew. What I saw was that half of the organizations in the city did not survive the hurricane. Of those that survived, half were dead after three years, a result of a weakened equity position.
One of the first things to understand when writing a plan is, what is the amount of equity that, if lost, would cause severe damage to the organization? Generally, the CEO and CFO of an organization will know the answer to that question. It is essential that you get this information if you want to do a realistic and effective recovery plan.
Armed with the critical equity amount, you can conduct the business impact analysis (BIA) with more authority. When you get to the part of the BIA where you ask how long the organization can survive without a particular business unit, you can use the drop-dead equity loss amount as a standard. It is your job as the questioner to push the RTO time horizon out as far as reasonable, and it is the interviewee’s job to bring the time horizon back based on his or her intimate knowledge of the business and the business unit. Once you arrive at an RTO, you are halfway to the strategy that you will use for recovery.
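The link between the critical equity amount and the RTO time horizon can be illustrated with simple arithmetic. The sketch below is not a method taken from any standard; it assumes that income drops to zero during the outage, that continuing and extraordinary expenses accrue at a constant daily rate, and that some extraordinary expenses are incurred once at the start. All function names and figures are hypothetical.

```python
import math


def equity_decline(days, daily_continuing_expenses, daily_extraordinary_expenses,
                   one_time_extraordinary=0.0):
    """Cumulative drop in equity after `days` of outage (assumes no income)."""
    daily_burn = daily_continuing_expenses + daily_extraordinary_expenses
    return one_time_extraordinary + days * daily_burn


def max_tolerable_days(critical_equity_loss, daily_continuing_expenses,
                       daily_extraordinary_expenses, one_time_extraordinary=0.0):
    """Largest whole number of outage days for which the equity decline
    does not exceed the critical (drop-dead) equity amount."""
    daily_burn = daily_continuing_expenses + daily_extraordinary_expenses
    if daily_burn <= 0:
        raise ValueError("daily burn must be positive")
    remaining = critical_equity_loss - one_time_extraordinary
    if remaining <= 0:
        return 0
    return math.floor(remaining / daily_burn)


# Hypothetical figures: $500,000 critical equity amount, $20,000/day in
# continuing expenses, $5,000/day in extraordinary expenses, $50,000 up front.
horizon = max_tolerable_days(500_000, 20_000, 5_000, 50_000)
print(horizon)
```

A figure like this is only the outer bound the questioner starts from; the interviewee's knowledge of the business unit should pull the agreed RTO inside it.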
For a profitable enterprise, the longer you go out in time without producing a product, the greater the expense. Conversely, the quicker the desired recovery, the more expensive the solution. The above graphic is a depiction of that concept.
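The trade-off between the cost of downtime and the cost of a faster recovery solution can also be sketched in a few lines. This is a minimal illustration under stated assumptions: outage cost is taken to grow linearly with time, and the strategy names, recovery times, and dollar figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    recovery_days: float   # time this strategy needs to restore production
    solution_cost: float   # cost of having and invoking the strategy


def total_cost(strategy, daily_outage_cost):
    """Cost of the solution plus the cost of being down until it takes effect."""
    return strategy.solution_cost + strategy.recovery_days * daily_outage_cost


def cheapest_within_rto(strategies, rto_days, daily_outage_cost):
    """Pick the lowest total-cost strategy that recovers within the RTO,
    or None if no strategy is fast enough."""
    eligible = [s for s in strategies if s.recovery_days <= rto_days]
    if not eligible:
        return None
    return min(eligible, key=lambda s: total_cost(s, daily_outage_cost))


# Hypothetical options: faster recovery costs more up front.
options = [
    Strategy("hot site", recovery_days=1, solution_cost=300_000),
    Strategy("warm site", recovery_days=5, solution_cost=120_000),
    Strategy("rebuild in place", recovery_days=30, solution_cost=20_000),
]

choice = cheapest_within_rto(options, rto_days=18, daily_outage_cost=25_000)
print(choice.name)
```

With these figures the slowest option is the cheapest solution to buy but the most expensive overall, and it misses the RTO in any case; tightening the RTO to two days would leave the hot site as the only eligible choice.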
In its simplest form, every organization has the same structure: something comes into the organization, resources within the organization (staff, facilities, equipment, IT components, etc.) are applied to transform the input, and an output is produced and sent to a customer. When a disaster occurs, one or more components of production are removed from the production process so that output is no longer possible. The recovery time objective is the time by which production must resume so that the organization’s product is going out the door to the customers. For this to happen, the broken components of production must be fixed or replaced.
To accomplish this recovery, we create a recovery plan. The plan needs to enhance our ability to recover the organization before the time allotted by the RTO runs out. In this regard, those who create disaster recovery plans (those dealing with the recovery of information technology and communications resources) seem to have the right idea. If you look at a DR plan, you will notice that there is one plan, and that each of the recovery procedures revolves around recovering a component of production (recover the AS400, recover the mainframe, recover the Cisco routers, etc.).
When you examine the business recovery plan(s), you will often notice that, first, they are divided into many plans (one for each department), and second, they focus on recovering a business unit. The procedures are referred to as “workarounds,” which means that the department tries to get some type of production without the use of computers. Now, I’m sure that there are exceptions, and everyone reading this article will be able to cite at least one. But what I find as I go around the country doing plans is that manual (non-computer) procedures for getting work out the door are a thing of the past.
Back in the 1990s I was working for a company that had a fire in their main office. They were able to recover by moving their staff to a nearby motel and having their staff work out of several rooms in that facility. No computers were required.
Several years later, that same company would not have been able to repeat that feat because the processes they had used in their recovery effort had all been computerized and no one remembered how to do the processes manually. This has pretty much been the evolution of business processes from the 1980s to the present. This gets me back to my point that using workarounds to recover business processes will, increasingly, not get the job done in recovering the business.
From what I have seen, workarounds are about collecting inputs so that when the computer system is reactivated, normal business processes can resume. In doing this, the organization not only risks losing or misplacing data collected in this makeshift procedure, but also incurs excessive expense recreating an environment capable of sustaining these efforts.
Now, getting back to the lessons learned from disaster recovery methodologies: by designing business recovery procedures to recover resources (as opposed to recovering departments), the resulting plans might actually allow the organization to achieve its RTO by focusing on fixing or replacing broken components of the business.
When we choose to focus on recovering resources, we can form teams to recover the resources, and we can associate vendors to provide replacement resources or to repair damaged items. We can engage those vendors and achieve an understanding of expectations in the event of a disaster situation (this is the component that makes a plan most effective).
In articles on business continuity, I very rarely see any comments or discussion of procedures. It is the procedures that are tested. It is procedures that are the very heart of any plan, yet it is this component of business continuity planning about which pundits rarely talk.
Procedures should be designed to bring an organization back into production before the RTO is exceeded. If a workaround procedure accomplishes this objective, then by all means, it should be part of the plan. However, the majority of the workarounds I have encountered do not move the organization toward recovery. The procedures that do help an organization recover prior to the RTO deadline address the recovery of the organization’s supply chain components.
Each of the resource categories (below) critical to the production process should have an associated recovery procedure and a team trained in performing that procedure. For these procedures to be efficient so that they can meet their RTOs, they should have certain characteristics:
- Each facility should have a single document with its own set of procedures and command staff.
- Teams should be assigned to the execution of procedures.
- Procedures should be developed with input from those charged with carrying out the procedure.
- Tasks within the procedure should begin with an action verb.
- Vendors and resources should be linked to procedures.
- Emergency and recovery procedures should be placed at the front of the recovery manual and not preceded by a collection of non-critical information.
By constructing procedures in this manner, both the disaster recovery and business recovery plans would be aligned into a single structure. Working as a cohesive recovery unit, teams could recover the various components of production with a common methodology which would lend itself to a more efficient and timely recovery.
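The characteristics listed above lend themselves to a simple structure in which each critical resource has a procedure, an assigned team, linked vendors, and a task list. The sketch below is one hypothetical way to model that and to check a plan for gaps; the class, field names, and sample data are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Procedure:
    resource: str                                 # component of production to recover
    team: str                                     # team assigned ("" if none)
    tasks: list = field(default_factory=list)     # each ideally starts with an action verb
    vendors: list = field(default_factory=list)   # vendors linked to this procedure


def plan_gaps(critical_resources, procedures):
    """List what is missing: resources without a procedure, and procedures
    without a team, a linked vendor, or any tasks."""
    by_resource = {p.resource: p for p in procedures}
    gaps = []
    for resource in critical_resources:
        proc = by_resource.get(resource)
        if proc is None:
            gaps.append(f"{resource}: no recovery procedure")
            continue
        if not proc.team:
            gaps.append(f"{resource}: no team assigned")
        if not proc.vendors:
            gaps.append(f"{resource}: no vendor linked")
        if not proc.tasks:
            gaps.append(f"{resource}: no tasks defined")
    return gaps


# Hypothetical plan for one facility.
plan = [
    Procedure("facility", "Facilities Team",
              ["Contact the landlord", "Inspect the alternate site"],
              ["Acme Restoration"]),
    Procedure("communications", "", ["Reroute inbound lines"], []),
]

for gap in plan_gaps(["facility", "communications", "staff"], plan):
    print(gap)
```

A check like this does not judge whether a procedure would actually meet its RTO; it only flags the structural omissions that an exercise would otherwise uncover the hard way.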
The recovery time objective (RTO) is the driver of efficient recovery planning. It is the standard by which a recovery plan should be judged.
Jim Barnes, CBCP, MBCI, has more than 20 years of extensive experience in business continuity planning. Barnes was in charge of designing business continuity planning software that was marketed and used internationally. Most recently, Barnes assisted in the design of a business continuity certification course which he taught in Europe, South America, and the United States. Barnes has written more than 300 business continuity plans for a variety of institutions. At least 12 of his plans have been successfully exercised during disaster situations.
"Appeared in DRJ's Winter 2009 Issue"