In this article, I will be taking a look back at the "good old days" of recovery planning, comparing it to what is currently being done, and offering my own insights on solid recovery planning practices.
My original background is in computer technology. When I started working, it was called data processing (DP), although that did not really describe our role. So we changed our name to information systems (IS). Adding "information" to the name was a good idea because that was, in fact, the thing we provided the business. But the term "systems" sounded too focused on applications, still not covering the breadth of what we did. Now we use the term information technology (IT). I like IT because it is ambiguous enough to cover everything without being specific. Depending on the company, there are many other names used to describe the IT contribution to the corporation.
My introduction to disaster recovery planning took place in the late 70s in King of Prussia, Penn. Our CIO at the time had been to an Ed Devlin seminar, and after hearing Ed speak he realized we needed a DR plan. I had just put together our first ever operations standards manual, so my manager decided this would be my next fun project. On hearing the news I thought, "All right! Road trip!" Then another thought crossed my mind, "What is disaster recovery planning anyway?" At the time, I dismissed this thought. After all, how hard could it be? I should add that at this point in my career, I was young and very naive.
To make a long story short, my manager and I attended another of Ed’s seminars later that year. I remember that after the first day, my head was swimming. I couldn’t believe how many things there are to consider about disaster recovery planning that I had never worried about before! As I laid my pumpkin head on the pillow that night, I brainstormed, "How can I get out of this?" The answer, of course, is that I did not get out of it. About two months later, we signed a contract with Devlin Associates to assist us in developing our plan. That is when my real disaster recovery education began to unfold.
IT recovery planning began as "disaster recovery (DR)." DR was focused completely on recovering the critical components in the data center, which at the time meant the mainframe. By the late 70s, through trial and error, most companies had figured out that they needed to have adequate back-up procedures in place. Some had even realized that data should be stored off-site, in case a disaster happened. The big gap in most plans back then was how to recover if the big one really did hit.
In the late 70s, there were not many recovery options available. Hot-site vendors were just thinking about going into business. Most IT organizations could not get the funding to build a spare data center, especially for something as unlikely as a disaster. A popular option at that time was the reciprocal agreement.
Reciprocal agreement: "Agreement between two organizations (or two internal business groups) with similar equipment/environment that allows each one to recover at the other’s location."
While a reciprocal agreement looked good in a documented DR plan when you sat down with internal/external audit every year, it had built-in problems. What company had so much excess computing capacity that they could afford to stop processing, run their own backups, let another company come to restore their OS, load their data, and then run their systems for 8 to 12 hours? The answer is not many. Think about today’s world. What kind of security and data privacy issues would have to be overcome to make this work?
Summing up the late 70s, recovery planning was owned by IT. Recovery consideration, for the most part, was confined to recovering IT hardware and applications only. IT managers realized that their companies were at risk; most had put in place back-up procedures; some had documented DR plans; but most had no place to go and recover. IT management would put budget requests in for DR funding, but since they did not have practical, cost-effective solutions to offer, their requests were most often turned down.
Because it was difficult designing and building plans with no template to work from, we did not realize that these were in retrospect the "good old days." All we had to do was plan for data center recovery, which in most shops meant one or two processors and the peripheral hardware. The concept of recovering the end users was still a few years away. After all, they could still use the manual procedures they used to use. Right?
The 80s and Beyond: A New Industry Is Born
As technology evolved and became embraced by business, each outage seemed to have a more severe impact. Some very publicized events, e.g., the Hinsdale fire and the Chicago flood, made big headlines. Businesses began to realize the extent of their dependence on IT being available. Funding for DR plans, which in the past seemed like a nice-to-have, suddenly became something that they had to have.
Having realized the need, businesses now began to look for solutions. In St. Louis, a consortium of businesses collaborated and formed a group they named the St. Louis Recovery Site Organization. This organization shared the cost of a contract with a local computer facilities company for a cold-site setup. Included were a computer room with a raised floor, HVAC, required power, and a number of phone lines and network connections sufficient to support the essential needs of all of the businesses involved. Hardware availability was a separate issue that each business had to address with its own vendors.
The agreement was "first come, first served." In other words, it was a shared-risk model. The way it worked was that each year, someone from one of the member companies would serve as president of the organization. In the event of a disaster, whoever contacted the president first would have rights to the facility. While it would not have provided a quick recovery, at the time this model was the only game in town, was relatively inexpensive, and did provide the infrastructure environment needed (minus computer hardware) to shorten the timeline to be up and running. St. Louis was not unique in this approach. There were other similar groups throughout the U.S.
Just like everything else in a free market society, where there is a demand, someone will come along to fill the gap. Enter the hot-site vendor: SunGard, Comdisco, IBM, and others offering recovery services. Based on a shared-risk model, these vendors offered solutions which, for the first time, provided an existing computer room where you could take your back-up tapes and immediately start the recovery process. You could even rent space on their floor and install your own equipment, complete with network connectivity … for a price.
As recovery awareness grew throughout the 80s, many local and national recovery-planning groups were being formed. Then, as now, these organizations provided an opportunity for practitioners to get information and share their best practices. Trade publications were started, with DRJ being the first. Software developers provided planning tools to make documenting plans easier. Vendors and entrepreneurs started hosting conferences devoted entirely to recovery planning. DR had become big business.
Businesses began to realize that building a recovery plan was not just a project that someone ran to completion. It was an ongoing process that required qualified full-time attention. Recovery planning was destined to become a career path. The industry recognized that a standard was needed to ensure the core competency of a recovery planner. This resulted in the formation of the Disaster Recovery Institute International, and a certification process was established.
Business Continuity Planning
Now that there were practical solutions in place for data center recovery, the focus shifted to "how do we connect those pesky end-users?" DR planning was becoming a part of business continuity planning, starting a new search for solutions, and it seemed like everybody had an idea for how to do that.
Some businesses worked out reciprocal agreements with other companies. Some companies built out space at other locations that they owned. Of course, the recovery vendors had solutions. Vendors offered work group recovery, both in office space already set up (you travel to their site) and in mobile units that could be delivered to your location. They included PCs, telephones, printers, fax machines, servers, and almost anything else deemed necessary.
Through the 90s to the present day, the recovery planning industry has grown and matured. It has really been quite amazing to watch it expand.
What I’ve Learned Along The Way: Basic Building Blocks
Today’s challenges are much more complex than those we faced just a few years ago. But the basic building blocks for a good recovery plan have remained the same.
Step 1: Gather and analyze the business requirements.
Step 2: Design a cost-effective solution, based on the business requirements.
Step 3: Gain funding.
Step 4: Build the plan.
Step 5: Test the plan.
Step 6: Maintain the plan.
It is really that simple. These six steps made up a good recovery plan in 1979, and they all still apply today.
Step 1: Gather and analyze the business requirements
Back in the day, requirement gathering usually went as far as the head of DP. After all, what did the business know about recovering a mainframe? In actuality, they know quite a lot: they know what they really need. Unless you engage your primary customer, the business owner, you are liable to design a solution that they neither need nor want.
Conduct a business impact analysis (BIA). There are many software tools available to make this task easier, but you can get by with a Word document or spreadsheet. The three important parts of a BIA are: first, make sure the questions are adequate to identify the real business requirements; second, be sure to ask the right people (I have found this to be a combination of department managers and the people who actually do the work); third, do not ask more questions than needed. These folks have a job, too, and asking more questions than necessary could slow down how quickly they respond. Do your best to keep it simple.
Step 2: Design a cost-effective solution based on the business requirements
Designing a DR plan in the late 70s was much easier than coming up with solutions for the sophisticated IT and business atmospheres we have today. Involve individuals representing all areas that will be included in the final plan. Do not try to design a plan in a vacuum. You may discover that you do not have all the answers. Make sure the design meets the business requirements and is cost effective. If it does not, then step No. 3 will be a problem. Did I mention to keep the design as simple as possible? A simple design is easier to implement, and the plan is less likely to break.
Step 3: Gain funding
Depending on your personality, this can either be a step you look forward to or a step that you dread. Recovery planning does not make money. While it may end up saving a ton in the long run, it is not a revenue generator, and management views it as a cost. As sexy as DR planning sounds, this is one of those items that seems to get trimmed or cut out when funds are tight.
It is a good idea to start selling your plan long before the day you are trying to get approval. Do things like sending out an article about a fire, flood, tornado, etc., which has impacted a competitor. If you have an outage, estimate the impact and share that information with management. Write an internal newsletter article on disaster risks. The obvious target audience is, of course, the people whom you need to approve your funding. However, promoting awareness is something you should be doing for everyone involved.
Other good selling points for developing a plan may be found in regulatory and compliance legislation, e.g., the Foreign Corrupt Practices Act of 1977 (FCPA), the Health Insurance Portability and Accountability Act of 1996 (HIPAA), and the Sarbanes-Oxley Act of 2002 (SOX).
Step 4: Build the plan
This is probably the easiest part of the planning process. You know the business requirements, you have designed the suitable solution, and you have the funding. Now all you have to do is document, in a logical flow, the plan execution. There are many helpful software planning tools available today that make this process easier, but you can build a very effective plan in a Word document, as long as you include all the needed information. There are a gazillion different plan templates available that can be used. Find one that resonates with you and tailor it to meet your needs. When documenting, again, keep it simple!
Step 5: Test the plan
Recovery planning is an ongoing process. Unless you test the plan there are no assurances that the plan will work.
Tests should be made as realistic as is practical and should include a documented test scenario, objectives, and expectations. If possible, different personnel should be rotated in from one test to another. This ensures that others, as well as the person who wrote them, can understand the procedures used, and it gives everyone the experience of how a real recovery would unfold. Throughout the duration of the testing keep a log of all issues encountered. When the testing is completed, the issue log should be reviewed and all issues followed up on through resolution.
I know that many CIOs like to see measurements and have a tendency to view recovery testing as a pass/fail kind of thing. My view is that all recovery tests are successful. The reason you test is to identify any gaps in the plan in order to correct them. What you want to see is steady improvement in each test. If you had five major gaps identified in the first test, you don’t want to see those same gaps again in test No. 2.
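The idea of watching for repeated gaps, rather than scoring a test as pass/fail, can be sketched in a few lines of Python. This is only an illustration: the function name and the gap descriptions below are hypothetical, and a real issue log would carry more detail (owner, severity, resolution date).

```python
def repeated_gaps(previous_test, current_test):
    """Return the gaps logged in both tests, sorted for stable output."""
    return sorted(set(previous_test) & set(current_test))


# Hypothetical issue logs from two consecutive recovery tests.
test_1_gaps = [
    "backup tape missing from off-site inventory",
    "call list out of date",
    "network circuit not provisioned",
    "restore procedure step skipped",
    "no license key for recovery hardware",
]
test_2_gaps = [
    "call list out of date",
    "batch schedule not documented",
]

carried_over = repeated_gaps(test_1_gaps, test_2_gaps)
print(f"Gaps in test 1: {len(test_1_gaps)}")
print(f"Gaps in test 2: {len(test_2_gaps)}")
print(f"Repeated from test 1: {carried_over}")
```

A shrinking list of carried-over gaps is the steady improvement to look for; new gaps in a later test are simply more items to resolve.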
Although each business ultimately decides the frequency of its recovery testing, I recommend that any mission-critical application or process be tested at least once a year.
Step 6: Maintain the plan
Plans should be updated after each test and whenever a change is made in the business that affects the plan. This is not as simple as it may seem. Maintaining the plan means more than just updating the documentation. Recovery planning needs to be part of the change control process. When new applications are being designed, when hardware is being upgraded or replaced, when business processes are being changed, the impact that these may have on the recovery plan has to be considered and addressed. Depending on the magnitude of a change, you may find your plan back at step one or two. Did I mention already that recovery planning is an ongoing process?
About Planning Tools
There are many very slick planning tools available for conducting BIAs, plan building and Web hosting, automated notification, etc. These products can help to organize and standardize your recovery effort. Take the time to evaluate the benefits and features each offers. If you decide to purchase a tool, make sure the product meets all your requirements. A word of caution: planning tools are not silver bullets. They are aids; they do not design and automatically build the plans for you. You still have to perform all six steps in the planning process.
The changes in technology since I first entered a computer room in the 70s can only be described as unbelievable! Initially I worked in what was considered a medium-sized shop. Our two 360/40 mainframe processors each had a whopping 128K of memory. Storage was covered by twenty-seven 2314 disk drives, each holding about 29.17 MB. That is less than 1 GB total, for the entire computer room. My current home PC has 2 GB of RAM and a 130 GB hard drive on it, and I have a plug-in drive with another 60 GB. What a phenomenal change.
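For anyone who wants to check the storage arithmetic, here is a quick sketch in Python. The drive count and capacities come from the figures above; treating 1 GB as 1,000 MB is an assumption made for the comparison.

```python
# Figures from the text; 1 GB = 1,000 MB is an assumption for this comparison.
DRIVE_COUNT = 27
MB_PER_DRIVE = 29.17
MB_PER_GB = 1000
HOME_PC_DISK_GB = 130

total_mb = DRIVE_COUNT * MB_PER_DRIVE
total_gb = total_mb / MB_PER_GB
ratio = (HOME_PC_DISK_GB * MB_PER_GB) / total_mb

print(f"Computer room disk storage: {total_mb:.2f} MB ({total_gb:.2f} GB)")
print(f"Home PC hard drive is about {ratio:.0f} times larger")
# Computer room disk storage: 787.59 MB (0.79 GB)
# Home PC hard drive is about 165 times larger
```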
Network bandwidth keeps growing, too. In the 90s, one or two T1s (about 1.5 Mbps each) were considered a good-sized connection to a business location. Now we are talking 12 Mbps or more. The good news is that networks are more dependable and easier to troubleshoot today.
The changes in technology have made recovery planning easier on one hand and harder on the other. The variety of options now available makes it easier to build recovery solutions. The hot-site vendor is still there, and data replication and disk mirroring to storage devices and servers at other locations are now practical solutions that are not as cost-prohibitive as they were just a few years ago.
Planning is more challenging now because our processes are so much more complex. For example, you may have an application with the Web server in Boston, while the application server and database might be in Cincinnati. There may be an interface to a mainframe in Alpharetta, and end users from around the globe may need to access the application. You would need a plan in place to recover each of the components, one that also ensures the database in Cincinnati is in sync with the mainframe data in Alpharetta.
A recovery practitioner needs project management proficiency, the ability to matrix-manage diverse groups of people, and the communication skills to sell and lead the recovery planning program.
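One way to picture the multi-site example is as a simple inventory with two checks: does every component have a recovery procedure, and would the two data stores be restored to roughly the same point in time? The Python sketch below is illustrative only. The `Component` class, the function names, the timestamps, and the 15-minute tolerance are all assumptions, not part of any particular planning tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Component:
    name: str
    location: str
    has_recovery_procedure: bool


def components_without_procedure(components):
    """List the names of components that have no documented recovery procedure."""
    return [c.name for c in components if not c.has_recovery_procedure]


def in_sync(recovery_point_a, recovery_point_b, tolerance):
    """True when two restore points are no further apart than the tolerance."""
    return abs(recovery_point_a - recovery_point_b) <= tolerance


# Hypothetical inventory for the application described above.
application = [
    Component("web server", "Boston", True),
    Component("application server", "Cincinnati", True),
    Component("database", "Cincinnati", True),
    Component("mainframe interface", "Alpharetta", False),
]

missing = components_without_procedure(application)

# Hypothetical points in time to which each data store would be restored.
database_point = datetime(2007, 6, 1, 2, 0)
mainframe_point = datetime(2007, 6, 1, 2, 10)
synced = in_sync(database_point, mainframe_point, timedelta(minutes=15))

print(f"Components without a recovery procedure: {missing}")
print(f"Database and mainframe data in sync: {synced}")
```

How far apart the two restore points can be before the data is considered out of sync is a business decision, which is one more reason to gather requirements before designing the solution.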
Recovery planning is not rocket science. It is common sense. Remember, the primary objective is not to provide an elegant solution; it is to provide a solution that meets the business needs.
David M. G’Sell, CBCP, is process control manager with GE Commercial Finance – Capital Solutions. He has more than 25 years of experience in building, developing, and implementing disaster recovery and business continuity plans.
"Appeared in DRJ's Summer 2007 Issue"