It took some time, and for many industries not until after the events of Sept. 11, 2001, to realize that it really did not matter if we brought back the data center if there were no business people to use it. The people who run data centers sometimes forget that the only reason we have data centers is so the business can run; if it were not for the business, the people who make the money, we would not need the data center.
To convince leadership of the need to build a viable business continuity plan, you need to help them understand the risk they are accepting by not having one and the cost to the corporation if a disaster were to occur. The risks to the corporation are financial (how much money the corporation stands to lose), reputational (how badly the corporation will be perceived by its customers and its shareholders) and regulatory (fines or penalties incurred, lawsuits).
Financial risks can be quantified in many cases and are generally used to help determine how much money should be spent on the recovery program. You are most likely to be able to defend spending only the amount of money that is actually at risk from an event. One way to calculate financial risk is the formula P × M = C.
Probability of Harm (P) is the chance that a damaging event will occur. Magnitude of Harm (M) is the amount of financial damage that would occur should a disaster happen. Multiplying the two gives Cost of Prevention (C), the price of putting in place a means of preventing the disaster’s effects.
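The formula above can be worked through with a few lines of Python. This is only a minimal sketch: the function name and the dollar figures are hypothetical, chosen for illustration, and a real estimate of P and M would come from your own risk assessment.

```python
def justified_prevention_cost(probability_of_harm: float, magnitude_of_harm: float) -> float:
    """Apply P x M = C: the prevention spend that the risk can justify.

    probability_of_harm -- P, chance of the damaging event (0.0 to 1.0)
    magnitude_of_harm   -- M, financial damage if the event occurs
    """
    if not 0.0 <= probability_of_harm <= 1.0:
        raise ValueError("probability_of_harm must be between 0 and 1")
    if magnitude_of_harm < 0:
        raise ValueError("magnitude_of_harm cannot be negative")
    return probability_of_harm * magnitude_of_harm


# Hypothetical inputs: a 5% chance of an event that would cost $2,000,000.
probability = 0.05
magnitude = 2_000_000

cost_of_prevention = justified_prevention_cost(probability, magnitude)
print(f"Defensible prevention spend: ${cost_of_prevention:,.0f}")
```

With these assumed inputs, the result is the figure you could reasonably defend to leadership as the amount at risk; changing either P or M shows how sensitive that figure is to your estimates.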
Reputational risk is harder to quantify, but it is clear in many industries that your competitor is a click away. If you cannot meet my need when I want it met, it is not hard to find someone else who will. There are also many examples of the impact on stock price in the wake of a disaster that is not managed properly. Ask the leadership what they think of when you give them these examples:
- Martha Stewart Living Omnimedia
- Arthur Andersen
Effective crisis management can be the difference between a company surviving an event and a company ceasing to exist.
Regulatory risk is clearly defined by the industry the organization is a part of; however, no matter what industry you are in, what is commonly referred to as the prudent man law applies: exercise the same care in managing company affairs as you would in managing your own.
Once you have leadership buy-in to build an enterprise-wide program, the first step to getting there is defining your team. To build a plan that actually works, you must have at least one person from each functional area of the company to assist in building it. These individuals will be given a series of tasks to complete in support of the program.
The following list outlines what each business contingency planner needs to have done and how often it needs to be submitted to corporate contingency planning. Business contingency planners may have additional responsibilities and deliverables specific to the individual company.
The business continuity planner should focus first on identifying the people in their functional areas whom they would need to contact to help manage an event that affected their ability to do business as usual. This will result in an emergency notification list (ENL) for each team.
The next step is to make certain that all the records needed to rebuild the business are stored in a secure offsite location that would survive an event and be accessible immediately following an event. These records include both traditional records, such as server backups and paper files, and non-traditional records, such as procedures manuals, forms and letterhead.
Once you have your team and your records (your people and your stuff), the next most important step is to have each business continuity planner perform a business impact analysis, or BIA. The BIA is what is going to help the company decide what needs to be recovered and how quickly it needs to be recovered. I dislike the terms “critical” and “essential” in the process because no one honestly wants to be considered “non-essential.” I prefer the term “time sensitive.”
Generally speaking, organizations do not hire staff to perform non-essential tasks. Every function has a purpose, but some are more time sensitive than others when there is limited time or resources available to perform them. Think about it this way: if your bank had a fire that made it impossible for them to continue to work at their primary location, as a customer, you probably would not care when they resumed their marketing campaign or ran their general ledger system, but you would be very upset if they could not process your checks or deposits for several weeks.
Your organization needs to look at every function in this same light. How long can we go without performing this function before it causes significant financial losses, significant customer unhappiness or losses, or significant penalties or fines from the regulators or courts?
All business functions need to be classified based on their recovery priority. Once that is done, your planning team needs to identify all the resources necessary to perform the functions. Resources include application systems, minimum staff requirements, phone requirements, desktop requirements, internal and external interdependencies, etc.
The recovery priority for application systems is also identified during this process. It is the business that decides which application systems need to come back and when, based on the needs of the business functions those applications support.
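One way to picture the BIA output described above is as a small set of records, one per business function, ranked by how long each can go unperformed. The sketch below is illustrative only: the class, the field names, and the hour values are assumptions, not part of any standard BIA template. It also shows an application inheriting its recovery target from the most time-sensitive function it supports.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessFunction:
    """One BIA entry; all field names are hypothetical."""
    name: str
    max_downtime_hours: float  # how long the function can go unperformed
    applications: list = field(default_factory=list)  # systems it depends on
    min_staff: int = 0  # minimum staff needed to perform it


def rank_by_time_sensitivity(functions):
    """Return functions ordered from most to least time sensitive."""
    return sorted(functions, key=lambda f: f.max_downtime_hours)


def application_recovery_targets(functions):
    """Give each application the tightest deadline of any function it supports."""
    targets = {}
    for func in functions:
        for app in func.applications:
            current = targets.get(app, func.max_downtime_hours)
            targets[app] = min(current, func.max_downtime_hours)
    return targets


# Hypothetical entries based on the bank example in the text.
functions = [
    BusinessFunction("Marketing campaigns", 720, ["CRM"], 2),
    BusinessFunction("Check and deposit processing", 24, ["Core banking", "CRM"], 10),
    BusinessFunction("General ledger", 168, ["General ledger system"], 3),
]

for func in rank_by_time_sensitivity(functions):
    print(f"{func.name}: recover within {func.max_downtime_hours} hours")

for app, hours in application_recovery_targets(functions).items():
    print(f"{app}: restore within {hours} hours")
```

Note that in this sketch the shared "CRM" application takes the 24-hour target of check and deposit processing rather than the 720-hour target of marketing, which reflects the point that the business functions, not the technology team, set the application recovery order.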
Once the BIA is complete, you can then begin the process of identifying different recovery strategies for the various functions. Recovery strategies are entirely dependent on the recovery timeframe associated with the function but may include one or more of the following:
- Self-service – A business unit can transfer work to another of its own locations that has available facilities.
- Internal arrangement – Training rooms, cafeterias, conference rooms, etc., may be equipped to support business functions.
- Reciprocal agreements – Other business units may be able to accommodate those affected. This could involve the temporary suspension of non-critical functions at the business units not affected by the outage.
- Dedicated alternate sites – Built by your company to accommodate critical function recovery.
- External suppliers – A number of external companies offer facilities covering a wide range of business recovery needs.
- No arrangement – For low-priority business functions, it may not be cost-justified to plan to a detailed level. The minimum requirement would be to record a description of the functions, the maximum allowable lapse time for recovery, and a list of the resources required.
Once recovery strategies have been developed and implemented for each area, the next step is to document the plan itself. The plan includes plan activation procedures and the recovery strategies. It also documents how the recovery efforts will be managed, how human resource issues will be handled, how recovery costs will be documented and paid for, and how recovery communications will reach internal and external stakeholders, and it contains detailed action plans for each team and each team member. The plan then needs to be distributed to everyone who has a role.
The next step is to test, test and test again. When people say test, they commonly think “pass or fail.” There is no way to fail a contingency test. If we knew it all worked, we would not bother to test it. The point of a contingency test is to find out what does not work so we can fix it before it happens for real. You should test your notification process using your ENL, test your event management process using tabletop exercises with your teams, and test your alternate sites to validate that they have everything you need to do your business there for real.
After each test it is important to document your results and update the plan where appropriate. Plans should be updated at least annually and more frequently if significant changes occur in a business area.
Make sure all employees are aware of the plan and its contents. Incorporate contingency planning awareness into your new hire orientation. Conduct tests with different groups. Awareness by all is the key. “Share the responsibility.”
Kelley Okolita, MBCP, is the business continuity/disaster recovery program manager for The Hanover Insurance Company. She has more than 20 years of experience in BCP, from both data center and business-wide perspectives. Okolita has served as a chairperson of the DRI International Certification Commission, International Affairs Committee and the DRI International Board of Directors. She is a well-known speaker in the industry and author of various articles.
"Appeared in DRJ's Summer 2008 Issue"