Business Continuity Planning: A Case History
- Published on Monday, October 29, 2007
- Written by George Stratis & Norman Snow
What is Bellcore?
Bellcore is a provider of communications software and consulting services based on world class research. They create business solutions that make information technology work for telecommunications carriers, businesses and governments worldwide.
Since Bellcore’s products are used daily in client operations, they need to be available and supportive should clients experience systems and operational problems in order to avoid revenue loss or additional operating expenses.
Consequently, Bellcore maintains test sites and hotline support for critical applications in order to respond to client outages.
The company's personnel are on call to provide on-site assistance nationwide (e.g., California earthquakes and floods, the Hinsdale, IL central office fire, the World Trade Center bombing) to assess damage and help develop methods, if possible, to avoid a recurrence of the problem or incident.
In other cases, their systems link into the communications network directly to assess damage and implement fixes (i.e., Common Channel Signaling).
In the final analysis, Bellcore’s disaster planning needs to be client focused, based on the criticality of national telephone service.
Geography has a bearing on Bellcore’s ability to respond to and recover from incidents and disasters at its own or its clients’ locations. Its locations are scattered throughout N.J., representing R&D, systems development, data processing and corporate headquarters; in Washington, D.C., which handles federal issues and National Security and Emergency Preparedness (NSEP) coordination; and in Lisle, IL, where the training center is located. Given this geographic dispersion, a single disaster is not likely to devastate Bellcore’s entire work effort.
In addition, N.J., our principal location, is not in a hurricane or earthquake zone, so disasters are likely to be localized to a location/building or a portion of one.
Disasters may be attributed to fire, weather, a chemical spill, or perhaps an act of terrorism.
As a result, their strategy stresses localized response at the onset: personnel familiar with the local situation and resources handle the incident until or unless corporate resources or multiple locations are involved.
A planning framework that Bellcore found useful follows.
The Business Continuity Planning Framework
Exhibit 1 shows a typical six-step planning cycle that may be found in a disaster planning textbook. Since Business Continuity Planning (BCP) is a relatively new concept, it is critical to gain the support of senior management and to create and communicate a corporate business continuity policy that defines roles and responsibilities.
Corporate business planners should direct the process since they are knowledgeable about corporate mission and priorities and have relationships already established with senior management. Since BCP is a new function to corporate planners, it may be useful to relate and contrast the BCP planning cycle to the textbook business planning process.
As a first step, risk analysis is intended to determine the impact of a potential disaster on a business. A business impact analysis is conducted to determine downside risk and impacts on the business due to unforeseen disruptions.
At Bellcore, a business unit, lab or facility is asked to determine revenue impact, expense, equipment needs, restoration timing and personnel issues. In planning terms, this corresponds to a situational analysis, which may include SWOT (Strengths, Weaknesses, Opportunities and Threats), portfolio and financial analysis and other key studies that form the basis for a strategic assessment and the creation of a business plan.
In the second step, business resumption strategies based on the impact analysis data are created to specify priorities, triage principles that should be followed, recovery facilities, mission, organizational responsibilities, team structures and leadership.
This forms the foundation of the corporate level business continuity plan or practice that defines policy. This is equivalent to a strategic plan which sets forth the vision/mission, high level strategies, financial objectives, organizational design, etc. of the business.
In the third step, the planner must ensure and/or direct the budget and implementation requirements of the plan at the corporate and functional level. Without adequate budget and an effective implementation plan, the effort is doomed to failure. Key elements include: who is responsible for implementation, a project planning schedule, etc. This step is similar to effective corporate planning processes.
In the fourth step, procedures development is similar to a tactical plan for each business unit, lab, project or corporate recovery/restoration team. It should specify the personnel call list, critical equipment requirements, step-by-step damage assessment procedures, immediate emergency response procedures, key vendor and supplier lists, documentation, facilities and space requirements, etc.
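One way to picture the contents of such a tactical plan is as a simple record with one field per required element. The sketch below is illustrative only: the class name `TeamRecoveryPlan`, its field names and the `missing_sections` helper are assumptions made for this example, not Bellcore's actual plan format.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class TeamRecoveryPlan:
    """Hypothetical record mirroring the plan elements listed above."""
    team_name: str
    personnel_call_list: List[str] = field(default_factory=list)
    critical_equipment: List[str] = field(default_factory=list)
    damage_assessment_steps: List[str] = field(default_factory=list)
    emergency_response_steps: List[str] = field(default_factory=list)
    vendor_supplier_list: List[str] = field(default_factory=list)
    facility_requirements: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of plan sections that are still empty."""
        return [
            f.name
            for f in fields(self)
            if f.name != "team_name" and not getattr(self, f.name)
        ]


# Example data is invented for illustration.
plan = TeamRecoveryPlan(
    team_name="Example lab team",
    personnel_call_list=["Team leader", "Alternate"],
    critical_equipment=["Lab server"],
)
print(plan.missing_sections())
```

A check like this would let a planner see at a glance which parts of a team's plan still need to be filled in before the plan goes to validation and testing.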
In the fifth step, validation and testing provides assurance that the plan is workable, that people are aware of and can fulfill their roles, and that equipment and software are restorable. In business planning terms, a tactical plan is refined through test marketing, focus groups, or Profit/Loss results.
And finally, in the sixth step, plan maintenance assures that changing priorities have been reviewed and plans adjusted accordingly, and that personnel, vendor and supplier call lists are updated. In a similar way, changes in the tactical or strategic planning environment are reviewed and plans revised so that they are current and relevant.
Ongoing planning cycles are used to maintain and review restoration priorities and strategies, and to add new business projects/units as applicable.
Business Impact Analysis (BIA)
The typical bottom-up approach to business impact analysis, usually conducted through survey instruments or off-the-shelf software, consumes a great deal of personnel resources and time. In order to conserve this time and effort, they decided to formulate project priority criteria from a top-down perspective that could be used to focus on the critical projects.
Exhibit 2 displays the criteria that Bellcore used. Through brainstorming and analysis they discovered that corporate strategies and flagship products in an R&D environment had long term impact.
Potential incidents, in most cases, were short term in duration for them and generally affected existing products rather than products in the development cycle.
Consequently, they needed to define a set of criteria that reflected this short-term nature and focused on clients’ immediate needs.
Using the cited criteria and a top down process (Exhibit 3), they surveyed business unit managers, who could identify the critical projects from the hundreds of projects that Bellcore offered. They identified 40 critical projects that were resurveyed to determine relative risk, key contacts and equipment, locations and critical internal services (e.g., telecommunications and computing).
The ranked project listing served as the basis of the company's project triage list for restoration and for a systematic implementation of project disaster plans to conserve resources. This allowed for minimal disruption to normal business activities. This listing and other corporate criteria became the basis for their business resumption strategies.
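The ranking step can be sketched as a sort over the resurvey results. This is a minimal illustration under stated assumptions: the project names, the `relative_risk` scale (higher means restore sooner) and the `build_triage_list` function are hypothetical, since the article gives neither the actual criteria of Exhibit 2 nor the scoring method.

```python
from typing import List, NamedTuple


class SurveyedProject(NamedTuple):
    name: str
    relative_risk: int  # hypothetical scale: higher = restore sooner
    location: str


def build_triage_list(projects: List[SurveyedProject]) -> List[str]:
    """Order project names from highest to lowest relative risk.

    Ties keep their survey order because sorted() is stable.
    """
    ranked = sorted(projects, key=lambda p: p.relative_risk, reverse=True)
    return [p.name for p in ranked]


# Invented survey data for illustration only.
survey = [
    SurveyedProject("Project A", 3, "Site 1"),
    SurveyedProject("Project B", 9, "Site 2"),
    SurveyedProject("Project C", 6, "Site 1"),
]
print(build_triage_list(survey))  # ['Project B', 'Project C', 'Project A']
```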
Prior to implementation, Bellcore needed to assign an organizational framework and responsibilities to carry out these strategies during a disaster recovery and restoration effort. Before continuing, the company benchmarked the proposed process with a partner identified by their external auditors. Benchmarking indicated that although a democratic/participatory style of management is the goal in today’s business environment, command and control is the recommended style during a disaster.
People need to know roles and responsibilities of all involved, and who is authorized to make decisions. Exhibit 3 depicts the hierarchical structure to be used during the disaster recovery and restoration effort.
At Bellcore, incidents are expected to be site specific so that most recoveries will be handled by a locational response team. This team is led by the building operations manager, staffed by a team of specialists and supported by a local site executive, should spending exceed predetermined levels or project priorities change in real time. Through benchmarking, it was determined that the functional level most familiar with the effort and site should be empowered to lead the recovery and restoration rather than simply relying on the local site executive.
The Business Continuity Team includes senior management should a significant incident occur. Given the assumption that the incident will be localized, it is expected that this team will be minimally involved unless the impact of the incident is greater than anticipated. Reporting to the locational response team are restoration and support teams to assist in the recovery / restoration effort according to their specialty area.
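The escalation path described above can be summarized as a small decision rule. This is a sketch only: the `Incident` fields, the `spending_limit` parameter and the function name are assumptions for illustration, and treating "more than one location affected" as the trigger for the Business Continuity Team is a simplification, since the article does not state Bellcore's actual thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    estimated_spending: float   # hypothetical cost estimate for the recovery
    priorities_changed: bool    # project priorities changed in real time
    locations_affected: int     # number of sites involved


def teams_involved(incident: Incident, spending_limit: float) -> List[str]:
    """Return who takes part, starting with the locational response team."""
    teams = ["locational response team"]
    # The local site executive supports the team when spending exceeds the
    # predetermined level or project priorities change.
    if incident.estimated_spending > spending_limit or incident.priorities_changed:
        teams.append("local site executive")
    # Assumption: an incident spanning several locations counts as significant.
    if incident.locations_affected > 1:
        teams.append("Business Continuity Team")
    return teams


print(teams_involved(Incident(5_000, False, 1), spending_limit=10_000))
print(teams_involved(Incident(25_000, False, 2), spending_limit=10_000))
```

The first call returns only the locational response team; the second adds the local site executive and the Business Continuity Team.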
While a command-and-control structure will predominate during a disaster, prior to the incident (as recommended by their benchmarking partner) each team member needs to be involved in the creation of the project team’s recovery/restoration plan.
The firm conducted planning workshops with each team to build team cohesiveness and to empower them to create and own their plans. As planners, we served a dual role as experts and facilitators: guiding project personnel through the planning effort, stimulating a discussion of alternatives and, at the same time, staying out of the way. Our technical counterparts also supported the effort with experiential information gathered in creating and testing EDP plans.
To make each meeting a workshop with real results and standardized plans, we used a PC software package available from our hot site vendor and an LCD display so that all project personnel could see the plan being created and make changes in real time.
Recovery/Restoration Team Plans were made available on standardized templates. We used Group Decision Support Systems (GDSS), or groupware, in a focus group format so that the template reflected the teams’ viewpoints and concerns.
Templates for projects utilizing private labs or mainframe systems were created based on existing EDP recovery plans and experience gained in working with pilot projects.
The templates included standardized stages, sections and tasks to be followed during a disaster recovery/restoration effort. Again, our benchmarking partner confirmed the need for standardized plans and encouraged the use of PC tools for ease of updating and maintenance.
We tested all three templates with prototypical teams before initiating a broad cutover. Our experience indicates that using tested templates substantially shortens the learning curve, which should speed the development of future project plans.
In order to assure that the plans were viable (i.e., no disconnects between or within team plans) and to reinforce the roles of each team in the recovery/restoration effort, disaster drills were conducted in the form of walk-through exercises.
The drills were based on a common disaster scenario, e.g. fire in a lab, and conducted in a half day session with all teams participating. Future drills will be progressively more complex as the teams become more familiar with their roles. In addition, software recovery/restoration procedures will be tested at a hot site to validate the integrity of procedures in real time on equipment.
Three separate drills at three locations were conducted successfully with no major disconnects identified. Participants, perhaps for the first time, appreciated the combined effort of all the teams and were more aware of the sense of urgency and teamwork required should a real disaster occur.
Plan Updates & Maintenance
A biannual review of critical project priorities will be conducted to maintain the accuracy and validity of Bellcore’s triage list.
In addition, an annual review assesses new projects, removes terminated projects, and adjusts for changing business directions. Each recovery/restoration team is responsible for reviewing and updating its respective plan(s) on a semiannual basis to, at a minimum, maintain current key contact and equipment lists.
Plans are maintained in a corporate central repository as well as at respective locations. Team leaders and alternates also maintain recovery/restoration plans on-site and off-site.
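The semiannual team review lends itself to a simple overdue check. The following is a minimal sketch under stated assumptions: the team names and dates are invented, "semiannual" is approximated as 183 days, and `plans_due_for_review` is a hypothetical helper rather than part of any tool named in this article.

```python
from datetime import date, timedelta
from typing import Dict, List

# Assumption: "semiannual" is approximated as 183 days.
SEMIANNUAL = timedelta(days=183)


def plans_due_for_review(last_reviewed: Dict[str, date], today: date) -> List[str]:
    """Return, in alphabetical order, the teams whose plan review is due."""
    return sorted(
        team
        for team, reviewed in last_reviewed.items()
        if today - reviewed >= SEMIANNUAL
    )


# Invented review dates for illustration only.
reviews = {
    "Lab team": date(2007, 3, 1),
    "Mainframe team": date(2007, 8, 15),
}
print(plans_due_for_review(reviews, today=date(2007, 10, 29)))  # ['Lab team']
```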
Key learnings as discussed above can be summarized as follows:
1. Once again, high-level buy-in facilitates plan acceptance and implementation, particularly when a disaster is viewed as hypothetical and distant. Take special care not to disrupt current work efforts when building and testing plans.
2. Collaboration between corporate and technical/project personnel builds cohesiveness and ensures that all viewpoints are considered.
3. The planner as business continuity planning expert and facilitator drives efficient meetings while maintaining team ownership and momentum. The use of PC software tools based on proven templates and an LCD projector facilitates real time plan development and accelerates the learning curve.
4. Criteria for project recovery prioritization should be linked to corporate priorities and values.
5. Empower team members and leaders. Recognize that knowledge of day-to-day operations will minimize downtime and quicken recovery/restoration. Use executives to support rather than lead teams.
6. Basic business skills are the foundation for the successful integration of technical and corporate disaster planning.
7. A great deal of required documentation is available within the organization and need not be re-created for the business continuity plan.
George Stratis is director - strategic market planning with Bellcore. Norman Snow is manager - business continuity planning with Bellcore.