Wednesday, 13 April 2016 05:00

Strengthening the Business Continuity Process with Methodical Drills

Written by Asif Khan, Farooq Khan, & Ahmad H. Al-Sharidah

An organization’s resilience is only as strong as the weakest link in its chain. Business continuity drills are the key to detecting, addressing, and strengthening that weakest link.

Why drills? Without a solid drill plan in place, the business continuity team can never provide the needed assurance that the organization’s critical services will be available at all times. With periodic drills you can ascertain how effective each component of the business continuity plan is and identify gaps that need to be addressed. With today’s growing system dependencies, it becomes increasingly difficult to verify that business continuity drills are effective. To be effective, drills must be conducted methodically so that they touch each service and its dependencies, and the gaps identified in them must be not only addressed but also re-tested, in a drill, to determine the effectiveness of the fixes.

A well-planned business continuity drill regime accounts for a major share of your organization’s business continuity management (BCM) program. In fact, this regime is a true reflection of the popular axiom “what gets measured, gets done.”

Professionals strive to give BCM program sponsors and proponents a much-needed sense of reassurance and to gain their confidence. Nothing is more persuasive than a well-managed calendar of business continuity drills, backed by fact-finding reports, when giving a thoughtful yet proven response to executives’ bottom-line concerns and maintaining stakeholder confidence in the reliability of critical systems.

We will provide a framework for establishing a solid business continuity drill regime that not only provides a systematic checklist of each component in the BCM program but also gives you the ability to adapt quickly to changes as they come along.

What is a Business Continuity Drill?

In his book “Disaster Recovery Testing: Exercising Your Contingency Plan,” Philip Jan Rothstein notes, “The goal of testing and exercising your plan is not to find out if it works, but to determine how it does not.”

The business continuity drill is a simulation of an outage scenario for which an accepted level of resilience has been put in place. The goal of this simulated scenario is to gauge the actual resilience of an organization compared to expectations. Think of these simulations as “what if” scenarios. A business continuity drill should never be deemed unsuccessful. It should always identify gaps and opportunities for improvement and optimization.

What Do You Need For a Drill?

The first step is to define the outage scenarios, or the failure of each platform/component within the environment. Once a business continuity scenario is identified, all of its dependencies must be charted out, such as system requirements, systems support personnel, application support staff, testing area, end-user testing, etc.

Figure 1: Example of charting system requirements
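The dependency chart for a scenario can also be kept as a small data structure so that missing entries are easy to spot. The following is a minimal sketch in Python, not a prescribed tool; the class name `DrillScenario`, the `REQUIRED_CATEGORIES` checklist, and the sample entries are hypothetical and are not taken from Figure 1.

```python
from dataclasses import dataclass, field

# Dependency categories named in the text; treating them as a fixed
# checklist is an assumption made for this sketch.
REQUIRED_CATEGORIES = [
    "system requirements",
    "systems support personnel",
    "application support staff",
    "testing area",
    "end-user testing",
]


@dataclass
class DrillScenario:
    """One outage scenario and the dependencies charted for it."""
    name: str
    dependencies: dict = field(default_factory=dict)  # category -> list of entries

    def uncharted(self):
        """Return the required categories that have no entries yet."""
        return [c for c in REQUIRED_CATEGORIES if not self.dependencies.get(c)]


# Hypothetical example: a storage platform outage with a partly filled chart.
scenario = DrillScenario(
    name="Network-attached storage outage",
    dependencies={
        "system requirements": ["recovery-site storage array"],
        "systems support personnel": ["storage administrator on call"],
        "application support staff": ["database team"],
    },
)

# Lists the categories that still need to be charted before the drill.
print(scenario.uncharted())
```

A scenario whose `uncharted()` list is not empty is arguably not ready to be placed on the drill calendar.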

Types of Business Continuity Drills

Business continuity has several kinds of drills, such as single component, entire service, tabletop, and disaster simulation. The single component/platform drill is the most common and a good place to start. In this type of drill you choose a single component/platform, such as network-attached storage, a database, or a middleware system, and test all applications that depend on it. In a service drill, an entire service is selected for a particular outage scenario. These drills tend to be difficult to manage and require much more planning and support staff. However, the results are worth the effort, as you gain a clear picture of the application dependency maps and test their atomicity while they are provided as one service.

Viable drill types include scheduled, surprise, plan review, tabletop, walk-through, modular/component, functional/line of business, simulation/mock, and comprehensive/full scale.

How To Plan For a Drill?

Planning for a drill is half the job. Execution is the other half. When it comes to drill planning, only three things are important: plan, plan, and plan. A calendar of business continuity drills is a very handy tool. One may put all the drills on the calendar with tentative dates before the start of the year and share it with all stakeholders. This way you give reasonable notice of the workload and of what you expect from each stakeholder.

Once you decide to conduct a drill, you need to give a heads-up to the support team and all stakeholders at least two weeks ahead of time. Attention must be given to whether any changes in the pipeline would require re-testing the scenario. If that is the case, it is advisable to postpone the drill until the system changes are made.
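The two scheduling rules above (two weeks of notice, and postponement when changes are pending) can be sketched as follows. This is an illustrative Python sketch only; the function names, the sample dates, and the choice to move the drill to the day after the last pending change are assumptions, not part of the methodology described here.

```python
from datetime import date, timedelta

NOTICE_PERIOD = timedelta(days=14)  # "at least two weeks" from the text


def notice_deadline(drill_date):
    """Latest date on which the heads-up can be sent."""
    return drill_date - NOTICE_PERIOD


def effective_drill_date(planned, change_completion_dates):
    """Postpone the drill past any pending change that lands on or after it.

    Assumption: the drill moves to the day after the last such change.
    """
    blocking = [d for d in change_completion_dates if d >= planned]
    if not blocking:
        return planned
    return max(blocking) + timedelta(days=1)


planned = date(2016, 3, 15)  # hypothetical drill date
print(notice_deadline(planned))                            # 2016-03-01
print(effective_drill_date(planned, [date(2016, 3, 20)]))  # 2016-03-21
```

Note that a postponed drill gets a new notice deadline as well, so the heads-up may need to be re-sent.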

[Calendar for March, columns S S M T W T F, days 1 to 31, with dates marked for Scenario 1, Scenario 2, and Scenario 3]

Sample of a drill calendar

Testing plans will arguably differ from one organization to another. However, following a standard methodology keeps the effort focused and aligned with expectations. We recommend a testing methodology that ensures the following:

  • The plans are tested to the fullest extent possible.
  • The costs are not prohibitive.
  • Service disruptions are minimal.
  • The results provide a high degree of assurance in recovery ability.
  • Evaluation provides quality input to plan review and updates.

Follow-Up and Lessons Learned

As stated earlier, every business continuity drill is a success, as each drill will surely identify gaps or shortcomings whose correction inches the resilience of your critical systems toward perfection. After each drill, a follow-up meeting with all stakeholders should be arranged where the objective is never to “point fingers” but to learn from the issues and ensure proper solutions. A proper drill report with tangible, manageable action items, if there are any, is to be shared with all stakeholders. These drill reports become an important document trail when your company is audited for compliance certification or by internal auditors.

  • CONTINUOUS DRILL CYCLE

In the Department of Energy’s “The Public Inquiry into the Piper Alpha Disaster,” W.D. Cullen reports, “The policy and procedures were in place: the practice was deficient.” A continuous drill cycle is a loop of continuous improvement in which the drill plan is followed by drill execution, then by feedback, then by addressing the gaps/issues, before returning to the drill plan.
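One way to keep this loop honest is to track every gap from feedback through to its re-test in a later drill. The following Python sketch is a hypothetical illustration; the names `GapStatus` and `GapLog`, the three statuses, and the sample gaps are assumptions introduced here.

```python
from enum import Enum


class GapStatus(Enum):
    OPEN = "open"            # identified in a drill, not yet fixed
    ADDRESSED = "addressed"  # fix applied, awaiting re-test in a drill
    CLOSED = "closed"        # fix confirmed by a later drill


class GapLog:
    """Tracks gaps through the plan, execute, feedback, address loop."""

    def __init__(self):
        self.status = {}

    def record_feedback(self, gaps):
        """Feedback step: log each gap found in a drill as open."""
        for gap in gaps:
            self.status[gap] = GapStatus.OPEN

    def address(self, gap):
        """Address step: a fix has been applied but not yet verified."""
        self.status[gap] = GapStatus.ADDRESSED

    def retest_scope(self):
        """Plan step: gaps the next drill must re-test."""
        return sorted(g for g, s in self.status.items()
                      if s is GapStatus.ADDRESSED)

    def confirm(self, gap, passed):
        """Execute step: close the gap only if the re-test passed."""
        if self.status.get(gap) is not GapStatus.ADDRESSED:
            raise ValueError(f"gap not awaiting re-test: {gap}")
        self.status[gap] = GapStatus.CLOSED if passed else GapStatus.OPEN


# Hypothetical gaps from one drill.
log = GapLog()
log.record_feedback(["call tree outdated", "failover runbook incomplete"])
log.address("call tree outdated")
print(log.retest_scope())  # ['call tree outdated']
log.confirm("call tree outdated", passed=True)
```

The point of the `ADDRESSED` status is that a fix does not count as closed until a drill has exercised it.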

  • MATURITY LEVEL

As you progress through the continuous drill cycle, the system will slowly but surely attain a level of maturity at which fewer gaps are identified during each business continuity drill and the critical services covered are well aligned with the scope and expectations of the organization’s BCM program. This is a commendable milestone. However, it is just the beginning of a new phase (the un-announced drill).

  • UN-ANNOUNCED DRILL

Once you have conducted all the drills in your calendar a few times, it’s time for the real test (the un-announced drill). You can think of the un-announced drill as the maturity test of your end-to-end business continuity setup, where the entire process, from calling the support staff to the testing site through to end-user testing, reveals your actual maturity. No matter what the results are, it is important for the entire organization (including executives) to understand that the un-announced drill is a continuous journey, not a destination.

Data center disasters may be caused by nature, equipment failure, or human factors. All of these factors must be considered in disaster planning. One must have adequate recovery plans, but a plan remains only a plan if it is never tested.

Testing your recovery plan unannounced helps your organization emulate a real outage scenario and identify problem areas so that they can be corrected before a real disaster. Recovery plans are complex; therefore, it’s critical that thorough preparations are made for all of the critical components before conducting a drill or simulation test.

Information sessions/workshops with application and testing support staff should be planned to explain the purpose, execution, and measurement details of the un-announced drills.

A tabletop (mock) disaster recovery drill provides an organization’s support staff with a practical checklist of procedures to follow during a recovery phase. The business continuity team should go over each critical procedure and planning document to ensure that every step needed to restore critical services at the recovery site is covered.

The unannounced drill is a method of practicing for the actual scenario, which would generally be activated via an organization-wide announcement, followed by a grace period (if available) to shut down gracefully. Operations would then be alerted to call in the on-call back-up support staff.

The first and foremost priority is production operations, which must be secured and guaranteed to remain operational during the entire drill period. If any interruptions are observed, the drill will be aborted at once.

A well-thought-out and flawless “abort” plan is a must, and it should be ready to be exercised in case there are any unforeseen issues.

  • PROCESS FLOW: UN-ANNOUNCED DRILL

An email message would be sent to all concerned stakeholders of the drill announcing the brief interruption of services during the failover of the services to the recovery site. A grace period of 15 to 30 minutes would be given to secure the in-process application data and communicate the brief interruption to the end-users. Datacenter operations would be alerted to the un-announced drill and would then contact the on-call/back-up support staff to bring them on-site.

All participating support staff, application staff, and users would be asked to report any problems encountered during this recovery drill, for example, difficulties reporting to the worksite, communication difficulties, surprise technical issues, application issues, performance problems, etc.

One may follow a similar communication flow and prepare the messages in advance.
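The timing and abort rules of this process flow can be sketched in a few lines of Python. This is a hedged illustration only; the function names `failover_start` and `next_action`, the default grace period, and the sample announcement time are hypothetical.

```python
from datetime import datetime, timedelta


def failover_start(announced_at, grace_minutes=15):
    """Return when failover may begin, after the grace period."""
    # The text gives a grace period of 15 to 30 minutes.
    if not 15 <= grace_minutes <= 30:
        raise ValueError("grace period should be 15 to 30 minutes")
    return announced_at + timedelta(minutes=grace_minutes)


def next_action(production_interrupted):
    """Abort at once if production operations are affected."""
    return "abort" if production_interrupted else "continue"


announced = datetime(2016, 4, 13, 9, 0)  # hypothetical announcement time
start = failover_start(announced, grace_minutes=30)
print(start)               # 2016-04-13 09:30:00
print(next_action(False))  # continue
```

Keeping the abort check as its own step reflects the rule that production operations take priority over completing the drill.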

  • DOCUMENT CONTROL

Once a business continuity drill has been executed and a proper follow-up has been conducted, an official report that concludes the activity and records its findings and recommendations is essential:

  • to keep the management engaged and to secure the support and attention of the support organizations.
  • to ensure the drill goals, results, and execution are fully documented for establishing a baseline and for audit purposes.

Conclusion

Based on the way organizations operate, it is imperative that business continuity drill strategies are embedded in the operational routine. For this approach, many of the critical success factors focus on building and utilizing support within the organization.

  • Build the confidence of users/customers through periodic drills/testing to validate that the business continuity plans remain effective and that the organization has the proven capability to sustain continuity of its critical operations in the event of an incident.
  • Communicate the upcoming business continuity drills and their execution procedures to support and end-user entities. Publish the annual business continuity drill calendar and share the progress, revisions, and reminders with all stakeholders on a quarterly basis.
  • Collaborate with representatives of support and end-users to ensure active participation during the execution of business continuity drills and in testing the validity of the business continuity services.
  • Treat production operations as the foremost priority, and make every effort to ensure continued production operations during the business continuity drills as per expectations.
  • Capitalize on the lessons learned and adhere to the Plan-Do-Check-Act cycle.
  • Align drill objectives with organization goals, and secure management acknowledgement of reports, test results, follow-up on lessons learned, and recognition.

Asif Khan has been a disaster recovery professional since 2004, with a 20-year multidisciplinary career in information technology with Fortune 50 corporations. He has a master’s degree in engineering from Wayne State University, is a graduate of the technical managers program at Georgia Tech, and holds several professional certifications.

Farooq Khan is a business continuity subject matter expert with strong knowledge of business continuity best practices and protocols, operational risk management, and the international BCM standards promoted by BCI, DRII, and ISO, along with experience in applying them across upstream datacenters to support critical computing and processing operations and daily practices.

Ahmad H. Al-Sharidah is an IT system analyst and a member of the ISACA and SPE organizations. He has a master’s degree in computer science from the George Washington University and holds several technical certifications in multiple platforms. In addition, he has earned a computer security and information assurance certification. Currently, he is leading the business continuity process in his IT organization.