Critical Recovery Skills ‘Gap’ Analysis

Written by  JAN PERSSON, CDP, CBCP Wednesday, 07 November 2007 16:13
In its simple form, a recovery skills "gap" analysis matrix identifies all critical recovery skills in a way that clearly highlights which skills are deemed critical and who supports each one. A matrix is nothing more than a two-dimensional table: down one side is the list of critical skills, and across the top is the list of personnel.

I’ve been in the DRP/BCP arena for more than 25 years. A frequent part of my consulting involves attending recovery center tests (e.g., hot sites, internal sites, BCPs) as a coordinator and a scribe. At about four tests per year over 25 years, I’ve seen roughly 100 tests. One of the things I’ve always been aware of is the number of critical recovery skills that are required to complete all of the recovery tasks on time. It’s not uncommon for the whole process to halt if a key recovery process fails and the expert isn’t around. Often, detailed recovery procedures are not written for others to follow. Frequently, the expertise resides with only one person. Sound familiar?

The Problem

After 9/11 and Katrina we as an industry were compelled to redefine the "worst case" scenario, i.e., loss of physical facility, loss of computer support, and loss of key personnel. It’s not a pretty picture, but it is a slice of reality. I listened to a person who had a New York office in the WTC towers. It was a grim story; many were lost. He commented that he wished they had better backups in place for key personnel. We all understand the importance of backing up the data and having several versions to rely on if needed. We’re pretty good at that. Backup of critical skills is also an important part of being able to recover within the desired (or required) time frame. Based on observations during testing, we’re not as good at that.

Solution: Critical Recovery Skills ‘Gap’ Analysis Matrix

So, what does this have to do with "matrixing?" Well, in its simple form, a recovery skills "gap" analysis matrix identifies all critical recovery skills in a way that clearly highlights which skills are deemed critical and who supports each one. A matrix is nothing more than a two-dimensional table: down one side is the list of critical skills, and across the top is the list of personnel.

Read on and we’ll actually develop one in four easy, logical, structured steps. 

STEP 1 – Identify What The Critical Recovery Skills Are

The way I define them when I’m developing the matrix is to ask which skills (jobs, software, applications, infrastructure, etc.) present a "showstopper" if they don’t work correctly at the correct time. That simply means the recovery effort stops until the capability or function is running correctly. These are the skills for which you want adequate backup in place to reduce delays in recovery. We’ll talk more about "adequate" later in this article.

Now, a few examples of what might be called "showstoppers." What if:

1. Active Directory is not functional? No one has access to anything. A "show stopper."

2. The DNS server is down? Not much moves forward.

3. The Exchange server is not functional? No one has e-mail. A "show stopper" for e-mail.

4. ROBOT (iSeries job scheduler) is not functional? No jobs get run. Or, everything gets run whether you want it or not! A "show stopper" for operations.

5. SAP R3 (ERP System) is not functional? That’s a production and distribution "show stopper."

6. Peoplesoft is not running? No paychecks are printed. Talk about a "show stopper!" 

Assumptions

Let’s define a few recovery skills/software products in order to get the process started. But first here are several short assumptions about the samples:

1. These are in no particular order or priority. They are only samples.

2. These represent a wide variety of skills across numerous platforms. Anyone reading this should find some skills to relate to and have a list of their particular skills to add.

3. Some organizations can place all functions on one matrix. For larger organizations, a separate matrix can be created for each functional area or department (network, database, etc.).

4. This article is focused on recovery skills. The process should, however, transfer to other areas as well.

Samples of skills/products/applications include these: 

  • Applications: Peoplesoft, SAP R3, SharePoint, ESS, JD Edwards, etc. 
  • Operations: Active Directory, Tivoli, BASIS, Robot, Magic, etc. 
  • Database: DB2, SAP, SQL, Oracle, etc. 
  • Network: LAN, WAN, VPN, Ethernet, T3, ATM, NIC, OC12, etc. 
  • Voice/e-mail: Exchange, e-mail, voice mail, PBX, VOIP, etc. 
  • Security: McAfee Virus S/W, Norton, password management, IDS, etc. 
  • O/S Software: UNIX, NT, Windows, VISTA, VM, LINUX, etc. 
  • Internet: IP Internet, Intranet, SMTP, POP, IMAP4, etc. 
  • Replication: SRDF, Double Take, Mimix, etc. 
  • Physical Facilities: UPS, HVAC, fire suppression systems, etc. 
  • And more … 

STEP 2 – Format The ‘Critical’ Analysis Matrix

I generally use the table feature in MS WORD to create the matrix. You could use EXCEL, ACCESS, or anything else as long as it is easy to read and the headings are clear. I tend to prefer tables because I feel they present a picture that is immediately easy to view and understand. And for me it’s a simple approach.

As an example, using a few of the above skills, the matrix would look as follows: 

SAMPLE – ‘Critical’ Analysis Matrix

Skill     | Description                  | Impact                             | Critical
----------|------------------------------|------------------------------------|---------
AD        | Active Directory             | Inadequate access for system users | Yes
eTraining | PC-based training software   | None … train at a later date       | No
SAP       | Enterprise Resource Planning | Halts production and shipping      | Yes
Etc…      | List all appropriate skills for the first level of evaluation.

NOTE: Those designated as "critical" (YES in column four) proceed to the next step. I’ve also included a sample of one skill that could most likely be delayed. 
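
The first-level evaluation can also be kept as structured data rather than a word-processor table. The following is a minimal Python sketch, not part of the original method: the `SkillEntry` class, its field names, and the `critical_skills` helper are hypothetical, and the three rows are the ones from the sample matrix. It simply carries forward the entries marked critical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SkillEntry:
    """One row of the 'Critical' Analysis Matrix (field names are illustrative)."""
    skill: str
    description: str
    impact: str
    critical: bool


# Rows taken from the sample matrix in this step.
entries = [
    SkillEntry("AD", "Active Directory", "Inadequate access for system users", True),
    SkillEntry("eTraining", "PC-based training software", "None; train at a later date", False),
    SkillEntry("SAP", "ERP system", "Halts production and shipping", True),
]


def critical_skills(rows: List[SkillEntry]) -> List[str]:
    """Return the names of the skills that proceed to Step 3."""
    return [row.skill for row in rows if row.critical]


print(critical_skills(entries))  # ['AD', 'SAP']
```

Keeping the "Impact" text alongside the yes/no flag preserves the reasoning behind each decision, which helps when the list is reviewed with management.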

STEP 3 – Fill In The Primary, Secondary, And Third Person

Once the critical skills have been identified and agreed upon, they move to the next matrix. The task for this step is to identify who supports each critical skill. This is simply a matter of identifying who in the IT organization is responsible for the particular skill. Usually, the primary person is evident. The sample uses first names for ease of presentation.

A part of this step also identifies what "other resources" might be available. In the sample Matrix, both vendors and consultants are listed. Obviously, their specific names and contact information would be filled in as a reference. You should also consider employees who have recently retired or moved to other areas within the company but were primary support people for a critical skill. Don’t overlook them as a possible resource in a disaster recovery situation.

SUPPORT KEY:

P = Primary support person or organization (aka Level 1)

B = Backup person or organization (aka Level 2)

T = Third person or organization (aka Level 3)

NOTE: Some organizations only go to two levels but more and more I’m developing three levels. Given the current pandemic planning strategy, it’s not a bad idea.
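
If the matrix is kept in a spreadsheet or script, the three support levels can be checked mechanically. The sketch below is one possible approach in Python and is only an illustration: the names `support`, `missing_levels`, and `coverage_report` are hypothetical, and the data is transcribed from the sample matrix for this step (the Exchange vendor is recorded as the third-level resource; the AD consultant has no level letter in the sample and is left out).

```python
from typing import Dict, List

LEVELS = ("P", "B", "T")  # primary, backup, third, per the support key

# skill -> {resource name: support letter}, transcribed from the sample matrix
support: Dict[str, Dict[str, str]] = {
    "AD": {"Andy B.": "P", "Ben F.": "B"},
    "CISCO": {"Andy B.": "P", "Grace C.": "T", "Laura A.": "B"},
    "DNS": {"Grace C.": "T"},
    "Exchange": {"Ben F.": "P", "Laura A.": "B", "Vendor": "T"},
    "MIMIX": {"Andy B.": "B", "Grace C.": "P", "Laura A.": "T"},
}


def missing_levels(assignments: Dict[str, str]) -> List[str]:
    """List the support letters that nobody holds for one skill."""
    held = set(assignments.values())
    return [level for level in LEVELS if level not in held]


def coverage_report(matrix: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each incompletely covered skill to the levels it lacks."""
    report = {}
    for skill, assignments in matrix.items():
        missing = missing_levels(assignments)
        if missing:
            report[skill] = missing
    return report


print(coverage_report(support))  # {'AD': ['T'], 'DNS': ['P', 'B']}
```

An organization that only plans for two levels could shorten `LEVELS` to `("P", "B")` and reuse the same check.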

 

SAMPLE – Support Added

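Critical Skills | Andy B. | Grace C. | Ben F. | Laura A. | Other Resources
----------------|---------|----------|--------|----------|------------------
AD              | P       |          | B      |          | Consultant Ben H.
CISCO           | P       | T        |        | B        | n/a
DNS             |         | T        |        |          | n/a
Exchange        |         |          | P      | B        | Vendor T1 (name)
MIMIX           | B       | P        |        | T        | n/a
ETC…            |         |          |        |          |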

STEP 4 – Provide A ‘Skill Level’ For Each Entry

This step doesn’t have to be exact. It’s simply an indication of capability level so that proper expectations are set concerning the recovery skills. If this isn’t done, people will probably assume that, because a backup person is listed, that person can handle a recovery effort when the primary person is not available. This is dangerous. This part of the process should also identify where holes exist and where training may be needed.

Caution: This is not to be construed as a detailed personnel performance "rating." That is done through the HR department and is much more in depth. However, it usually isn’t a bad idea to discuss the process with HR so there is no exposure in terms of how they view it. Remember, the gap matrix is general in nature and is designed to identify where critical recovery skills/processes are not covered adequately. That is a part of BCP and a part of business survival. The reality is that it isn’t "adequate" to have critical recovery skills supported only by people who are learning. Keep that in mind when you view the next, and final, version of the skills matrix.

SKILL LEVEL KEY:

1 = Fluent knowledge

2 = Working knowledge

3 = Just learning
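
The sample matrix for this step writes each cell as a support letter followed by a skill level, for example P1 or B3. As a small illustration, assuming the matrix is kept in electronic form, the Python sketch below decodes such a cell against the two keys. The function names `decode_cell` and `describe_cell` are hypothetical, and the wording of the description is only one possible choice.

```python
from typing import Tuple

SUPPORT_KEY = {"P": "Primary", "B": "Backup", "T": "Third"}
SKILL_LEVEL_KEY = {1: "Fluent knowledge", 2: "Working knowledge", 3: "Just learning"}


def decode_cell(code: str) -> Tuple[str, int]:
    """Split a code such as 'B3' into its support letter and skill level."""
    code = code.strip().upper()
    if len(code) != 2 or code[0] not in SUPPORT_KEY or not code[1].isdigit():
        raise ValueError(f"unrecognized matrix code: {code!r}")
    level = int(code[1])
    if level not in SKILL_LEVEL_KEY:
        raise ValueError(f"unknown skill level in code: {code!r}")
    return code[0], level


def describe_cell(code: str) -> str:
    """Spell out a cell in words for reports or review meetings."""
    role, level = decode_cell(code)
    return f"{SUPPORT_KEY[role]} support, {SKILL_LEVEL_KEY[level].lower()}"


print(describe_cell("P1"))  # Primary support, fluent knowledge
print(describe_cell("B3"))  # Backup support, just learning
```

Rejecting unknown codes early keeps typing mistakes in the matrix from being silently read as valid coverage.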

Using the key above (or a key of your choice), review each person’s skill level with the appropriate supervisor and fill in the level on the gap matrix. As mentioned earlier, this is not an exact science. The goal is simple: identify where critical recovery skills are not adequately supported, to the point that a timely recovery could be lengthened or delayed. Now, take a close look at the following sample matrix.

SAMPLE – Skill Level Added

Critical Recovery Skills ‘Gap’ Analysis Matrix

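Critical Skills | Andy B. | Grace C. | Ben F. | Laura A. | Other Resources
----------------|---------|----------|--------|----------|-----------------------
AD              | P1      |          | B3     |          | n/a
CISCO           | P1      | T2       |        | B2       | n/a
DNS             |         | T3       |        |          | n/a
Exchange        |         |          | P1     | B2       | Vendor T1 – def
MIMIX           | B3      | P1       |        | T3       | n/a
Norton          | P1      |          | T3     | B2       | n/a
NT              | P1      | B2       |        |          | Consultant T3 – xyz
Oracle          |         | P2       | B2     | T3       | n/a
Peoplesoft      | T3      |          | B2     | P1       | n/a
ROBOT           |         | P1       | B1     | T3       | T1 – retired Employee
SAP             |         |          | P3     | S3       | n/a
SRDF            |         | S2       | P1     | P3       | n/a
UNIX            | T3      |          | P1     | B2       | n/a
UPS             |         | P2       |        | T1       | Vendor B2-abc
VOIP            | P1      |          | T2     | B2       | n/a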

See Any Problems In The Matrix?

Now that you’re this far, see if you can use the matrix to detect a deficiency in the backup levels of two skills. Look it over, circle the two, and then return to this point for the answer.

Did you find them? They are DNS and SAP. In the case of DNS, no primary or secondary support person is identified. That leaves one person, who is only learning, to support the critical skill. Perhaps a few people left the organization? SAP is exposed because both the primary and secondary support people are still learning. In this case it would be good to discuss the support issue with the vendor or a consulting firm that has those skills.
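
The same review can be approximated in code once the matrix grows beyond what is easy to scan by eye. The Python sketch below is an illustration under stated assumptions, not a definitive rule set: it flags a skill when no primary and/or backup is identified, or when everyone supporting it is at level 3. Those two rules are one reading of the explanation above. The name `find_gaps` is hypothetical, the codes are transcribed from the sample matrix, and the "S" entries in the sample are treated as "B" (backup), which is an assumption.

```python
from typing import Dict, List

# Codes transcribed from the sample matrix (in-house staff first, then any
# outside resource). Treating "S" as "B" (backup) is an assumption.
matrix: Dict[str, List[str]] = {
    "AD": ["P1", "B3"],
    "CISCO": ["P1", "T2", "B2"],
    "DNS": ["T3"],
    "Exchange": ["P1", "B2", "T1"],
    "MIMIX": ["B3", "P1", "T3"],
    "Norton": ["P1", "T3", "B2"],
    "NT": ["P1", "B2", "T3"],
    "Oracle": ["P2", "B2", "T3"],
    "Peoplesoft": ["T3", "B2", "P1"],
    "ROBOT": ["P1", "B1", "T3", "T1"],
    "SAP": ["P3", "S3"],
    "SRDF": ["S2", "P1", "P3"],
    "UNIX": ["T3", "P1", "B2"],
    "UPS": ["P2", "T1", "B2"],
    "VOIP": ["P1", "T2", "B2"],
}

LEARNING = 3  # "Just learning" in the skill level key


def find_gaps(rows: Dict[str, List[str]]) -> Dict[str, str]:
    """Return the exposed skills with a short reason for each."""
    gaps = {}
    for skill, codes in rows.items():
        roles = {"B" if code[0] == "S" else code[0] for code in codes}
        levels = [int(code[1]) for code in codes]
        if not {"P", "B"} <= roles:
            gaps[skill] = "no primary and/or backup identified"
        elif min(levels) >= LEARNING:
            gaps[skill] = "all support people are still learning"
    return gaps


print(sorted(find_gaps(matrix)))  # ['DNS', 'SAP']
```

A stricter organization might also flag any skill whose backup is at level 3, which would add AD and MIMIX to the list; where to draw that line is a management decision rather than a coding one.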

Benefits of Recovery Skills Matrixing

As we rely more and more on the availability of technology (computers and software) and the business indeed becomes a 24x7x365 concern via the Web and globalization, we should make sure our DRP/BCP plans address the human side in terms of critical skills. While some organizations are operating recovery solutions that utilize redundant equipment, high availability, and load balancing, most are not yet there. They simply must rely on the more traditional recovery solutions and their key people. In doing so there are definite benefits in completing an analysis as presented in this article: 

  1. Backup can be planned ahead of time vs. during a "stressful" recovery event. 
  2. Clear definitions and responsibilities tend to present things as they are in objective terms. They reduce the chance of unreal expectations. 
  3. Awareness training in general should be an active part of DRP/BCP. 
  4. Matrixing helps focus on issues that management needs to know. For example, if certain critical skills are not sufficiently backed up they need to be aware. Then they either participate in solving the deficiency or they choose to "manage the risk." 
  5. Critical recovery skills can be put into people’s objectives (i.e., responsibilities) and measured, and adequate training can be provided.

     

Parting Thoughts

What I find particularly successful about this approach is that it condenses the information into a clear presentation in a way that many words just can’t. We simply don’t have the time for that during recovery. The RTO has been shrinking and will continue to do so. Recovery times under 24 hours are indeed common. In fact, RTOs of less than 8 hours are more common than one might imagine. The nature, frequency, and severity of potential disasters continue to loom. Where and when will the next one be? No one really knows. In order to be prepared for a fast and efficient recovery we must have a good grasp on all the critical recovery resources, including the people. That’s where we started this article and that’s where we’ll end.

I submit to you that this process of matrixing critical recovery skills is not difficult. It is very straightforward and logical. What it takes is commitment and a sense of duty to our industry. That’s our job as BCP professionals: making sure the recovery of critical business functions happens accurately and without delay. If this approach works for you, please use it.


Jan Persson, CDP, CBCP, has worked in the IT field since 1967. He began his formal disaster recovery involvement in 1980 by developing DR plans for numerous companies within a large conglomerate. In 1985 he started his own disaster recovery consulting practice, Persson Associates. He has written and/or audited more than 400 DRP/BCP plans, worked for and with the three major disaster recovery firms, conducts DR/BC seminars, test exercises and DR plan development workshops, and continues to take an active, hands-on role in DRP/BCP project activities in all size shops and environments.



"Appeared in DRJ's Fall 2007 Issue"