Much has been said about mainframe disaster recovery and business recovery, but what about your LAN? Do you have a plan? Do you need a plan? Well, in the next few pages, I hope to provide you with a simple LAN disaster recovery approach and plan.
First, one must realize that LANs are every bit as vulnerable as a mainframe. As a matter of fact, they are more vulnerable. Is your mainframe protected with sprinklers, fire detectors, smoke detectors, and water detectors? Is your mainframe connected to a UPS? What about Halon? Do you have a security guard at the entrance to your mainframe data center?
I could ask many other questions, but I think you get the idea. The answers to all of these questions for your mainframe are probably yes. If, however, I replace the word MAINFRAME with the word LAN, are your answers any different? You bet they are.
Now that we have established that the LAN is vulnerable (and you probably already knew that), let's look at the planning part of the process.
Each LAN should have someone responsible for it. Usually this person is called a LAN Administrator or LAN Coordinator. This person is responsible for making sure that the LAN is available to support the business needs of the LAN users.
To do this, two variables must be identified. The first I will call the System Recovery Window, or LAN Recovery Window (LRW): the time it takes to recover the LAN. The second is the Business Recovery Window (BRW): the amount of time the LAN can be unavailable before the business environment is affected.
The following equations let you determine whether the recovery approach you are using will be successful. The equations are:
LRW = DD + PD + IN + BTR + SSF
RWD = LRW - BRW
DD = Disaster declaration time (the time to identify the problem and determine whether to declare a disaster).
PD = Hot spare or parts delivery time (the time to activate a hot spare or have a part or parts delivered).
IN = Installation (the time it takes to install the part(s) or activate the hot spare).
BTR = Backup tape restore time (the time it takes to restore your LAN backup tapes or boot the hot spare).
SSF = Site specific factors time (the time it takes for any actions that are site specific).
LRW = LAN (system) recovery window time (estimated time to recover the LAN, i.e., DD + PD + IN + BTR + SSF).
BRW = Business recovery window time (estimated time the LAN can be idle prior to the business function being impacted).
RWD = Recovery window deviation time (LRW minus BRW; the positive or negative difference between the LAN and Business Recovery Windows).
If the deviation is negative, the recovery method being used is acceptable for the business function of the LAN. In this case, the LAN will be recovered before the business function is affected. On the other hand, if the deviation is positive, you had better look at a different recovery approach.
A positive deviation says that your business function will be in trouble unless you speed up the recovery process, or reevaluate the Business Recovery Window to check whether it has been set too short for the business function. But don't, underline don't, change the BRW just to make the current recovery approach work.
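To make the arithmetic concrete, here is a minimal sketch that reads the equation as LRW = DD + PD + IN + BTR + SSF and RWD = LRW - BRW. The class and function names are hypothetical, all times are assumed to be in hours, and the example numbers are invented. The article does not say what a deviation of exactly zero means; this sketch treats it as unacceptable because it leaves no margin.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    """Component times for one recovery approach, in hours (assumed unit)."""
    dd: float   # DD: disaster declaration time
    pd: float   # PD: hot spare activation or parts delivery time
    in_: float  # IN: installation time (trailing underscore because `in` is a keyword)
    btr: float  # BTR: backup tape restore time
    ssf: float  # SSF: site specific factors time

    def lrw(self) -> float:
        """LAN Recovery Window: LRW = DD + PD + IN + BTR + SSF."""
        return self.dd + self.pd + self.in_ + self.btr + self.ssf


def deviation(estimate: RecoveryEstimate, brw: float) -> float:
    """Recovery Window Deviation: RWD = LRW - BRW."""
    return estimate.lrw() - brw


def verdict(rwd: float) -> str:
    # Negative: the LAN is back before the business function is hurt.
    # Zero is treated as unacceptable here (no margin); that is an assumption.
    return "acceptable" if rwd < 0 else "look at a different approach"


if __name__ == "__main__":
    # Invented numbers: wait a day for parts, then install and restore.
    parts_order = RecoveryEstimate(dd=2, pd=24, in_=3, btr=6, ssf=1)
    rwd = deviation(parts_order, brw=24)
    print(parts_order.lrw(), rwd, verdict(rwd))  # → 36 12 look at a different approach
```

Here a one-day parts delivery alone pushes the LAN Recovery Window twelve hours past a 24-hour Business Recovery Window.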
If you have not selected an approach, you can use the above equations to test different approaches. Because the Business Recovery Window should stay constant regardless of the approach, vary the components of the LAN Recovery Window to model different recovery scenarios. Once you find the approach that suits your requirements and satisfies your business needs, it is time to write the plan.
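Comparing approaches this way amounts to holding BRW fixed and recomputing RWD for each candidate set of component times. The sketch below does that for two hypothetical scenarios; the scenario names, the hours (an assumed unit), and the helper names are all invented for illustration.

```python
# Hypothetical recovery scenarios: (DD, PD, IN, BTR, SSF) in hours (assumed unit).
SCENARIOS: dict[str, tuple[float, ...]] = {
    "order replacement parts": (2, 24, 3, 6, 1),
    "activate hot spare server": (2, 1, 1, 6, 1),
}
BRW = 24  # held constant across scenarios, as the article recommends


def rwd(times: tuple[float, ...], brw: float) -> float:
    """RWD = (DD + PD + IN + BTR + SSF) - BRW."""
    return sum(times) - brw


def rank_scenarios(scenarios: dict[str, tuple[float, ...]],
                   brw: float) -> list[tuple[str, float]]:
    """Return every scenario with its RWD, most negative (most margin) first."""
    return sorted(((name, rwd(t, brw)) for name, t in scenarios.items()),
                  key=lambda item: item[1])


if __name__ == "__main__":
    for name, dev in rank_scenarios(SCENARIOS, BRW):
        status = "acceptable" if dev < 0 else "not acceptable"
        print(f"{name}: RWD = {dev:+} hours ({status})")
```

Only the LRW components change between scenarios; the BRW stays put, so an approach cannot be made to "pass" by shortening the business requirement.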
Before I get to the plan, there are a couple of other issues that need to be addressed. The first is anti-virus software. Anyone who is not using an anti-virus package is destined to catch a virus. A virus on a stand-alone PC is certainly a disaster to that PC, but a virus on a LAN can be catastrophic. It can take weeks to clean up the LAN and cost many hours of effort. I will assume that your LAN has a virus protector.
Make sure that virus checker software is installed on every PC on the LAN, on every print server that has a floppy disk drive, and on every LAN server. Further, make sure that the software is kept current; it would be a disaster to have a virus checker on your system and have a new virus pass right through it.
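Checking coverage and currency can be as simple as keeping a device inventory and flagging two problems: no checker installed, and signatures older than some threshold. This is a minimal sketch; the device names, dates, and the 30-day threshold are all assumptions, and how you actually collect the signature dates depends on your anti-virus package.

```python
from datetime import date
from typing import Optional

# Hypothetical inventory: device -> date of its installed virus signature file,
# or None if no virus checker is installed. Names and dates are invented.
INVENTORY: dict[str, Optional[date]] = {
    "SERVER1": date(1996, 5, 1),
    "PRINTSRV1": None,  # print server with a floppy drive and no checker
    "PC-ACCT-01": date(1996, 1, 15),
}

MAX_SIGNATURE_AGE_DAYS = 30  # assumed threshold for "kept current"


def audit(inventory: dict[str, Optional[date]], today: date,
          max_age_days: int = MAX_SIGNATURE_AGE_DAYS) -> tuple[list[str], list[str]]:
    """Return (devices with no checker, devices with out-of-date signatures)."""
    missing = [name for name, sig in inventory.items() if sig is None]
    stale = [name for name, sig in inventory.items()
             if sig is not None and (today - sig).days > max_age_days]
    return missing, stale


if __name__ == "__main__":
    missing, stale = audit(INVENTORY, today=date(1996, 5, 10))
    print("No virus checker:", missing)
    print("Out of date:", stale)
```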
DISASTER RECOVERY PLAN
INVENTORY:
Standard server configuration: (Copy of the server profile)
Standard workstation configuration:
Directory structure: (Copy of LAN system & security files)
Configuration information from the server:
LAN user information:
Workstation CONFIG.SYS and AUTOEXEC.BAT files:
DR TEAM MEMBERS:
Team Leader Name:
Team Member Name:
Help Desk Phone:
PROCEDURES:
A step-by-step set of procedures for each team that your site has determined it requires. Procedures should be simple enough that anyone with a knowledge of LANs could carry them out if the primary or alternate person is not available.
MAINTENANCE:
Plan Review Schedule:
The month when each portion of the plan is reviewed and who will review that portion.
TESTING:
A copy of a generic test plan that can be used to test the various LANs.
A copy of the annual test schedule. Since the various parts of a LAN tend to have maintenance problems, you might want to use real outages as tests and document them in this section.
A copy of the results of each test or real outage.
The second issue is backup. If you don't need to back up your data or don't need to move the backup media offsite, you don't need a disaster recovery plan. But if you do back up your data, and you do have a business need for the data, you will need a plan. I will assume that you are backing up your data and storing it somewhere offsite.
Each LAN should have a backup plan which outlines when a backup is taken, where the backup media is sent, when it is sent, what the label on the media should look like, and anything else which might be required to actually do the backups. Standardized labelling and media will help the people who store your backups, and will help you when the media are returned.
The label should contain at a minimum the following: the server name, the media number, the date the media went to storage, the scheduled return date, who sent the media (with a phone number), the type of backup, the backup date, and any additional information that is deemed necessary.
This information should help ensure that the media are readily available for recovery and flow easily between your LAN and the offsite storage area.
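One way to keep every label identical is to treat the minimum label contents above as a fixed record and print it in one layout. This is a sketch only; the field names, the column layout, and the return-date sanity check are assumptions, not a format the storage vendor or the article prescribes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BackupLabel:
    """The minimum label contents listed above; field names are illustrative."""
    server_name: str
    media_number: str
    sent_to_storage: date
    scheduled_return: date
    sender: str
    sender_phone: str
    backup_type: str   # e.g. "full" or "incremental"
    backup_date: date
    notes: str = ""    # any additional information deemed necessary

    def __post_init__(self) -> None:
        # Catch an obvious data-entry error before the media leaves the building.
        if self.scheduled_return < self.sent_to_storage:
            raise ValueError("scheduled return date precedes the storage date")

    def render(self) -> str:
        """Lay out every label the same way, one field per line."""
        rows = [
            ("SERVER", self.server_name),
            ("MEDIA NO.", self.media_number),
            ("TO STORAGE", self.sent_to_storage.isoformat()),
            ("RETURN BY", self.scheduled_return.isoformat()),
            ("FROM", f"{self.sender} {self.sender_phone}"),
            ("BACKUP TYPE", self.backup_type),
            ("BACKUP DATE", self.backup_date.isoformat()),
        ]
        if self.notes:
            rows.append(("NOTES", self.notes))
        return "\n".join(f"{key:<12}{value}" for key, value in rows)


if __name__ == "__main__":
    label = BackupLabel("SERVER1", "0042", date(1996, 5, 11), date(1996, 6, 8),
                        "J. Smith", "555-0100", "full", date(1996, 5, 10))
    print(label.render())
```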
Now that we have a virus-free LAN and have our backup media located offsite, let's look at a recovery plan. The following plan, while quite simple, will work for most LANs and is inexpensive if kept on your word processing system.
I will first discuss the parts of the plan and then show the format.
PLAN COMPLETED BY: This is the person who completed the plan and the date it was completed.
LAST REVIEW DATE: The date of the last review of the plan and the name of the person who did the review.
PURPOSE: A short description of why the plan was written, what LAN it supports, and any other information that might be needed. This is like an Executive Summary, so that someone can pick up the plan and quickly get an overview of it. This section is not essential and can be left out if it is deemed unnecessary.
INVENTORY: This could include many different types of inventories. For example: the standard server configuration, the standard workstation configuration, the directory structure, any other configuration data, list of IDs associated with the server, a copy of each workstation's CONFIG.SYS and AUTOEXEC.BAT files, and any other inventory type data that would assist in the recovery.
DR TEAM MEMBERS: A list of everyone who would be involved in the recovery of the LAN and its server, with each person's name, home address, home phone, work phone and, if applicable, beeper or cellular phone number. Vendor names, addresses and phone numbers should also be included in this section, as can manager or supervisor addresses and phone numbers and any other useful phone numbers.
PROCEDURES: This section lists the steps in the recovery process which must be accomplished for a successful recovery. It should include space to check off each step as it is completed, recording who completed it, the date, and possibly the time. This will help ensure that all of the steps are considered and completed.
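The check-off idea maps naturally onto a list of steps, each with an empty "completed by" and "completed at" slot. The sketch below is hypothetical throughout: the step wording is invented (loosely following the DD, PD, IN and BTR terms from the recovery window equation), as are the names and dates.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Step:
    """One recovery step, with room to record who completed it and when."""
    description: str
    completed_by: Optional[str] = None
    completed_at: Optional[datetime] = None

    def check_off(self, name: str, when: datetime) -> None:
        self.completed_by = name
        self.completed_at = when


def outstanding(steps: list[Step]) -> list[str]:
    """Descriptions of steps not yet checked off, in plan order."""
    return [step.description for step in steps if step.completed_by is None]


if __name__ == "__main__":
    # Hypothetical steps loosely following the DD, PD, IN and BTR terms.
    plan = [
        Step("Declare the disaster and notify the DR team"),
        Step("Activate the hot spare or order replacement parts"),
        Step("Install the parts or bring up the hot spare"),
        Step("Restore the latest backup tapes"),
    ]
    plan[0].check_off("J. Smith", datetime(1996, 5, 10, 9, 30))
    for step in plan:
        mark = "X" if step.completed_by else " "
        who = (f"{step.completed_by} {step.completed_at:%Y-%m-%d %H:%M}"
               if step.completed_by else "")
        print(f"[{mark}] {step.description}  {who}")
    print("Steps outstanding:", len(outstanding(plan)))
```

Because the list keeps plan order, the outstanding steps read back in the sequence the recovery team should tackle them.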
MAINTENANCE: This section should set up maintenance procedures, such as how often the plan should be reviewed and who is responsible for the review. If there is a corporate policy or corporate procedures which outline maintenance procedures, they could be referenced here.
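A review schedule of the kind the plan format calls for (the month each portion is reviewed and who reviews it) is easy to query. In this sketch the section-to-month assignments and reviewer titles are invented; substitute your own schedule or your corporate maintenance policy.

```python
from datetime import date

# Hypothetical maintenance schedule: plan section -> (review month 1-12, reviewer).
REVIEW_SCHEDULE: dict[str, tuple[int, str]] = {
    "INVENTORY": (1, "LAN Administrator"),
    "DR TEAM MEMBERS": (4, "DR Team Leader"),
    "PROCEDURES": (7, "LAN Administrator"),
    "TESTING": (10, "DR Team Leader"),
}


def reviews_due(month: int,
                schedule: dict[str, tuple[int, str]] = REVIEW_SCHEDULE) -> list[tuple[str, str]]:
    """Return (section, reviewer) pairs scheduled for review in the given month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [(section, reviewer)
            for section, (m, reviewer) in schedule.items() if m == month]


if __name__ == "__main__":
    for section, reviewer in reviews_due(date.today().month):
        print(f"Review {section}: {reviewer}")
```

Spreading the sections across the year keeps each review small, which makes it more likely the review actually happens.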
TESTING: This section outlines what is to be tested, who is to do the test, when the test is completed, and what the results of the test were. A separate test plan or a generic test plan can also be a part of this section. Another name for this section could be EVALUATION, since some people feel that the word TESTING has a pass/fail connotation.
Some sections of this plan can be set up generically by a central group, and the LAN Administrators can tailor them to the LAN or LANs that are their responsibility.
If you say this is a very simplistic approach, I would agree. However, it does encompass all aspects of the planning process, and the end result is a plan which has been easily constructed, satisfies the needs of the LAN's business function, protects the LAN should a disaster occur, and is very cost effective.
This is a LAN disaster recovery approach which uses the KISS principle (Keep It Simple Stupid). If you have a LAN disaster recovery plan, you can keep it simple, but you are definitely not stupid. If you do not have a plan, you have certainly kept it simple, but ...
Bill Koch is an independent consultant in Raleigh, North Carolina.
This article was adapted from Vol. 9, No. 2.