
Developing a Contingency Plan for Your Tele-Processing Networks
By Jim Chettle
Part I
The contingency planning information in this article is limited in scope to the tele-processing (TP) network and does not address the requirement for full restoration of the data center. The TP network discussed here consists of everything outside the mainframe: the front-end communications processors, the telecommunications lines, modems, multiplexers, and the remote user devices.
Fundamentals
Each of your TP networks is different in some way from the others, so plans are best formulated after a careful examination of all the components in each network. This requires a thorough inventory.
Following a disaster, not all of your TP networks will be equally important to bring back into operation. You must obtain the recovery priorities from the users of each network. The users' opinions of how critical each network is, and the resulting cost to recover it, must be approved at a management level high enough to justify continuing your planning efforts and expenditures. More simply put, verify how critical each network is before spending a lot of effort and money.
Portions of your network may not be critical now to the continuing business functions of your company. However, these portions
should be inventoried with the others, included in the plan, and identified as non-critical. The reason for including these is covered
later in this article.
Other portions of your network may be identified as extremely critical, and therefore may require special treatment in your recovery
plans. You may want to address recovery of extremely critical services prior to completing your entire plan.
Contingency plans are critical to the survival of your company following a disaster. This implies that they should be frequently
reviewed and updated. It is wise to put the plans together in a modular fashion to facilitate updating. A modular structure allows the
updating task to be distributed to several individuals or groups.
If contingency plans are critical to your company's survival, they should be tested frequently. Testing is the only way you will know whether your contingency plan works. Your hardware, software, networks, and applications change continually, and your plans must stay up to date with your environment. Testing will be an ongoing task to ensure that your plan is still current.
After you complete your contingency plans, you may want to share them with your vendors of communications equipment and facilities. The vendors can verify and critique your plan and, if they already know what is required of them, will be better placed to react quickly to your request to activate the plan. They may accept standing orders to activate their portion of the plan upon a call from your network staff.
Most data center recovery plans contain thorough lists of vendor contacts, employee home telephone numbers, etc. I recommend that the TP recovery plan contain a customized contact list, which can be used to quickly organize the communications recovery team. The list can contain the vendors' phone numbers used for reporting trouble, as well as the home numbers of key vendor contacts.
Keep a copy of your recovery plan, inventory list, and contact lists at an off-site location. Most companies store magnetic media
off-site, and the complete recovery plans could be kept with the media. I recommend that key individuals also keep a copy at home.
Additional copies can be stored at your designated hotsite or coldsite.
The task of gathering information can be completed in several ways, but your completed work should be an auditable document
containing every component of the TP network. This is important, since you never know which part of the network you will be
recovering. You may be asked to recover a large user site, all sites in a geographical area, or the entire data center communications.
The inventory listing can be organized in several ways, and you must determine which works best for your situation. One method is to set up a master list of all of your organization's networks, identifying each by the name used in everyday discussions, such as teller terminal network, sales support network, plant multipoint network, etc. Another method is a list by facility type or common carrier service name. This would allow you to address the recovery of types such as analog multipoints or high speed digital pipes in separate sections of your plan.
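Both organizations can be produced from a single underlying component list, so you do not have to maintain two inventories. A minimal sketch, assuming a hypothetical record layout (the field names and sample circuits are illustrative only):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One inventoried TP network component (hypothetical fields)."""
    circuit_id: str     # carrier circuit number or internal line ID
    network: str        # everyday name, e.g. "teller terminal network"
    facility_type: str  # carrier service type, e.g. "analog multipoint"
    location: str       # user site or host site


def group_by(components, key):
    """Return {value of the chosen attribute: [circuit IDs]}."""
    groups = defaultdict(list)
    for c in components:
        groups[getattr(c, key)].append(c.circuit_id)
    return dict(groups)


inventory = [
    Component("C101", "teller terminal network", "analog multipoint", "Branch 1"),
    Component("C102", "teller terminal network", "sub-rate digital", "Branch 2"),
    Component("C201", "sales support network", "analog multipoint", "Sales office"),
]

by_network = group_by(inventory, "network")         # master list by network name
by_facility = group_by(inventory, "facility_type")  # list by facility/service type
```

The same records can later carry the users' classification and the recovery method, so each view of the plan stays consistent with the others.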
Make sure that you list all networks and devices using host access. List and classify every TP network, even those classified non-critical, because each one is currently being used by someone; otherwise it would not be part of the network. The critical nature of any component could change at any time, and without a complete inventory list, recovery would be difficult.
Some sources for compiling your inventory list are: host/front end processor line listings; invoices from vendors of communication
facilities, modems, CRTs, etc.; service charge detail listings from the telephone company; maintenance contracts; and trouble call
phone lists. It is possible that your existing network support documentation contains all of the information necessary to complete
the list. My experience has shown that this documentation is as good a place as any to start.
Classification
Whichever inventory listing method you choose, it should fit comfortably into a scheme which allows the users to identify those portions which they consider critical to their business plans. The users should classify the items on the listings as:
extremely critical: must be recovered within 1 day; very critical: must be recovered within 3 days; critical: must be recovered within 5 days; and so on, down to non-critical: do not recover.
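Once the users have classified their networks, the classes translate directly into a recovery queue. A minimal sketch, assuming only the classes named above (any intermediate classes behind the "and so on" would be added to the table) and hypothetical network names:

```python
# Recovery deadlines in days for the classes named above.
# None means "do not recover": the network stays in the inventory only.
RECOVERY_DAYS = {
    "extremely critical": 1,
    "very critical": 3,
    "critical": 5,
    "non-critical": None,
}


def recovery_order(classified):
    """Split {network: class} into a queue ordered by deadline and a
    sorted list of networks that are inventoried but not recovered."""
    to_recover = []
    not_recovered = []
    for network, cls in classified.items():
        days = RECOVERY_DAYS[cls]  # a KeyError flags an unrecognized class
        if days is None:
            not_recovered.append(network)
        else:
            to_recover.append((days, network))
    to_recover.sort()  # earliest deadline first, then by name
    return [(network, days) for days, network in to_recover], sorted(not_recovered)


queue, skipped = recovery_order({
    "teller terminal network": "extremely critical",
    "plant multipoint network": "critical",
    "sales support network": "very critical",
    "test network": "non-critical",
})
```

Keeping the non-critical networks in the output, rather than discarding them, reflects the point above that their classification may change later.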
When you ask the users to classify these networks, don't be surprised if they can't respond quickly. Sessions with the users can be very time consuming, but it is essential that the user is the one who classifies the networks.
The users are ultimately paying for the recovery scheme, and they may be directly charged for their portion of the recovery costs. You should explain to them how you are going to recover, what you can recover, the estimated cost of the recovery, and the recovery scenario. The scenario should be a brief description of the recovery method you will use for most networks. Two sample scenarios:
- Sample 1: All processing will be done at a hotsite located out of town. All user sites will use dial-up access to the hotsite location. Dial-up access will double the normal response delay. Users will call the hotsite only when they can batch the work.
- Sample 2: Processing will be done at ABC Co. after 5:00 pm. In-town users will gather all input and drive to ABC Co. for key entry. Out-of-town users will dial in to ABC Co. via the XYZ packet network.
Recovery schemes which will reduce performance or response time should be discussed with the users. For example, if your
recovery scheme reduces line speed, response time would suffer. Although this may be acceptable to data entry locations which
could re-allocate some of their work to off-shift times, it may not be acceptable to locations which have customer service functions
such as teller windows. Customer service users may require the same response level as normal and this can create a more expensive
recovery scheme than for the data entry locations. If the user does not agree with your recovery scheme, negotiations will be
necessary since various TP networks may be more vital to the users than you surmised. Upper management will make the final
decision based upon the facts presented and their assessment of various business functions.
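When discussing reduced line speed with users, it can help to show the arithmetic. This sketch computes only the line time for one full screen, assuming a 24 x 80 (1,920-character) display such as a 3278 Model 2 and 8 bits per character on a synchronous line; it ignores protocol overhead, polling, and host delay, so real response times will be longer:

```python
def screen_transfer_seconds(chars: int, bits_per_second: int, bits_per_char: int = 8) -> float:
    """Line time to send one screen, ignoring protocol overhead and host delay."""
    return chars * bits_per_char / bits_per_second


FULL_SCREEN = 24 * 80  # 1,920 characters

for speed in (9600, 4800, 2400):
    print(f"{speed} bps: {screen_transfer_seconds(FULL_SCREEN, speed):.1f} s per full screen")
```

Halving the line speed doubles the line time (1.6, 3.2, and 6.4 seconds here), which a data entry site may tolerate but a teller window probably will not.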
Approvals by Management: Cost vs. Risk
Very early in the classification process, upper management should be involved to provide direction. They should be the driving force behind all contingency planning and should express their opinion in the area of cost vs. risk. If you could recover the entire company network for $XX,XXX or could recover just the critical networks for "a lot less," you can be assured that management will choose "a lot less." Even though "a lot less" may be your general guideline, I recommend that you go through the exercise of costing out the complete recovery, since, depending on the size of your network, it may not be far out of line with a partial recovery.
When you present management with your recommended plan and the alternative plans, try to include your best estimate of the risk of failure for each plan. For example, a dial backup scheme may occasionally fail in a regional disaster because too few inter-city lines are available. A completely redundant facility linked to a remote hotsite could provide instant availability following a disaster and would have a high likelihood of being available even after a regional disaster. The risk factor in a recovery plan is sometimes difficult to determine; your vendors or a consultant may be able to help you determine it.
Putting the Plan Together
Let's assume that you have completed the inventory classification, set up a skeleton recovery document, received approvals from management, and are ready to write a formal plan. From past experience, I know that you will find updating and rewriting your plan to be a never-ending process. User business requirements change, networks change, mainframe applications change, and the critical classifications of networks change. The structure of your recovery plan should be modular so it can accommodate changes easily. A modular plan could contain sections such as:
1. Statement of overall intent of the recovery plan
2. Inventory of the communications network components
a. Multipoint plant and sales office lines
b. Teller network
3. Host site inventory
a. Front end processor
b. Multiplexing
4. Contact lists
a. Vendors
b. Network support personnel
c. User sites
5. Step-by-step recovery
a. Extremely critical networks
b. Very critical networks
c. Critical networks
6. Software
a. Line listings for front-end processor
b. Host command lists
7. General support information
a. Modem/multiplexer manuals
b. Dial backup numbers at remote sites
c. Access numbers for switched/packet networks
Customize each module in the plan based upon the expertise available to update it. For example, if a Software Support Group
handles the front-end processor GENs, they can handle the software section of the plan. Similarly, a Network
Operations/Control Group could handle the contact lists and the general support information sections.
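Once each module has an owner, a simple register can show which modules are due for review. A minimal sketch, assuming a hypothetical 90-day review interval and illustrative dates; the owner assignments follow the examples above:

```python
from datetime import date, timedelta

# Hypothetical module register: module -> (owning group, date last reviewed)
MODULES = {
    "Software": ("Software Support Group", date(1999, 1, 15)),
    "Contact lists": ("Network Operations/Control Group", date(1998, 6, 1)),
    "General support information": ("Network Operations/Control Group", date(1999, 2, 1)),
}

REVIEW_INTERVAL = timedelta(days=90)  # assumed policy, not from the article


def overdue(modules, today):
    """Return sorted (module, owner) pairs not reviewed within the interval."""
    return sorted(
        (name, owner)
        for name, (owner, reviewed) in modules.items()
        if today - reviewed > REVIEW_INTERVAL
    )


print(overdue(MODULES, date(1999, 3, 1)))
```

The output names both the stale module and the group responsible for it, which is the point of distributing the updating task.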
The Step-by-step Recovery Plan
The Step-by-step portion of the plan is the most difficult module to write. This part can only be proven by testing. The writing
will require a great deal of thought because things you take for granted as common knowledge may be unknown to the person who
ultimately executes the plan. The best guideline I can give you in this endeavor is to bore them to death with detail. One of the
steps should be the ACTIVATION SEQUENCE which contains at a minimum the following items:
1. List of vendors and key individuals who must be notified, the method of notification, and the follow-up/verification agreements.
2. List of the actions you expect each vendor and individual to perform. This could contain the standing orders mentioned earlier,
but must be very detailed, including every step each vendor or individual involved must perform. (An example of this is listed
below.)
3. Location and phone number of the recovery command center for vendors and individuals to report status/completion.
4. Employee travel and expense arrangements, and the location where each function will be performed. This might be a hotsite, a coldsite, or another out-of-town facility which would require manpower. Some plans have this section in the overall master plan, but it is wise to repeat it in the communications plan, even if it is abbreviated.
5. The activation priority list for each network type.
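Item 1 calls for follow-up/verification agreements. A simple log kept at the recovery command center shows at a glance who has been notified and who has not yet confirmed. A sketch with hypothetical class names and parties:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Notification:
    party: str                        # vendor or key individual
    method: str                       # e.g. "phone", "pager"
    notified_at: Optional[str] = None
    confirmed: bool = False           # follow-up/verification received


@dataclass
class ActivationLog:
    notifications: List[Notification] = field(default_factory=list)

    def notify(self, party: str, method: str, when: str) -> None:
        self.notifications.append(Notification(party, method, when))

    def confirm(self, party: str) -> None:
        for n in self.notifications:
            if n.party == party:
                n.confirmed = True

    def awaiting_follow_up(self) -> List[str]:
        return [n.party for n in self.notifications if not n.confirmed]


log = ActivationLog()
log.notify("Local telephone company", "phone", "08:05")
log.notify("Modem vendor", "phone", "08:10")
log.confirm("Local telephone company")
print(log.awaiting_follow_up())  # ['Modem vendor']
```

A paper form with the same columns serves equally well; what matters is that every notification has a matching confirmation.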
Detailed Example of Step 2 in the Activation Sequence
Step 2 above is critical to the success of your plan. The details in this section should include every task to be performed. If, for example, your plan depends upon dial backup from a user site to a remote hotsite, then your instructions should contain:
- the hotsite dial-up (modem) telephone number and the alternate number
- the hotsite contact telephone number
- the user location information: names, phone numbers, and home phone numbers of supervisors (include the modem dial-up numbers)
- a general description of the type of processing performed by the user. This could be: 3278 CRTs accessing CICS DDA
- the type of modem at the user site, speed, terminal type and protocol
- the communications company for both the phone line at the user site and the long distance carrier, with phone numbers for
problem reporting
- the procedure for establishing a connection: who calls whom, on which phone line, which switches at which location are operated in what order, etc.
- the timing of connections, such as 24 hours or 8:00 am - 5:00 pm
- names and phone numbers of support organization(s) to call if the equipment doesn't work at the user site
- this is very important: give the users a copy of this plan and keep it updated. The users will need the phone numbers for problem reporting, etc.
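Because a missing item usually surfaces only during a test, the checklist above can also be used to audit each site's write-up before it is distributed. A minimal sketch, assuming each site's instructions are stored as a dictionary whose keys (hypothetical names) mirror the list:

```python
# Hypothetical keys, one per item in the checklist above
REQUIRED_ITEMS = (
    "hotsite_dialup_number",
    "hotsite_alternate_number",
    "hotsite_contact_number",
    "user_site_contacts",          # supervisors, home numbers, modem dial-up numbers
    "processing_description",      # e.g. "3278 CRTs accessing CICS DDA"
    "modem_type_speed_protocol",
    "carrier_trouble_numbers",     # local and long distance carriers
    "connection_procedure",
    "connection_hours",
    "equipment_support_contacts",
)


def missing_items(site_plan: dict) -> list:
    """Return required items that are absent or blank in one site's instructions."""
    missing = []
    for item in REQUIRED_ITEMS:
        value = site_plan.get(item)
        if isinstance(value, str):
            value = value.strip()  # treat whitespace-only text as blank
        if not value:
            missing.append(item)
    return missing


draft = {
    "hotsite_dialup_number": "555-0100",
    "hotsite_contact_number": "555-0199",
    "connection_hours": "8:00 am - 5:00 pm",
}
print(missing_items(draft))
```

The same approach applies to the leased-line switching instructions; only the list of required items changes.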
If your plan requires your communications vendor to switch your computer site's leased lines to a remote location which has standby lines and a switching arrangement, then your instructions should contain:
- overview of the switching scenario, including names of the services as the vendors know them (diagrams might help)
- the telephone number of the communications vendor for activating the plan, and an alternate number, preferably that of a supervisor or manager at the facility responsible for controlling the switching
- the passwords or other information required by the vendor to authorize the connections
- steps required by your operators at both the user end and the remote computer location which are different from the normal
operation
This type of detail plan should be shared with your communications vendor. They can verify your instructions and keep a copy for
their own information.
Another section of the plan should contain the PHYSICAL SETUPS at your recovery site. This must be in enough detail to allow a technician to establish your network from your plan write-up, without prior knowledge of your operation. A sample write-up follows:
a. Brief overview of each type of line in the inventory, describing the communications controller line type, modem type and speed, communications line type, and user site device types. Include the vendor names for each of these, with a reference to the section of the plan which covers their operating instructions. This overview should allow an individual to understand the what, where, and why of each type of network, so that a smooth activation can take place even if some components have changed since the last update.
b. Repeat the activation priority list in this section
c. Software specifics for communications control processors
d. Connections to be made either by cabling, patching, or matrix switch, with each component mentioned by name/type/vendor, etc.
e. Location of each component (maps and diagrams may help)
f. Vendor support information in case of problems with components. This problem assistance information should be verified
regularly.
Summary, part 1
As you have probably guessed by now, the details can be endless. It has been suggested that the only way to verify that the instructions contain sufficient detail is to test them with individuals who haven't previously read the plan. I tend to agree with this idea, and suggest that you try it, but only after you have tested your plan thoroughly with your best qualified support personnel. The tests by qualified personnel prove that your techniques will work; the other tests will surface any missing details.
The information given in the previous sections should enable you to construct a skeleton communications recovery plan. It's up to you to fill in the details.
Once your general structure is in place, any missing items will glare at you when you test the plan. Testing is the most important part of your plan. Without regular testing, you cannot honestly tell your company management that you have a viable recovery plan. Without regular testing, you will not ferret out omissions or changes in your environment. Test the plan!
PART II - DISASTER RECOVERY METHODS FOR TELEPROCESSING NETWORKS
As discussed in Part I, you can begin formulating your recovery plan after you have completed your network component inventory. The methods for recovering your particular network are numerous, but the choice among them is influenced by the items listed below. Before discussing recovery methods, I'll discuss these items and offer a few general observations.
ITEMS WHICH INFLUENCE RECOVERY METHODS
- Type of Network
- Time Requirement for Recovery
- Percent of Network to be Recovered
- Budget Considerations
OBSERVATIONS
The cost of recovering networks varies greatly by the network type, by the time required for recovery, and by the percent of your
network you must recover. You can spend a lot of time dreaming up new ways to recover your network, but the result probably will
resemble other schemes supported by your hot site, cold site, and/or communications and equipment vendors. Ask them for
recommendations for your network recovery scheme. You can customize as required. These vendors may be critical to the success
of your plan, and will do a better job supporting a plan which they helped develop. In the end, you and your management must
decide which scheme best fits the goals of your overall contingency plan.
Interest in a contingency plan was probably generated by management's assessment of how critical the data processing function is to the financial health of your company. If the teleprocessing network is judged to be a critical component, then the cost to recover becomes less of an issue than the time required to recover. Cost, of course, must be balanced against risk, and that is management's job. Give them the facts and options, and they will select the method best suited to their goals. For example, if a particular network is extremely critical and must be recovered immediately, fully redundant lines and equipment may be required. Short of full redundancy, recovery time and cost can vary considerably.
If the plan is the first for your company, the expenditures that you propose will be eyeballed for their financial worth. As discussed
in the previous article, the management approval process will come down to dollars vs. risk. I suggest that you give management
several proposals from which to choose. Include performance expectations as well as cost, and paint a scenario in plain English
explaining how each will work. You may want to include testimonial information from vendors or other companies which use similar
schemes. Your recommendation should be the one which makes the best sense from an operational as well as cost standpoint.
Following are commonly used recovery methods. The descriptions are somewhat abbreviated, but hopefully detailed enough to give you an understanding of each technique. Keep in mind that each method shown is only one solution, not necessarily the best solution for your network. The various communications and recovery vendors and consultants can offer customized recovery options, and they may have a plan which is better, faster, or cheaper than the examples.
Please note that the examples cover only dedicated networks and do not include fully redundant schemes. They are organized by network/line type. More complex methods will be discussed in a future article.
DEDICATED NETWORK RECOVERY METHODS TO BE DISCUSSED
- Voice Grade Point-to-Point Lines -- Figure A
- Voice Grade Multipoint Lines -- Figures B, C, & D
- Sub-Rate Digital Lines -- Figures C & E

RECOVERY METHODS
* Voice Grade Point-to-Point Lines (See Figure A)
The most common and inexpensive method of recovering a failed dedicated point-to-point line is dial backup via a switched
network. This scheme can also be used as your disaster recovery method, as long as the alternate computer recovery site is
equipped with matching modems and sufficient dial line facilities. Figure A depicts a four wire backup scheme which requires two
telephone lines at each end.
Some newer modems offer single-call dial backup, saving the cost of a telephone line at each end. In the long term, these savings in monthly telephone line rental, together with the savings in long distance charges, may offset the higher cost of these modems.
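Whether single-call modems pay for themselves is a simple payback calculation: the extra purchase price divided by the monthly savings. The figures below are placeholders, not quotes; substitute your own modem prices, line rental, and long distance estimates:

```python
import math


def breakeven_months(extra_modem_cost: float,
                     monthly_line_rental_saved: float,
                     monthly_long_distance_saved: float) -> int:
    """Whole months until the savings cover the higher modem price."""
    monthly_savings = monthly_line_rental_saved + monthly_long_distance_saved
    if monthly_savings <= 0:
        raise ValueError("no monthly savings; the higher-priced modems never pay back")
    return math.ceil(extra_modem_cost / monthly_savings)


# Hypothetical: $1,200 extra for a pair of single-call modems, saving one
# line at each end ($30 each per month) plus $20 per month in long distance.
print(breakeven_months(1200, 2 * 30, 20))  # 15
```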
Note that dial backup, while commonly used, has drawbacks. Switched networks usually have a higher bit error rate than dedicated lines. They are not guaranteed for high speed data transmission, and they are not guaranteed to be available: a regional disaster could cause network busy conditions. I recommend that you keep access codes for several alternate long distance carriers as part of your plan. This may improve your chances of completing dial backup calls.
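Keeping several carrier access codes amounts to a retry procedure: try one carrier, and if the call cannot be completed, move to the next. A sketch in which `place_call` is a hypothetical stand-in for whatever dialing procedure your operators or equipment actually use, and the access codes are placeholders:

```python
from typing import Callable, Optional, Sequence, Tuple


def dial_with_fallback(number: str,
                       access_codes: Sequence[str],
                       place_call: Callable[[str], bool],
                       attempts_per_carrier: int = 2) -> Optional[Tuple[str, int]]:
    """Try each carrier in order; return (access code, attempt number) for the
    first completed call, or None if every carrier fails."""
    for code in access_codes:
        for attempt in range(1, attempts_per_carrier + 1):
            if place_call(code + number):
                return code, attempt
    return None


BUSY_CARRIERS = ("CODE_A",)  # placeholder: carrier A is congested


def simulated_call(dialed: str) -> bool:
    """Stand-in dialer: calls through a busy carrier never complete."""
    return not dialed.startswith(BUSY_CARRIERS)


print(dial_with_fallback("5550100", ["CODE_A", "CODE_B"], simulated_call))  # ('CODE_B', 1)
```

In a written plan, the same logic becomes an ordered list of carriers with the number of attempts to make on each before moving on.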
* Voice Grade Multipoint Lines (See Figures B, C, & D)
Dial backup can also be used to recover multipoint lines. Dial backup for multipoint lines requires either special bridging arrangements, as shown in Figure B, or alternate master stations, as shown in Figure C. Figure D depicts an alternate master at a hot site using a switching arrangement at the Telco office. Recovering multipoint lines with dial backup can be costly, and may be troublesome, depending on the quality of the switched network lines.
The bridging arrangement depicted in Figure B is most successful when the bridges used are of the new type which cancel noise on
bridge legs not transmitting data. This arrangement also works best with modems without sideband diagnostic channels. Each
remote site would be called, using two telephone lines, therefore requiring an enormous number of telephone lines to recover a large
network. This arrangement is used as a recovery scheme for day to day line failures in many networks, and is offered as a standard
product offering by major modem manufacturers. Several hot site vendors offer the bridging arrangement at nominal cost.
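The number of telephone lines grows quickly with the size of the network. A sketch, assuming every remote drop on each recovered multipoint line is called at the same time, with two lines at the hot site for each remote site as described above (the line names and drop counts are hypothetical):

```python
def hotsite_dial_lines(drops_per_line: dict, lines_to_recover: list,
                       lines_per_drop: int = 2) -> int:
    """Dial lines needed at the hot site if every drop is called at once."""
    return lines_per_drop * sum(drops_per_line[name] for name in lines_to_recover)


drops = {"plant multipoint": 6, "sales multipoint A": 12, "sales multipoint B": 9}
print(hotsite_dial_lines(drops, ["plant multipoint", "sales multipoint A"]))  # 36
```

Recovering all three lines would need 54 dial lines at the hot site, which is why the alternate master approach can be attractive for larger networks.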
The alternate master scheme depicted in Figures C and D can only be used in disaster situations, but can lower the cost if dial
backup is not required at each remote site for day-to-day operations. A switching arrangement is required at the common carrier
central office. The alternate master leg of the data line should be terminated at a site some distance from the computer center. This
can be a cold site, a hot site, or a company in your town which allows you to terminate the alternate master lines on their premises.
Figure C shows the alternate master terminating at a hot site. Figure D shows the alternate master at a site in the same city as your
computer center, with back-to-back modems used to link the alternate master to the recovery site.
* Sub-Rate Digital Lines (See Figures C and E)
Sub-rate digital in this discussion refers to 2.4, 4.8, and 9.6 kilobit-per-second services. Digital lines can be recovered in the same manner as analog networks, using dial backup. Dial backup as depicted in Figure E requires a standby analog modem and a switching arrangement at each site. This can get quite expensive and is not practical for large networks. The alternate master scheme shown in Figure C can be used for either analog or digital lines.
Part II SUMMARY
The methods discussed are in use and can be applied with variations to many small to medium size networks. Larger networks may
require more sophisticated methods.
As mentioned previously, I recommend that you seek your vendors' and consultants' advice when laying out your teleprocessing recovery plans.
This article adapted from Vol. 1 No. 2, p. 3; No. 3, p. 6.
Disaster Recovery World© 1999 and Disaster Recovery Journal© 1999 are copyrighted by Systems Support, Inc. All rights reserved. Reproduction in whole or part is prohibited without the express written permission of Systems Support, Inc.