Over the years, I have been asked many questions regarding the best option or recommended option to use for a company’s recovery site. As we are all aware, selecting the best site for a recovery center is difficult. Budgets cannot always afford the most secure ones, and our anxieties cannot afford the least expensive. What is a Disaster Recovery Administrator to do when he or she is asked the all-important question, “Which site is best for us?”
This article cannot aspire to make that selection or decision an easy one, but it highlights questions that I have often been asked over the years. Hopefully, it will provide helpful answers or insight to some of the more commonly asked questions. Even if it prompts additional questions, it should provide some foundation for making a more informed decision or recommendation to management. The format will be to pose a question and then provide an interpretive answer.
WHAT TYPES OF REMOTE SITES ARE THERE TO CHOOSE FROM?
This question is answered in the order of MOST to LEAST expensive. Please note that the more costly option is USUALLY better equipped to handle recovery strategies within the shortest time frame.
- Second Center: The company owns and operates its own second data center with the purpose of the two backing each other up in a crisis.
- Hot-sites: These are “ready-to-go” environments. They are usually pre-equipped with chilled water, air-conditioning, primary and back-up power, raised flooring, physical security protection, compatible hardware, and some telecommunication capabilities. Additionally, hot-sites provide a cold-site environment to migrate to within an allotted time frame from the hot-site part of the facility.
- Cold-sites: These are the same as hot-sites, but normally without the hardware and with only minimal telecommunications capabilities.
- Warm-sites: These are also the same as hot-sites, but the company uses the hot-site for the recovery of a part of its critical processing while using the cold-site for recovery of other, less critical processing. Usually when a company has equipment from two separate vendors (possibly as a result of a merger agreement or buy-out), it may use the hot-site for the equipment and applications considered most critical while setting the “other vendor” equipment up in the cold-site.
- Reciprocal: The company elects to enter into a formal or informal agreement with another company for backup services.
WHAT SHOULD BE MY FIRST STEP IN ANALYZING WHICH TYPE OF RECOVERY SITE TO USE?
The first step is to perform a thorough Business Risk Analysis. This will identify the following:
- Critical applications and other processing
- Critical hardware requirements
- Critical personnel and office space required in the event of relocation
- Quantification in dollars or other meaningful measurement for your industry of the potential losses to the company resulting from a disaster
- Critical recovery period or minimal recovery timing required
Armed with this information, you will be in a better position to determine the type of recovery site needed, its minimal configuration, and appropriate dollar expenditure for disaster recovery planning. The dollar expenditure amount will assist you in determining the percentage of those dollars to spend for recovery site coverage.
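As an illustration only, the findings of such an analysis could be recorded and summarized along the following lines. This is a minimal sketch: the class name, field names, application names, and dollar figures are all hypothetical and are not taken from any real analysis.

```python
from dataclasses import dataclass


@dataclass
class CriticalApplication:
    """One row of a hypothetical Business Risk Analysis worksheet."""
    name: str
    loss_per_day: float      # estimated dollar loss per day of outage (illustrative)
    max_outage_hours: float  # longest outage the business can tolerate


def summarize(apps, outage_days):
    """Return the tightest recovery window and the estimated loss for an outage."""
    # The critical recovery period is driven by the least tolerant application.
    required_recovery_hours = min(a.max_outage_hours for a in apps)
    # Potential loss is the combined daily loss multiplied by the outage length.
    potential_loss = sum(a.loss_per_day for a in apps) * outage_days
    return required_recovery_hours, potential_loss


# Illustrative figures only.
apps = [
    CriticalApplication("payroll", 15_000.0, 72.0),
    CriticalApplication("order entry", 40_000.0, 24.0),
]
hours, loss = summarize(apps, outage_days=3)
print(f"Recover within {hours} hours; a 3-day outage risks ${loss:,.0f}")
```

The two numbers returned correspond to the critical recovery period and the quantified potential loss listed above, which are the figures that most directly shape the site decision.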
WHAT ARE THE DRAWBACKS IN USING A RECIPROCAL AGREEMENT?
If you rely on a reciprocal agreement, here are some pitfalls to watch for:
- Time allotted by the reciprocal site
The reciprocal site may run a different operating system and/or release level than yours. This may require recovery of your operating system, which takes time away from recovering your applications. If the site does run the same operating system and release level, verify that the time allotted for your use is greater than the time needed to recover and run your scheduled critical applications (recovery timing). The additional time will allow you to perform post-processing DASD management or backup processing.
- Proximity to your facility
If the reciprocating facilities are on the same power grid or geographically close, then power failures, flooding, and other events may adversely affect your ability to recover. Recently, Hurricane Hugo and the San Francisco earthquake have shown that such agreements can provide a false sense of security.
- Lack of a written agreement and penalties for non-performance
A written contract should be negotiated with penalties specified for the non-performance by either party. Without teeth, a reciprocal agreement is good only if you never suffer a disaster or a disaster strikes at the most convenient time to allow the reciprocating company to oblige.
- Telecommunications capabilities of the reciprocal site
Be sure that your telecommunications needs are addressed and are capable of being handled by the reciprocal site. Remember that installation of phone lines and data circuits generally requires long lead times. Inability to foresee your telecommunications needs may severely impact your recovery effort and success.
- May not provide you with adequate test time
WHAT SHOULD WE LOOK FOR IN A HOT-SITE CONTRACT?
When negotiating a hot-site contract, be sure to note the following:
- The penalty clause for early contract cancellation
- The amount of “free” test time offered and whether you can stack test time to provide for a longer test period
- The hardware configuration. Particularly note the type and quantity of DASD devices as compared to your site requirements.
- If you have more than one data center or plan to acquire additional data centers, then review the contract for provisions allowing the addition of data centers for coverage.
- Any “bumping” clause that allows the vendor to bump you, even during a recovery effort, in favor of a larger client
- The fees schedule pertaining to additional hardware such as DASD or CPU upgrade
- Disaster declaration fee
Some particulars regarding the above points deserve discussion. Use of “free” test time is very beneficial. Some contracts will provide you, based on a class of service, more “free” test time for a larger contract. Test time is generally in segments of eight hours each. A large contract requiring a $20,000 monthly fee may net you six eight-hour segments.
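The arithmetic behind stacking test time is simple, and a short sketch may help when comparing contracts. The function names and the 20-hour planned test are hypothetical; only the eight-hour segment length and the six-segment example come from the discussion above.

```python
SEGMENT_HOURS = 8  # test time is generally offered in eight-hour segments


def stacked_test_hours(segments):
    """Total continuous test time if all segments can be stacked."""
    return segments * SEGMENT_HOURS


def segments_needed(test_hours):
    """Whole segments required to cover a planned test (rounds up)."""
    return -(-test_hours // SEGMENT_HOURS)  # ceiling division on integers


print(stacked_test_hours(6))  # six segments stacked into one test period
print(segments_needed(20))    # a hypothetical 20-hour test plan
```

If the contract does not allow stacking, the same six segments yield only six separate eight-hour windows, which may be too short for a full recovery test.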
DASD requirements are very important. Hot-sites may provide you with ample single density disk drives as well as a few double density. Triple density or greater may require you to pay additional monthly fees. If the higher density drives are required, ask the vendor about trading in equivalent single and/or double density drives provided with the basic contract. This may assist you in cutting costs for additional high density drives.
Some companies may have more than one data center in different geographical regions. Each may benefit from one contract covering all of them. Of course, this assumes that no more than one center at a time will suffer a disaster. At times, companies may foresee the acquisition of additional data centers that could also benefit from the configuration contracted. Ask the vendor about contract provisions for allowing additional centers to be added at a minimal cost adjustment to the existing contract. If you do not, you may be required to enter into separate contracts for each additional center.
OUR PROCESSING IS COMPLETELY BATCH ORIENTED. WHICH IS BEST FOR US?
Depending on how soon your critical applications must be recovered and how long the total critical applications processing cycle or schedule takes, the reciprocal agreement would be the least expensive option. The recovery timing may also allow you to enter into the next least expensive option of cold-site. Either may suit your needs the best. Be sure first to determine your critical recovery period (recovery timing) and minimal configuration needs before making your decision.
Recovery timing refers to the amount of time needed to recognize the event, notify and mobilize your personnel and backups, recover your systems, network, and applications, and process your critical applications within a required minimum time.
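The definition above can be sketched as a simple sum of phases compared against the required window. This is only an illustration: the function names and the hour estimates are hypothetical, and your own phases may overlap rather than run strictly one after another.

```python
def recovery_timing_hours(recognize, mobilize, recover, process):
    """Add up the phases of recovery timing, each given in hours."""
    return recognize + mobilize + recover + process


def fits_window(total_hours, required_hours):
    """True if the estimated recovery timing meets the required window."""
    return total_hours <= required_hours


# Illustrative estimates only.
total = recovery_timing_hours(recognize=2, mobilize=6, recover=24, process=10)
print(total, fits_window(total, required_hours=48))
```

A site option whose estimated total exceeds the required window can be ruled out before cost is even considered.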
OUR PROCESSING IS TOTALLY ON-LINE REAL-TIME AND MUST BE RECOVERED WITHIN 48 HOURS. WHICH IS BEST FOR US?
In this environment, either the hot-site option or your own second data center is suggested. However, a hot-site may not always provide you with the recovery timing required.
Another option may require you to set up a DASD farm at a remote location (such as a second data center). In the DASD farm, you would update copies of your databases and other on-line files at the same time that you update your primary databases and files. Using a DASD farm will provide the most up-to-date information. Recovery following a disaster would then involve using the recovery hardware at the DASD farm location or transferring the files via your recovery site. The DASD farm option will require your telecommunications plan always to be current and frequently tested.
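The idea of updating the remote copy at the same time as the primary can be sketched as follows. This is a toy model, not a description of any vendor's product: the class and method names are hypothetical, and in-memory dictionaries stand in for the primary DASD and the remote DASD farm.

```python
class MirroredStore:
    """Toy model of applying every update to both the primary and a remote copy."""

    def __init__(self):
        self.primary = {}  # stands in for the primary databases and files
        self.remote = {}   # stands in for the copies held at the DASD farm

    def update(self, key, value):
        # Each update is applied to both copies so the remote stays current.
        self.primary[key] = value
        self.remote[key] = value

    def lose_primary(self):
        # Simulate a disaster at the primary site.
        self.primary = {}


store = MirroredStore()
store.update("account-1001", 250.00)
store.update("account-1002", 75.50)
store.lose_primary()
print(store.remote)  # the remote copy still holds the latest updates
```

In practice the remote update travels over data circuits, which is why the telecommunications plan matters so much for this option.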
HOW CAN WE TEST OUR PLAN IF WE USE A COLD-SITE?
Testing a plan does not always have to include “bringing up” applications on other equipment. If you are in a cold-site arrangement, test your plan short of shipping the hardware to the cold-site. This will include testing your procedures for:
- initializing the command center
- notification and mobilization of personnel
- notification of vendors
- identifying, packing, and shipping off-site tapes and documentation
The exercise will keep the recovery personnel alert and will point out deficiencies in the procedures of the plan. Items such as phone numbers and other contact information will be tested and noted if incorrect. If you have a test environment CPU, use it as your recovery CPU to test the recovery of your critical applications. Though it may differ from the equipment delivered by the vendor following a disaster, it will give you valuable insight into the actual procedures required to recover your environment.
ARE THERE OPTIONS IN TESTING OTHER THAN SENDING PERSONNEL TO THE RECOVERY SITE?
Many vendors have optional arrangements you can make to allow for remote recovery. In such testing, you would make arrangements with the vendor for multiplexers, modems, terminals, and data circuits or dial-up circuits and to have the vendor’s staff perform the duties of data center operations personnel. The equipment would be installed at a local site of your choice, usually by the vendor. You would send your backup tapes to the recovery site and have your documentation and personnel report to the local site.
Testing would then require you to dial up the recovery site and begin your recovery with the vendor’s staff performing the required tape mounts and other operations-oriented duties.
Such remote testing can save hundreds to thousands of testing hours annually.
These are among the most common concerns expressed to me by employees charged with analyzing remote site options. Hopefully, this information will be useful to those who share similar questions.
Donald R. Sticksel, Jr., was responsible for Disaster Recovery Planning at Enron Corp. in Houston. He has given lectures and seminars to professional organizations and speaks frequently for students at various Texas colleges and universities.
This article was adapted from Vol. 3 No. 3, p. 19. Printed in Summer 1990.