So, you’re the Chief Financial Officer of a medium-sized company and you just received some great news! Your chief information officer just told you that your company is well protected against a potential disaster because your data center has a reciprocal agreement. But that’s not even the best part! The really great news is that this agreement will cost you nothing! And now you can avoid spending all that budgeted disaster recovery plan expense money! This is too good to be true!
If, for just a moment, your instincts tell you this deal might just be too good to be true, it probably is! In disaster recovery the most insightful and prophetic admonition is, “you get what you pay for!”
No business worth saving can be recovered from any of the multitude of disasters that can befall it without some thought, analysis, planning, testing and (yes!) some cost!
There is no free lunch in the disaster recovery business if you are conscientious and sincere about protecting your business in the event of a disaster. Even if you only wanted an agreement to satisfy your corporate auditor, think again, because auditors are no longer accepting paper-only plans or handshake agreements. There is a new breed of auditor out there: brighter, tougher, and wiser. Steeled by the recent rash of natural as well as human-made catastrophes, auditors and consultants are very cynical about plans to “use someone else’s data center” should yours go belly up, and for good reason!
Let’s examine a few types of reciprocal disaster recovery agreements to understand why they might be worthless if you had to rely on them to save your business.
If your reciprocal agreement is with another company, a bank down the street or your brother-in-law’s accounting firm, you can count on it not working. These friendly agreements are usually made in good faith, but are nearly impossible to maintain over time as different businesses grow and evolve along different technological paths.
There is simply no accountability in these “gentlemen’s agreements” and no way to assure hardware compatibility or sufficient capacity. It is doubtful that this sort of disaster recovery plan is acceptable to anyone these days, even if testing can be demonstrated. So, if someone is trying to sell you this approach as a dirt cheap solution, don’t buy it!
In the case where both data centers are part of your enterprise, the backup scenario appears to be much more reliable, but it may only look that way.
Let’s check it out!
The first question one must pose concerns compatibility. Can any data processing application running in Site A be run in Site B with little or no alteration? Before accepting “yes!” as an answer, follow up with these questions:
• Are the hardware mainframes and peripherals compatible?
• Do my teleprocessing lines run to both sites?
• Is the systems software at both sites identical in every respect, including maintenance levels?
• Is my Job Control Language governed by standards which would preclude duplicate data set names and library names?
• Are the tape cartridge numbers in each site uniquely identified so that transporting them from Site A to Site B does not cause duplicate cartridge numbers in Site B’s tape library management system?
• Is there an ongoing process to review and evaluate these and other compatibility factors to assure conformity between centers, both now, and in the future?
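Some of the checks on this list lend themselves to simple automation. What follows is only a minimal sketch in Python, assuming each site can export a plain list of its tape cartridge numbers (or data set names) and a table of systems software maintenance levels. The function names, identifiers, and sample values are hypothetical and serve only to illustrate the idea.

```python
def find_collisions(site_a_ids, site_b_ids):
    """Return identifiers (cartridge numbers, data set names) used at both sites."""
    return sorted(set(site_a_ids) & set(site_b_ids))


def mismatched_levels(site_a_levels, site_b_levels):
    """Return software components whose maintenance level differs between sites.

    A component missing from one site is reported with None for that site.
    """
    components = set(site_a_levels) | set(site_b_levels)
    return {
        name: (site_a_levels.get(name), site_b_levels.get(name))
        for name in sorted(components)
        if site_a_levels.get(name) != site_b_levels.get(name)
    }


# Hypothetical sample data, for illustration only.
site_a_tapes = ["A00001", "A00002", "T10050"]
site_b_tapes = ["B00001", "T10050", "B00002"]
print(find_collisions(site_a_tapes, site_b_tapes))

site_a_software = {"operating system": "level 12", "tape manager": "level 4"}
site_b_software = {"operating system": "level 11", "tape manager": "level 4"}
print(mismatched_levels(site_a_software, site_b_software))
```

An empty result from both checks proves nothing by itself, but a non-empty one is an early warning that the two centers have already drifted apart.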
This is not as easy as you thought, is it? Let me try to simplify the compatibility issue. Ask your CIO just one question: Do we periodically run all of our critical applications at the backup center in a production mode?
If the answer is yes, you have demonstrated compatibility and have a good level of assurance that recovery is possible (except for the capacity issue, which we’ll deal with later). If the answer is no, the reason given is probably either compatibility or the time required to restore the data. It might sound like this:
“Too much data,” says the CIO. “All integrated together with other files from other applications,” she adds. “It would take too long to move it all to Site B and test run it there. Too much work! Too risky!”
“Hmmm...” you ponder. “But if a regularly planned move generates all this work and risk, what happens in a real disaster?”
Now you’re catching on!
“Oh, you don’t understand,” your CIO counters. “We only run development work in Site B. We don’t have to be compatible. We just take everything with us, operating system, libraries, data, the whole works. And we just take over the system and dump down all our production stuff!”
“Right on top of our development data?” you ask insightfully.
“Of course not,” she responds. “We back up all that development stuff first, then we restore the critical production stuff!”
“Oh? Well how long does that take?”
“Well... a long time!” she finally answers.
And that is precisely the problem in backing up a production machine with a development machine. The development data must be unloaded, the machine and peripherals must be freed up entirely, and only then can the production restore process begin. All of this activity not only lengthens the duration of regular testing, but more importantly, delays recovery time in the event of a disaster. That time may be precious to the survival of your business!
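The arithmetic behind that delay is simple enough to sketch. The Python fragment below assumes the three steps run strictly one after another, and the hour figures are invented for illustration; substitute the numbers from your own tests.

```python
def hours_to_recover(unload_development, free_up_machine, restore_production):
    """Total elapsed hours, assuming the three steps run one after another."""
    return unload_development + free_up_machine + restore_production


# Hypothetical figures: a backup machine that first has to shed its development
# workload, compared with one that is standing by empty.
shared = hours_to_recover(unload_development=8, free_up_machine=2, restore_production=12)
standby = hours_to_recover(unload_development=0, free_up_machine=0, restore_production=12)

print("Shared development machine:", shared, "hours")
print("Empty standby machine:", standby, "hours")
print("Added delay:", shared - standby, "hours")
```

Whatever the real numbers are, every hour spent unloading development work is an hour added to the outage.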
Suppose the extra time required to recover is acceptable (be careful here, it may be acceptable today, but will it be two years from now?). You may want to explore the following post-disaster scenario.
“OK,” you say. “I agree that development has a lower priority in a disaster. But what do my 75 highly-skilled, highly-paid application program developers do during this recovery?”
“Well, let’s see...” your CIO replies. “They help with the recovery. They’ll be very busy for a few days.”
“And after that?” you persist.
“Well after we get power back in Site A, or clean up the flood, or whatever,” she continues, “we do the whole thing in reverse. Move production back to the original site and restore development in the backup site and everything goes back to normal.”
“Just a minute! What happens if Site A is totally destroyed? By a fire, let’s say, or by a tornado? There would be no place to come back to, right?”
“Right,” she concedes.
“Then our 75 programmers could be sitting on their hands for weeks, maybe months! Isn’t that so?”
“We haven’t thought that part through quite yet, but we’re working on it,” she stammers.
“I mean how long would it take to rebuild a data center from scratch and where would you begin?”
“Well, with our multi-vendor environment and replacing some of that old iron no longer in new production... and we’d have to find a place close by and fit it up... and we’d need some contract help to handle this bubble workload while our people were running production... ”
“This could take a very long time!” you conclude. “And it’s expensive too! How much of this does our business interruption insurance cover?”
“It depends on if the disaster is an act of God or human-made... but in either event, nowhere near all of the cost.”
“Let me see if I understand this,” you reason. “I could save the business, BUT it may cost me a fortune in idle development time and in rebuilding my data center... and months of dislocation for our people.”
“Depending on the nature and seriousness of the disaster,” your CIO sighs, “yes!”
You lean back in your chair and stare briefly at the ceiling, your hands clasped, as if in prayer, against your chin. You reflect, thoughtfully, on what you just heard.
After a brief moment, you lean forward and look straight into your CIO’s eyes. “Tell me again,” you ask, “why this in-house reciprocal agreement is such a good idea!”
Another more favorable scenario that may face the proponents of a reciprocal agreement is a multi-location production environment where applications from both sites can coexist and run intermixed. All you need to do is move some data and libraries to the backup site when needed.
The problem that this solution poses is one of capacity. Can the critical applications from the destroyed site fit with the applications in the backup site? If only critical applications can run in the backup site, will the users of the non-critical applications understand if bumped off by a disaster someplace else? Is there sufficient excess end user office space in each site to accommodate the users displaced from the site experiencing the disaster? Does the local telephone company have sufficient capacity to handle the influx of new data line requirements? On short notice?
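The first two capacity questions can be roughed out with a few lines of Python. This is a sketch only: it assumes workloads can be expressed in a single common unit of processing capacity, which is a simplification, and the function name and figures are hypothetical.

```python
def capacity_check(capacity, local_critical, local_noncritical, incoming_critical):
    """Estimate what the surviving site can still run after absorbing the other
    site's critical work. All arguments are in the same (hypothetical) unit."""
    headroom = capacity - (local_critical + incoming_critical)
    # Non-critical work keeps only whatever headroom is left, if any.
    kept = max(0, min(local_noncritical, headroom))
    return {
        "critical_fits": headroom >= 0,
        "noncritical_kept": kept,
        "noncritical_bumped": local_noncritical - kept,
    }


# Hypothetical example: a site rated at 100 units runs 40 units of critical and
# 30 units of non-critical work, and must absorb 50 units of critical work.
print(capacity_check(capacity=100, local_critical=40,
                     local_noncritical=30, incoming_critical=50))
```

In this made-up case the critical work fits, but two thirds of the backup site’s non-critical work is bumped. The sketch says nothing about office space or telephone lines, which need the same kind of scrutiny.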
Successful execution of this arrangement requires tremendous discipline in planning and preparation. It is not a trivial exercise! As the CFO, you may be asked to spend money to add peripherals (DASD, tape drives or printers) or telecommunication lines to one site or the other, whose only purpose will be backup and recovery. Can you afford to do this? You may be asked to approve expenditures for CPU upgrades to increase capacity for the sole reason of accommodating the workload of the other center in the event of a disaster. Is this a wise investment? You may be asked to find excess office space and teleprocessing lines to handle the contingency. Is this where you want to spend your money?
You may also be asked to increase your support and planning staff to manage this dynamic and complex planning responsibility.
Is this a cost effective way to assure business continuity? Probably not. And this is the good news! The fact that your people are asking for these resources means that they are paying attention to the requisite details and are astute enough to discern these requirements. The bad news is that they may never ask for any upgrades in either site for disaster contingency purposes. This is a telltale sign that your reciprocal agreement is not functional and most probably will not work. Without sufficient capacity or compatibility, a poorly engineered reciprocal agreement plan can easily turn one center’s disaster into a disaster at both centers with the speed of light.
Effective reciprocal agreements are complex and require constant attention, discipline, testing, and some level of investment. It is not impossible to design a process that works; it is just highly unlikely, and it is certainly not cost free. The telltale signs described here should help you determine if your cost free reciprocal agreement is fact or fiction.
And it would be shameful if the perception of a viable backup plan between your two sites is the only factor preventing you from considering a data center consolidation, where the real dollar savings are.
If you are absolutely sure your reciprocal agreement is sound and you are satisfied with the cost structure of your multiple data centers, you need do nothing. If you are not, you should consider another game plan.
A hot site subscription offers a testable, cost effective, and value-add alternative. It permits you to pursue any expense reduction yielded through consolidation, along with a very high probability of being able to recover your business in the event of a disaster, both today and in the future. It makes fiscal sense!
Step back and look the gift horse directly in the mouth. Be absolutely sure! Ask the tough questions. Challenge and probe! Hey, you’re the CFO! You need the facts! Then you can make an informed decision and take the appropriate actions to protect your business in the event of a disaster while pursuing other cost saving opportunities. And, by the way, you can be sure the next time an article starts off, “So, you’re the CFO of a medium-sized company,” it will still be of interest to you!
John E. Nevola is manager of the Business Recovery Services Center at Integrated Systems Solutions Corporation. He started his data processing career in 1965 as a network systems programmer with Bell Labs.
This article was adapted from Vol. 5 #3.