But then there is Wife No. 2: tolerance for loss of data, i.e., the recovery point objective (RPO). Some operations, banking for example, have virtually zero tolerance for outage and data loss, which is certainly understandable: transactions often involve very large sums of money, and mere milliseconds can mean a lot of it.
However, there are cases in which the business is not concerned with having a system back up and running quickly, because paper trails or other process elements allow productivity to be sustained. Yet the tolerance for data loss may be zero for legal or regulatory reasons: in pharmaceutical manufacturing, for example, losing clinical trial data can delay FDA approval of new products by months or even years. Practitioners still find themselves torn between such seemingly disparate demands. They needn’t be.
Availability Strategy Choices
Here is one way of phrasing an IT service continuity policy: “If an application is worth developing and putting into production, it is worth recovering ... sooner or later.” Availability requirements can range from zero seconds to 10 o’clock next summer. When the RTO is zero, the architecture will (or should) be active-active, or hot failover: two instances of the system or application run in parallel, in locations separated by a suitable distance to mitigate regional threats and risks.
In between these two extremes are RTOs that are somewhat “stepped” in nature, according to the varied capabilities of the architecture. A step down from active-active is “warm back-up,” where failover is at least partly manual; it can meet RTOs from 10 or 15 minutes to several hours. Then there is “cold back-up,” where the recovery server must be configured, and the application source code and back-up data restored, before the system goes into production; it can meet RTOs in the 12- to 48-hour range.
Once the RTO reaches 48 to 72 hours or longer, manual restoration on cold servers becomes feasible, whether at another in-house data center or at a recovery vendor location (hot site), though this strategy may add travel expenses to support exercising the plans.
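To show how these tiers step with the RTO, the mapping can be sketched as a simple lookup. This is a minimal, purely illustrative sketch: the thresholds and tier labels are drawn from the ranges above, not from any standard.

```python
def availability_strategy(rto_hours: float) -> str:
    """Map a recovery time objective (in hours) to an availability
    strategy tier, using the illustrative ranges discussed above."""
    if rto_hours == 0:
        return "active-active / hot failover: parallel instances, separate regions"
    if rto_hours < 12:
        return "warm back-up: partly manual failover (~15 minutes to several hours)"
    if rto_hours <= 48:
        return "cold back-up: configure server, restore code and data"
    if rto_hours <= 72:
        return "manual restoration on cold servers, in-house or at a recovery vendor"
    return "best-effort recovery (see the discussion below)"

# A four-hour RTO lands in the warm back-up tier.
print(availability_strategy(4))
```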
At the extreme, where there is high tolerance for both outage and data loss, a “best effort” recovery plan will do. For the uninitiated, a “best effort” plan is one in which no equipment is purchased or subscribed from a recovery vendor before a disaster event. Rather, a system architect should document the required resources, and that documentation must be securely stored offsite:
- Hardware configuration:
  - Manufacturer
  - Model number
  - Operating system, version level, service packs, etc.
  - Number, type, and speed of CPUs
  - Memory and attached disk space
- Licenses, keys, and any other items the application may require to activate the system, and
- Copies of:
  - Application source code,
  - Network diagram, and
  - Back-up media.
Some practitioners may object to writing a plan with no strategy to implement until a disaster strikes, but Y2K preparations showed that the items above are essential. Somebody recognized that back-up media alone may enable an application to run, but without the source code, the restored instance cannot be patched, fixed, or updated. Thus, the documentation and the securely stored items listed above are the “plan implementation.”
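One way to keep such a package consistent and auditable is to record it as structured data rather than free-form text. Below is a minimal sketch; the field names and sample values are hypothetical placeholders, not a prescribed schema.

```python
# Hypothetical offsite documentation package for a "best effort" plan.
# Every field name and value here is a placeholder for illustration.
best_effort_package = {
    "hardware_configuration": {
        "manufacturer": "ExampleCorp",
        "model_number": "X-100",
        "operating_system": "ExampleOS 5.2, Service Pack 3",
        "cpus": {"count": 4, "type": "x86-64", "speed_ghz": 2.4},
        "memory_gb": 64,
        "attached_disk_tb": 2.0,
    },
    "licenses_and_keys": ["application license key", "OS activation key"],
    "offsite_copies": ["application source code", "network diagram", "back-up media"],
}
```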
Data Protection Strategy Choices
Data center operations, and their operating costs, play a significant role in data protection strategy selection. Today’s technology offers far greater storage capacity per square foot of floor space than was imagined even 10 years ago, and data transmission speeds have vaulted nearly as high. These factors weigh against the labor costs of people who handle back-up media, most notably tape cartridges.
When the tolerance for data loss (RPO) is under 24 hours, replicating to a remote site is the strategy of choice: having an offsite storage vendor make more than one pick-up per day at least doubles the labor costs and vendor fees merely to cut the achievable RPO to 12 hours. Jumping to replication reduces tape-handling labor to zero. If back-up tapes are desired, or required by regulatory mandate, the storage architecture can simply add a tape silo; terabytes of back-ups can be created and stored automatically, with handling labor limited to racking the oldest required tapes for archive.
Once the RPO exceeds 24 hours, the strategy may shift to restoring from back-up tapes, since daily pick-ups for offsite storage are usually reliably controlled. From there, higher RPOs are a matter of how often data center operations chooses to send tapes offsite.
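The RPO side of the decision can be sketched the same way. Again, the 24-hour break point comes from the discussion above, but the wording of each strategy is illustrative only.

```python
def data_protection_strategy(rpo_hours: float) -> str:
    """Map a recovery point objective (in hours) to a data protection
    strategy, using the illustrative break point discussed above."""
    if rpo_hours < 24:
        # More than one daily tape pick-up roughly doubles labor and vendor
        # fees just to reach a 12-hour RPO, so replication wins here.
        return "replicate to a remote site (add a tape silo if tapes are mandated)"
    return "restore from back-up tapes sent offsite by daily or less frequent pick-up"

print(data_protection_strategy(4))   # replication
print(data_protection_strategy(48))  # tape-based restore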
And the Man?
The Man is the business: he finds himself ill-served by the competing interests of Wives with presumably different objectives. But must they behave this way?
Not really. Computing platforms can be designed to meet availability requirements, and storage architectures can likewise be crafted to protect data as well as is required.
Does he really need to be plucked to baldness? Was he even asked?
It seems not. But then, is this any different from an IT department offering service “menus,” with the (business) customers choosing one of three or more fixed options (“No substitutions, please”)? The ideal, rather, is a menu that offers two-column choices: “Let’s have the four-hour RTO and the 48-hour RPO, please.”
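A minimal sketch of such a two-column menu follows; the specific tiers, hours, and pairings are assumptions made up for the example, and the point is only that the business picks from each column independently.

```python
# A two-column service "menu": the business pairs any availability choice
# with any data protection choice. All entries here are illustrative.
rto_menu = {
    0: "active-active",
    4: "warm back-up",
    24: "cold back-up",
    72: "best effort",
}
rpo_menu = {
    0: "synchronous replication",
    12: "asynchronous replication",
    24: "daily tape pick-up",
    48: "every-other-day tape pick-up",
}

def place_order(rto_hours: int, rpo_hours: int) -> tuple[str, str]:
    """Pick one item from each column, independently."""
    return rto_menu[rto_hours], rpo_menu[rpo_hours]

# "Let's have the four-hour RTO and the 48-hour RPO, please."
print(place_order(4, 48))  # ('warm back-up', 'every-other-day tape pick-up')
```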
So, our Man decides to tell his Wives that he prefers the salt-and-pepper look, and they’ll just have to get used to it. The BIA establishes the down-side risk of being all black-haired or all white-haired, but he winds up with a chrome dome. And no one was happy, were they?
Gregg Jacobsen has an MBA in organization development and is a Certified Business Continuity Professional (CBCP) with more than 13 years’ experience in business operations and IT service continuity practice, both as a consultant and as an in-house practitioner. He is a BC/DR coordinator with Siemens IT Solutions and Services, Inc., serving their IT outsourcing clients. He is very active in the profession, including prior service as president of the Los Angeles chapter of the Association of Contingency Planners and chair of the ACP Presidents’ Council, and is currently chair of the ACP Hall of Fame Judging Committee. He lives and works in Westlake Village, Calif.