
Online Ready Site Helps Ensure Survival of OLTP Applications
By Michael Katz
Most organizations today are so dependent on the operation of their computer facilities that loss of processing for any period of
time is intolerable, stated a Datapro Research report. This is especially true of enterprises running mission-critical, online
transaction processing (OLTP) applications that provide a constantly current record of their businesses.
OLTP applications provide authorized users with the ability to immediately read, change, or delete information from any location in
a network. Examples of critical applications in finance include automated teller machines, electronic funds transfer, point-of-sale,
and securities trading. In retail, critical applications include online credit authorization, warehousing, and distribution. Critical
telecommunications applications include telemarketing call centers, 800 numbers, and emergency 911 services. Critical
manufacturing applications include work-in-progress tracking and just-in-time materials delivery. These are just a few examples of
how OLTP is increasing its role in a wide variety of industries.
The trend toward using online information to run an enterprise is rapidly spreading because it provides current data on which to
make management decisions, enables provision of better service to customers, and improves intra- and inter-company
communications, helping an enterprise gain and maintain a competitive advantage.
The Center for Research on Information Systems at the University of Texas at Arlington studied the financial and functional
impacts of loss of computer service in several industries (public utilities, finance/banking, insurance, manufacturing, and
services). The results are shown in Figure 1. In this study of 160 firms, the typical company can expect to lose almost 25% of its
average daily revenue by the sixth day of an outage. The estimated loss rises to 40% of the average daily revenue by the 25th day.
Another study, by Datapro Research, found that of the enterprises that sustain a major disaster, 43% never reopen and 29% close
within two years.
In a disaster, lost data is one of the irrecoverable elements. For online applications, lost or corrupted data can eliminate chances of a
complete recovery. OLTP users face the greatest risk from a disaster because their critical business functions depend on
up-to-the-minute data. Thus, if computer service becomes unavailable, enterprises running mission-critical applications would be
unable to conduct their businesses. This risk, along with pressures from legal departments, auditors, and sometimes
government regulations, is the driving motivation for comprehensive business continuity planning.
Approaches to Business Continuity Planning
A plan for business continuity must describe the actions to be taken in the event of a serious disruption of normal business
activities. It should address criteria for execution of the plan, define responsibilities and authorities, and give guidance to those who
will be executing the plan. It must be a living document that is kept up to date as changes are made in the organization and the data
processing system.
There are many ways to approach the creation of a business continuity plan. A center-level approach requires the data processing
department to write a plan to back up either all applications running on a system or all systems in the center. In the event of a disaster,
all applications are recovered at the same time whether or not they are critical, adding time and complexity to the process.
An application-level approach to planning often better meets the needs of enterprises running critical applications. Planners in
individual business functions determine what their critical business process and supporting application needs are, and then they
develop contingency plans for each of these applications. The benefits of this approach are: 1) non-critical applications do not
consume valuable recovery time, 2) multiple applications can be recovered in parallel, and 3) the priorities of end users are
considered, putting them back to work faster.
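The difference between the two approaches can be illustrated with a rough timing sketch. It assumes, purely for illustration, that a center-level plan restores every application one after another, while an application-level plan restores only the critical applications and does so in parallel. The application names and hour figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    recovery_hours: float  # hypothetical estimate of time to restore
    critical: bool


def center_level_hours(apps):
    # Assumption: all applications, critical or not, are restored serially.
    return sum(app.recovery_hours for app in apps)


def application_level_hours(apps):
    # Assumption: only critical applications are restored, all in parallel,
    # so the slowest critical application sets the elapsed time.
    return max((app.recovery_hours for app in apps if app.critical), default=0.0)


apps = [
    App("atm_network", 4.0, critical=True),
    App("funds_transfer", 3.0, critical=True),
    App("payroll", 6.0, critical=False),
]

print(center_level_hours(apps))       # 13.0
print(application_level_hours(apps))  # 4.0
```

Real recoveries are rarely fully serial or fully parallel, so the two functions are best read as upper and lower bounds for planning discussions.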
Whichever planning approach is selected, the more people who are actively involved in creating the plan, the greater the chance that
the plan will work and computer service will be available.
The Need for High Availability Systems
Before a computer outage occurs, an enterprise can protect applications supporting vital business functions by using a computer
architecture that provides high availability through hardware and software fault tolerance. Such an architecture requires multiple
processors with separate copies of the operating system, and pairs of essential components (such as disks, buses, and controllers).
If one component fails, the other takes over without loss of data or service. Likewise, in a system with software fault tolerance, if
one software component fails, the other takes over immediately to keep the application running.
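The paired-component idea can be sketched in a few lines. The example below is a simplified model, not a description of any vendor's implementation: the Disk and MirroredPair classes are hypothetical, every write goes to both members of the pair, and a read is served by whichever member is still healthy.

```python
class ComponentFailure(Exception):
    """Raised when a failed component is asked to do work."""


class Disk:
    # Hypothetical stand-in for one half of a component pair.
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.blocks = {}

    def write(self, key, value):
        if not self.healthy:
            raise ComponentFailure(self.name)
        self.blocks[key] = value

    def read(self, key):
        if not self.healthy:
            raise ComponentFailure(self.name)
        return self.blocks[key]


class MirroredPair:
    def __init__(self, first, second):
        self.disks = [first, second]

    def write(self, key, value):
        # Write to every healthy member; fail only if no member accepted it.
        written = 0
        for disk in self.disks:
            try:
                disk.write(key, value)
                written += 1
            except ComponentFailure:
                continue
        if written == 0:
            raise ComponentFailure("both members of the pair have failed")

    def read(self, key):
        # Serve the read from the first member that is still healthy.
        for disk in self.disks:
            try:
                return disk.read(key)
            except ComponentFailure:
                continue
        raise ComponentFailure("both members of the pair have failed")


pair = MirroredPair(Disk("disk_a"), Disk("disk_b"))
pair.write("account_17", 100)
pair.disks[0].healthy = False   # simulate a single component failure
print(pair.read("account_17"))  # 100, served by the surviving disk
```

The sketch leaves out resynchronizing a repaired component, which a real system must do before that component can rejoin the pair.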
In addition to high availability, such an architecture is also well-suited to OLTP because it provides the following:
* Data Integrity--System software ensures that a transaction is completed as a whole or not at all, even in the event of a power or
other system failure.
* Security--Database management is integrated into the operating system. This prevents subversion by a user who opens and writes
to files, going through the operating system and bypassing the database.
* Linear Performance Growth--Each incremental processor adds almost 100% of a single processor's performance. This
performance growth can continue almost indefinitely.
* Modular Expandability--More processors, disks, workstations, etc. can be added to the system without taking it down or
changing application or system code.
* Connectivity--The ability to connect to other vendors' systems and networks protects an enterprise's investment and improves
productivity of users and equipment.
* Distributed Processing--One logical database is spread across any number of geographically remote systems. Users at local and
remote systems perceive the entire database as if it were stored locally.
* Price/Performance--Excellent price/performance is gained by using a high-performance parallel-processing architecture coupled
with an operating system that is optimized for OLTP. Added to this is a database that is tailored for OLTP and closely matched to
the architecture.
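The Data Integrity property in the list above (a transaction completes as a whole or not at all) can be demonstrated with any transactional database. The following minimal sketch uses Python's built-in sqlite3 module only because it is readily available. It is not the system software the article describes, and the accounts table and transfer function are illustrative assumptions.

```python
import sqlite3


def make_demo_db():
    # In-memory database with two hypothetical accounts.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort midway: the debit above must not survive.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))


conn = make_demo_db()
print(transfer(conn, "a", "b", 30), balances(conn))   # True {'a': 70, 'b': 30}
print(transfer(conn, "a", "b", 500), balances(conn))  # False {'a': 70, 'b': 30}
```

The second transfer fails after the debit has already been applied, yet the balances are unchanged because the whole transaction is rolled back.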
A robust, parallel architecture can play an important role as part of a business continuity plan. With some systems there is a
requirement to bring the system down when new applications or hardware are added. With a hardware and software fault-tolerant
architecture, new applications and new hardware can be added without taking the system down and without the need to change
code. In addition, most system maintenance can be performed while the system is online.
By coupling such an architecture with an online ready site (a method of shadow vaulting), an enterprise can obtain close to continuous
availability for OLTP applications, even in the event of faults (computer, telecommunications, or human) or adverse environmental
conditions (power outages, fires, floods, etc.).
Disaster Recovery Solutions
To support a business continuity plan, an enterprise must select a recovery method. These methods include hot-sites, cold-sites,
mobile sites, service bureaus, reciprocal contingency agreements, and an online ready site.
A hot-site requires from 12 to 48 hours to take over service after a disaster. This includes the time spent retrieving the database
tapes from archival storage, transporting the tapes and DP staff to the hot-site, restoring the data to disk, and restarting the
application. Archived data used with a hot-site is out-of-date by the amount of time since the last magnetic tape copy of the
database was made and physically transported to the site. This lost data can represent one or more days of activity, depending on
the backup schedule. This level of protection is not adequate for critical OLTP applications.
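The exposure described above is simple date arithmetic. The sketch below assumes the only inputs are the time the last tape copy reached off-site storage and the time of the disaster. The 12 and 48 hour bounds are the takeover range quoted in the text, and the function name and sample dates are hypothetical.

```python
from datetime import datetime, timedelta

# Hot-site takeover range quoted in the text.
MIN_TAKEOVER = timedelta(hours=12)
MAX_TAKEOVER = timedelta(hours=48)


def hot_site_exposure(last_offsite_backup, disaster_time):
    """Estimate lost activity and the window in which service could resume."""
    if disaster_time < last_offsite_backup:
        raise ValueError("the backup cannot be later than the disaster")
    return {
        # Everything entered after the last off-site copy is lost.
        "lost_activity": disaster_time - last_offsite_backup,
        "earliest_resume": disaster_time + MIN_TAKEOVER,
        "latest_resume": disaster_time + MAX_TAKEOVER,
    }


result = hot_site_exposure(
    last_offsite_backup=datetime(1999, 1, 4, 2, 0),
    disaster_time=datetime(1999, 1, 5, 14, 30),
)
print(result["lost_activity"])    # 1 day, 12:30:00
print(result["earliest_resume"])  # 1999-01-06 02:30:00
print(result["latest_resume"])    # 1999-01-07 14:30:00
```

With a nightly backup schedule, the lost activity in this example is more than a full business day, before the takeover time is even counted.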
A cold-site agreement provides a computer-ready room reserved for the subscriber's system. It usually contains power distribution
systems, phone wiring, a raised floor, and temperature control. A minimum effort should be necessary to deliver and assemble the
computer system at the site, and arrangements for quick delivery should be made with a hardware vendor so that operations can be
restored before losses become unacceptable. Cold-site recovery can be time-consuming and costly in terms of lost business.
Mobile hot and cold-sites are a relatively new service. Computer-ready trailers can be set up in a subscriber's parking lot and
linked by a trailer sleeve to create a space to suit the subscriber's recovery needs. This minimizes the travel arrangements for DP
employees who may be reluctant to leave their homes and families after a disaster. The service allows a decentralized organization to
engage one vendor to service its entire organization.
Service bureaus provide immediate access to timesharing services at a cost that is usually less than other backup options. However,
service is usually available for short-term use only, and there is little database security. In any shared service agreement, the
promises made to other subscribers can interfere with an enterprise's urgent needs, and service conditions and capabilities are
subject to change. In the event of a regional disaster, there is the potential that the supplier will not be able to provide the required
service within the necessary timeframe.
A reciprocal contingency agreement with another company with similar computer systems and applications is an inexpensive
alternative, but it can have many drawbacks. The agreements are not always enforceable. The site owner has first priority for its use,
and access time for testing may be difficult to obtain. Programming changes are usually required to run the recovering site's
applications on another equipment configuration. If the two sites are located in the same area, they both could be impacted by a
disaster.
An online ready site is a complete computing environment with computer systems, applications, telecommunications facilities, staff,
and a continuously updated copy of the database. When the backup application is already running, it provides recovery within
minutes of a disaster, instead of the hours or days needed with other types of sites. It allows the enterprise to maintain a current, online copy of a
database on a remote network node. The site can be located on a system next door for convenience, or across the nation to
minimize the effects of wide area or regional disasters. Because the system immediately updates the copy of the database after the
original database is updated, data loss from a disaster can be limited to as little as one second of processing.
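The update-then-copy behavior can be sketched as a primary site that ships each change to the ready site as soon as it is applied. This is a simplified illustration under stated assumptions, not the actual replication product: the PrimarySite and ReadySite classes, the sequence numbers, and the link_up flag are all hypothetical.

```python
from collections import deque


class ReadySite:
    """Remote node holding the continuously updated copy of the database."""

    def __init__(self):
        self.data = {}
        self.last_applied = 0

    def apply(self, seq, key, value):
        # Apply updates strictly in order so the copy never skips a change.
        if seq != self.last_applied + 1:
            raise ValueError(f"out-of-order update {seq}")
        self.data[key] = value
        self.last_applied = seq


class PrimarySite:
    def __init__(self, ready_site):
        self.data = {}
        self.seq = 0
        self.ready_site = ready_site
        self.pending = deque()  # updates not yet shipped to the ready site
        self.link_up = True     # hypothetical state of the network link

    def update(self, key, value):
        self.seq += 1
        self.data[key] = value
        self.pending.append((self.seq, key, value))
        self.ship()  # copy the change immediately after the original update

    def ship(self):
        while self.link_up and self.pending:
            self.ready_site.apply(*self.pending[0])
            self.pending.popleft()

    def unshipped(self):
        # Updates that would be lost if the primary were destroyed right now.
        return len(self.pending)


backup = ReadySite()
primary = PrimarySite(backup)
primary.update("account_17", 100)
print(backup.data, primary.unshipped())  # {'account_17': 100} 0
```

In this model the potential data loss at any instant is whatever sits in the pending queue, which is why keeping that queue nearly empty limits loss to a very short interval of processing.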
Planning for Prevention
While no one likes to think about potential disasters and how they impact business and personnel, the best way to reduce the risk of
catastrophic data loss is by maintaining a business continuity plan. To support the plan, a disaster recovery solution that provides
cost-effective, time-efficient protection and a fault-tolerant system architecture can serve to prevent unavailability of critical business
applications. This support can assure enterprise management of the continuance of their business and give them the confidence to
increase their reliance on OLTP as a means to advance enterprise competitiveness.
Michael Katz is the Product Marketing Manager with Tandem Computers, Inc. in California. He is responsible for developing
Tandem's corporate programs and strategies that support manageability, operability, security, and support. Prior to this position, he
was the Manager of Systems Software Product Management.
This article was adapted from Vol. 4 No. 1, p. 20.
Disaster Recovery World © 1999, and Disaster Recovery Journal ©
1999, are copyrighted by Systems Support, Inc. All rights reserved. Reproduction
in whole or part is prohibited without the express written permission from
Systems Support, Inc.