Disaster Recovery Journal, Spring 2001
Continuous Availability: A Reflection on Mirroring
by Bradley R. Bruhahn, CBCP
"I'm sorry, the computers are down!" How often have we heard this in
our everyday lives? How many times can a company experience computer
system downtime and not lose business? Just how far will brand loyalty
maintain a customer base before customers switch to the competition out of
sheer frustration?
There have been a number of major IT outages in the news in the past
few years. According to The Gartner Group, businesses that can't
tolerate computer system outages should implement some form of data
replication, or "mirroring." For the Global 2000,
Gartner indicates data replication is simply a business requirement.
E-commerce is driving more and more businesses to place increasing emphasis
on continuous application availability and fault tolerant IT processing.
For example, some financial institutions are building multiple IT sites
with extensive failover capabilities so that an outage (even a complete
site disaster) will not cause an interruption in service.
These companies see outages as a major threat to their businesses. In
the banking and securities industry, regulators can impose harsh penalties
for missed deadlines. The cost of building and maintaining these
hot standby sites can run from millions to hundreds of millions of dollars.
However, weighed against the risk of even one multi-million dollar penalty (not to mention
the loss of business), a company's investment in hot standby systems
rapidly becomes cost effective.
There are many different possible solutions to address remote copy needs.
This article does not attempt to promote one solution over another.
Each company must consider all of their unique IT business and availability
requirements to make that determination.
Software-based mirroring solutions are usually extensions to or bolt-on
management layers for applications. These can be dependent on specific
operating systems and maintenance levels. Hardware-based solutions simply
mirror the physical data no matter which operating system or application
requests the service. These may not be able to provide application synchronization
without significant design and planning work on the part of the client.
Geographically Dispersed Parallel Sysplex (GDPS)

The latest IBM solution to the challenge of DASD remote copy is called
Geographically Dispersed Parallel Sysplex (GDPS). According to the Gartner
Group, GDPS represents the most advanced form of system software to
provide fault-tolerant coverage. GDPS provides management of critical
data mirrored between two physical sites, automates many of the ongoing
operational tasks, and automates planned and unplanned outage scenarios.
Currently, GDPS is an OS/390 based solution.
GDPS also helps address the phenomenon known as "The Rolling Disaster"
through its management of Consistency Groups. A rolling disaster occurs when remote
copied DASD gets out of synch during the few milliseconds of an outage
(such as an explosion within the computer room). Data that is out of synch
is equally unusable whether the gap is days (when recovering from volume
dumps), hours (when recovering from incremental backups), or milliseconds
(as in the example above).
EMC has provided a Consistency Group facility using SRDF
for a few years now. EMC now supports PPRC and XRC protocols to allow
it to work in a GDPS environment.
Consistency Groups are not silver bullets. They require careful up-front design
and planning to ensure all required data is physically placed in the
right group, and stays there over time.
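The placement check described above can be sketched in a few lines. This is only an illustration, assuming a hypothetical inventory that maps each volume serial to a consistency group name; the `Volume` class and both function names are invented for the example and do not correspond to any GDPS or SRDF interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Volume:
    volser: str             # volume serial (hypothetical sample data)
    group: Optional[str]    # consistency group name, or None if not mirrored


def find_group_exposures(app_volumes):
    """Return (sorted unmirrored volsers, set of groups in use) for one application."""
    unmirrored = sorted(v.volser for v in app_volumes if v.group is None)
    groups = {v.group for v in app_volumes if v.group is not None}
    return unmirrored, groups


def is_consistent(app_volumes):
    """True only if every volume is mirrored and all share a single group."""
    unmirrored, groups = find_group_exposures(app_volumes)
    return not unmirrored and len(groups) == 1
```

Run periodically against the current volume inventory, a check like this would flag an application whose data has drifted out of its group over time.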
Heterogeneous Systems
Almost every company has some application transaction that spans operating
platforms. An application could receive data from the web, hand it off
to legacy systems, and then generate work elsewhere. The "state"
of the work at the time of any failure is important. If a failure does
occur, some, if not all, of the systems are required to return the application
to service. Most installations have mission critical data processing
located across NT, UNIX, and OS/390 platforms.
Maintaining the Remote Copy Environment
The storage subsystem, by nature, is a very fluid environment. Storage
managers must have the capability to quickly add volumes to pools that
are running low on space and physically move volumes if performance
bottlenecks are causing response time slowdowns. Tools such as Amdahl's
TDMF make such volume moves much more common in today's environment.
DASD Remote Copy works via physical UCBs within a controller;
it is not aware of the application data that may reside on the volume.
If a critical volume is moved from a UCB that is being remote copied
to a UCB that is not, an obvious data integrity exposure exists. Most
hardware vendors simply state that to avoid this exposure, we should
mirror all UCBs in the environment.
In practical implementations of remote copy, however, this is not always
possible. Some DASD volumes may need to be excluded from the remote
copy process to manage the recovery, especially with GDPS. Also, is
it cost effective to remote copy SPARE volumes continuously?
In short, once the initial remote copy layout and design has been determined
and implemented, a process must exist to easily keep it up to date.
Any ongoing task or activity that causes data to be missed by the remote
copy process places the entire recovery in jeopardy.
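One way to picture such an upkeep process is a periodic audit that compares where critical volumes currently reside against the set of UCBs being remote copied. The sketch below assumes three hypothetical inputs (a volser-to-UCB map, the set of mirrored UCB addresses, and a list of critical volsers); none of these names come from a real product, and how the data would be gathered is left open.

```python
def audit_remote_copy(volume_to_ucb, mirrored_ucbs, critical_volsers,
                      excluded=frozenset()):
    """Return (volser, ucb) pairs for critical volumes that are not mirrored.

    volume_to_ucb    -- dict of volser -> UCB address (hypothetical inventory)
    mirrored_ucbs    -- set of UCB addresses under remote copy
    critical_volsers -- volsers the recovery depends on
    excluded         -- volsers deliberately left out of remote copy
    """
    exposures = []
    for volser in sorted(critical_volsers):
        if volser in excluded:
            continue
        ucb = volume_to_ucb.get(volser)  # None if the volume is not found
        if ucb is None or ucb not in mirrored_ucbs:
            exposures.append((volser, ucb))
    return exposures
```

An empty result means every critical volume still sits on a mirrored UCB; anything else is a volume that was moved, or never placed, where the remote copy can see it.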
Storage Area Networking (SAN)
Storage Area Networking (SAN) represents a major challenge in the future
for remote copy management. According to The Enterprise Storage Group,
"The extended distance capabilities of a SAN's fibre channel
aren't a great help for disaster recovery or contingency planning.
Extended-distance SANs will aid in high availability, but at this point,
they do not allow users to copy data far enough geographically to be
considered a rock-solid disaster recovery schema." Three vendors
currently working in this space are IBM (GeoRM), Compaq (Data Replication
Manager), and Ark Research (in progress). The major hardware vendors
are attacking the problem as a hardware issue to be solved in the controller.
What about Tape?
While the industry has made significant strides in addressing requirements
with regard to DASD/Disk replication and system availability, critical
application tape data must also be considered. Tape is still a critical
component in most IT shops and needs to be accounted for in a true,
all-encompassing, Continuous Availability strategy.
There are now several hardware and software vendor strategies and methodologies
to address the need for tape availability.
Hardware Solutions
Some of the possible hardware-based tape copy solutions include:
1. IBM - Magstar Virtual Tape Server
2. Sutmyn - Scimitar/VTS & Scimitar/VTSE
3. StorageTek - Virtual Storage Manager (VSM)
4. IBM - Peer-to-Peer Virtual Tape Server
The above hardware-based implementations all have similar functionality
but vary slightly in their implementation and requirements. In general,
the Virtual Tape Server technologies emulate tape devices on DASD, buffering
virtual tapes, and ultimately stacking the virtual volumes on real physical
tape at a later time. VTSs were originally used to gain economies
of scale by virtualizing tape resources, reducing tape drive contention
and utilization, raising tape performance to DASD speeds, and making more
efficient use of tape media via tape stacking.
IBM Peer-to-Peer Virtual Tape Server
Unlike traditional VTS solutions, the IBM Peer-to-Peer Virtual Tape
Server is specifically designed to enhance VTS recoverability and availability.
The IBM Peer-to-Peer Virtual Tape Server is the only solution at this
time to utilize Remote Copy technology for tape.
This is accomplished in a similar manner to DASD Remote Copy. The VTS's
Immediate Copy mode is similar to Synchronous DASD Remote Copy in that
the copy to the second VTS completes before Rewind Unload. The
VTS's Deferred Copy mode models Asynchronous DASD Remote Copy and
completes after receipt of Rewind Unload.
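The difference between the two copy modes comes down to the order of two events. The following sketch models only that ordering, as described above; the enum, event names, and functions are illustrative assumptions and say nothing about how the VTS actually implements either mode.

```python
from enum import Enum


class CopyMode(Enum):
    IMMEDIATE = "immediate"   # analogous to synchronous DASD remote copy
    DEFERRED = "deferred"     # analogous to asynchronous DASD remote copy


def close_volume(mode):
    """Return the ordered events for closing one virtual volume."""
    if mode is CopyMode.IMMEDIATE:
        # The peer copy finishes first, then Rewind Unload completes.
        return ["copy_to_peer", "rewind_unload_complete"]
    # Rewind Unload is received first; the peer copy follows later.
    return ["rewind_unload_complete", "copy_to_peer"]


def protected_at_unload(mode):
    """True if a second copy already exists when Rewind Unload completes."""
    events = close_volume(mode)
    return events.index("copy_to_peer") < events.index("rewind_unload_complete")
```

In this model, Deferred Copy leaves a window after Rewind Unload during which only one copy of the volume exists, which is the same trade-off asynchronous DASD remote copy makes.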
The IBM Peer-to-Peer VTS requires duplicate hardware at both sites.
The Peer-to-Peer implementation couples two VTSs together into
one integrated solution that is accomplished via dual virtual volume
copy with remote function and automatic recovery/switch capabilities.
Tape data sizes sent to the Peer-to-Peer VTS should be scrutinized.
Most installations currently limit the size of the VTS data to about
five or ten gigabytes. Anything larger could start to slow down the
overall VTS performance. If you have a significant number of large,
critical tape files, this may be a challenge. If you are relying on
the VTS to mirror all of your tapes, how will data outside of the VTS
be addressed?
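A simple screening pass can show how much critical tape data would fall outside such a size limit. This is a rough sketch that assumes a hypothetical dictionary of data set names and sizes in bytes; the default limit of 5 GB is taken from the low end of the range quoted above and should be adjusted per installation.

```python
GIB = 1024 ** 3  # bytes per gigabyte (binary)


def split_by_size(datasets, limit_bytes=5 * GIB):
    """Split data set names into (fits_in_vts, oversized) by size.

    datasets -- dict of data set name -> size in bytes (hypothetical input)
    """
    fits = {name for name, size in datasets.items() if size <= limit_bytes}
    oversized = set(datasets) - fits
    return fits, oversized
```

Everything in the oversized set needs some other mirroring answer, which is exactly the question the paragraph above raises.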
Software Implementations
Some of the possible software product tape copy solutions
include:
1. CA - Vtape
2. EMC - CopyCross
3. Tape Mount Management (TMM)
4. Teracloud - Remote Tape Copy (RTC)
5. Aggregate Backup and Recovery Support (ABARS)
CA-Vtape
CA-Vtape is a software-based virtual tape solution that performs the
tactical work of buffering, stacking and copying virtual volumes to
physical tape and recycling.
This solution emulates 3490E tape devices and utilizes your existing
DASD and tape hardware resources.
Because it is software and not hardware, it easily scales to meet your
business needs with minimal hardware costs. If and when additional hardware
is required, CA-Vtape is vendor independent and supports mainframe tape
and disk. Vtape does, however, require processor resources. Generally
this need is relatively low (approximately 2-4 MIPS).
Continuous Availability is provided for the data under Vtape control
by mirroring it physically with a DASD remote copy solution
(PPRC, XRC, SRDF, etc.) and by protecting the back end physical tape with
duplexing and/or export functions.
EMC - CopyCross
The EMC CopyCross solution transparently redirects tape allocation to
disk. This process is similar to the hardware VTS implementations and
CA-Vtape with the biggest difference being that CopyCross dynamically
reallocates tape to DASD without the back end processing needed to ultimately
move virtual tape volumes to physical tape. The entire tape library
can stay on disk. Each installation will need to determine the capacity
needed to support this.
EMC CopyCross is also a proprietary implementation that only supports
the EMC Symmetrix DASD line (so, in all fairness, this is a mixed
software/hardware solution). CopyCross dynamically redirects allocations
to its virtual devices according to user-defined criteria and comes
with a Wizard Planner tool to help identify redirection candidates.
Because its entire tape library resides on disk, you can leverage the
mirroring capabilities of SRDF (Symmetrix Remote Data Facility) to achieve
maximum data availability. However, any tape processes that are not
redirected to DASD are not supported in this remote copy scenario.
Tape Mount Management (TMM)
Unlike the other software-based solutions, TMM is not a product, but
rather a methodology that utilizes existing components of DFSMS ACS
Routines to redirect tape allocation to DASD.
TMM requires tape analysis to identify potential candidates for redirection.
The Volume Mount Analyzer (VMA) tool can assist with this task, or other
products can also be used. A continuous allocation of people resources
is usually required to implement and maintain TMM.
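The selection logic behind this kind of redirection can be illustrated outside of ACS syntax. The sketch below is Python, not an ACS routine, and the rule format (a data set name pattern plus a maximum size in megabytes) is a hypothetical stand-in for whatever criteria the tape analysis produces.

```python
from fnmatch import fnmatchcase


def allocation_target(dsname, size_mb, rules):
    """Return "DASD" if a redirection rule matches, otherwise "TAPE".

    rules -- list of (name_pattern, max_size_mb) pairs; hypothetical
             output of the tape analysis, not real ACS syntax
    """
    for pattern, max_mb in rules:
        if fnmatchcase(dsname, pattern) and size_mb <= max_mb:
            return "DASD"
    return "TAPE"
```

For example, with `rules = [("PROD.PAYROLL.*", 500)]`, a 100 MB `PROD.PAYROLL.BACKUP` data set is redirected to DASD while a 900 MB one still goes to tape. Because the rules are fixed until someone re-analyzes, the list goes stale as the data mix changes.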
Because TMM is a static solution, it alone does not provide long-term
continuous availability. A data center's critical
data mix continually changes over time and requires continuous re-analysis
and implementation. TMM is a low dollar cost solution but requires a
significant investment in people resources to meet the ongoing requirements
of continuous availability. A window of exposure could exist during
any time a critical data set is not being mirrored.
Also, critical data mirrored by TMM is only protected while it is in
the DASD Buffer. Most TMM implementations require a migration or archive
process to ultimately move the data to tape. Once this occurs, the data
is no longer mirrored, unless a tape hardware mirroring solution is
also utilized.
[Figures: Step 1, Step 2]


Teracloud - Remote Tape Copy (RTC)
The Teracloud Remote Tape Copy (RTC) product is a software solution
that supports all tape hardware configurations, regardless of vendor.
RTC tracks critical tape data sets, and mirrors them to a remote facility,
either at rewind-unload time, or in a real-time manner. The product
provides a function to switch all primary site tape data logically to
the secondary site volsers, and updates all catalog entries, in the
event of a primary site disaster.
RTC also has similar functionality to the EMC CopyCross, in that it
can intercept tape mounts and redirect them dynamically to DASD. One
mode of RTC tape intercept occurs when a physical or virtual tape drive
is not available. Rather than have the job ABEND with a 522, RTC redirects
the data set to DASD, and moves it to tape later once a drive becomes
free. Another side benefit of this process is to stack data
on tape in real-time mode, rather than initially creating data on tape,
and mounting it later during stacking.
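The intercept behavior described above amounts to a fallback path plus a deferred drain. The following sketch models that flow under simple assumptions (a count of free drives and an in-memory pending list); the function names are invented for illustration and do not reflect how RTC is implemented.

```python
def allocate(dsname, free_drives, dasd_pending):
    """Place a data set on tape if a drive is free, else park it on DASD.

    Returns (target, remaining_free_drives). dasd_pending is a list that
    collects the names of redirected data sets (hypothetical bookkeeping).
    """
    if free_drives > 0:
        return "TAPE", free_drives - 1
    dasd_pending.append(dsname)   # redirect instead of failing the job
    return "DASD", free_drives


def drain(dasd_pending, free_drives):
    """When a drive is free, return all parked data sets to stack on tape."""
    if free_drives == 0 or not dasd_pending:
        return []
    stacked = list(dasd_pending)  # written together, in arrival order
    dasd_pending.clear()
    return stacked
```

The drain step is where the stacking benefit shows up in this model: the parked data sets go to tape together in one pass rather than each being created on its own tape and remounted later.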
Where VTS solutions only support tape that is under their immediate control,
RTC supports all tape I/O and can copy between unlike devices. RTC also
has the ability to selectively copy tapes, down to the dataset level,
and dynamically alter tape allocations based on user-defined criteria.
RTC may be a less costly solution than VTS remote copy of DASD buffers,
tape mirroring, and Peer-to-Peer implementations.
RTC is hardware independent and utilizes your existing tape infrastructure.
This could eliminate additional hardware costs. However, if used in
a Continuous Availability application, additional tape hardware is required
at the second site with connectivity to the primary subsystems. RTC
can be installed, implemented, and maintained with minimal effort and
is user/application transparent.
Aggregate Backup and Recovery Support (ABARS)
Aggregate Backup and Recovery Support (ABARS) was initially designed
to provide a synchronized, logical application Disaster Recovery process.
While ABARS may still have a place in an overall DR plan, better solutions
probably exist to address the needs of critical tape mirroring. With
a proper amount of analysis of critical data and application synch points
(using tools such as DR/VFI from 21stCentury or ABC from DTS Software),
ABARS can be useful to provide batch application logical recovery.
However, if ABARS is considered for anything approaching a tape mirroring
solution, care must be taken to ensure that the time and tape drive
resources are available to support the ABACKUP process. This solution
would, at the minimum, require a remote electronic tape vault with robotics.
Physical shipping of tapes would not provide the immediate data protection
mirroring implies.
Conclusion
Admittedly, tape mirroring solutions, strategies and methodologies lag
behind their DASD counterparts. Most installations either didn't
have critical data on tape, or missed it during the DR process. However,
it is now clearly recognized that tape still plays a pivotal role in most
production IT shops, and IT vendors are rapidly stepping
up to deliver various solutions. New solutions to address data mirroring
needs are continually being developed, refined and brought to market.
The solution that's ultimately right for you will probably depend
on several different criteria, such as:
1. Is the solution vendor dependent?
2. Can your existing hardware resources be utilized?
3. Cost - hardware costs, software costs, implementation costs and ongoing
management costs
4. Scalability - how easy is it to scale the solution to growing requirements?
5. Does the solution encompass support for all tape I/O?
6. Does the solution operate in real-time?
7. Is the solution transparent to the operating system? Do you need
to make JCL changes, etc.?
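One way to apply criteria like these is a simple weighted scoring matrix. The sketch below is only an illustration: the criterion keys, weights, and ratings are placeholders each installation would choose for itself, and nothing here rates any actual product.

```python
def score(ratings, weights):
    """Weighted sum of one candidate's ratings.

    ratings -- dict of criterion -> rating (e.g. 1 to 5)
    weights -- dict of criterion -> importance weight
    """
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"unrated criteria: {sorted(missing)}")
    return sum(weights[c] * ratings[c] for c in weights)


def rank(candidates, weights):
    """Return candidate names ordered from highest to lowest score."""
    return sorted(candidates,
                  key=lambda name: score(candidates[name], weights),
                  reverse=True)
```

For instance, with weights `{"real_time": 3, "all_tape_io": 2}`, a candidate rated 4 and 1 on those criteria scores 14 and ranks ahead of one rated 1 and 5, which scores 13.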
Each company will need to carefully craft an overall availability solution
that addresses its unique business needs. In all probability, to completely
cover the multitude of outage scenarios, this solution will require
integrating several software and hardware products from various vendors.
Finally, no solution should be viewed as simply a turnkey, one-time
effort. Ongoing processes to ensure data availability is not compromised
over time (for example, by allocation outside of a consistency group), automation
and periodic testing of the solution are still key factors to the success
of any DR or Continuous Availability plan.
Bradley R. Bruhahn, CBCP, is with Sandpiper International, a storage management
consulting firm based in San Diego, CA. He assists clients with ABARS,
remote vaulting, remote copy, GDPS and SAN implementations.
© Copyright 2001 Systems Support Inc. All rights reserved. Reproduction in whole
or in part in any form or medium without the express written permission
of System Support Inc. is prohibited.