
Continuous Availability - Reflection on Mirroring

Written by Bradley R. Bruhahn, CBCP, Wednesday, 21 November 2007 00:24

“I’m sorry, the computers are down!” How often have we heard this in our everyday lives? How many times can a company experience computer system downtime and not lose business? Just how far will brand loyalty maintain a customer base before they switch to the competition out of sheer frustration?

There have been a number of major IT outages in the news in the past few years. According to The Gartner Group, businesses that can’t tolerate computer system outages should implement some form of ‘data replication’, or ‘mirroring’. For the “Global 2000”, Gartner indicates data replication is simply a business requirement.

E-commerce is driving more and more businesses to place increasing emphasis on continuous application availability and fault tolerant IT processing. For example, some financial institutions are building multiple IT sites with extensive failover capabilities so that an outage (even a complete site disaster) will not cause an interruption in service.

These companies see outages as a major threat to their businesses. In the banking and securities industry, regulators can impose harsh penalties for missed deadlines. The average cost of building and maintaining these hot standby sites can run from millions to hundreds of millions of dollars. However, with the risk of one multi-million dollar penalty (not to mention the loss of business) a company’s investment in hot standby systems rapidly becomes cost effective.

There are many different possible solutions to address remote copy needs. This article does not attempt to promote one solution over another. Each company must consider all of their unique IT business and availability requirements to make that determination.

Software-based mirroring solutions are usually extensions to or bolt-on management layers for applications. These can be dependent on specific operating systems and maintenance levels. Hardware-based solutions simply mirror the physical data no matter which operating system or application requests the service. These may not be able to provide application synchronization without significant design and planning work on the part of the client.

Geographically Dispersed Parallel Sysplex (GDPS)

The latest IBM solution to the challenge of DASD remote copy is called Geographically Dispersed Parallel Sysplex (GDPS). According to the Gartner Group, GDPS represents the most advanced form of system software to provide fault-tolerant coverage. GDPS provides management of critical data mirrored between two physical sites, automates many ongoing operational tasks, and automates planned and unplanned outage scenarios. Currently, GDPS is an OS/390-based solution.

GDPS also helps address the phenomenon known as ‘The Rolling Disaster’ through its management of Consistency Groups. A rolling disaster occurs when remote-copied DASD falls out of synch during the few milliseconds of an outage (such as an explosion within the computer room). Out-of-synch data is equally unusable whether it is off by days (recovering from volume dumps), hours (recovering from incremental backups), or milliseconds (as in the example above).

EMC has provided a ‘Consistency Group’ facility using SRDF for a few years now, and it now supports the PPRC and XRC protocols so that it can work in a GDPS environment.
Consistency Groups are not a silver bullet, however: careful up-front design and planning are needed to ensure that all required data is physically placed in the right group and stays there over time.
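To make the consistency-group idea concrete, here is a minimal sketch of the freeze behavior described above. All names and classes are invented for illustration; this is not any vendor's actual interface. The key point it demonstrates: when any mirrored pair in the group fails to replicate, every pair in the group is suspended at the same instant, so the secondary site is frozen at a single crash-consistent point rather than rolling out of synch pair by pair.

```python
# Illustrative model of a consistency group (names invented, not a vendor
# API): one replication failure suspends the whole group, freezing the
# secondary site at a single crash-consistent point in time.

class MirroredPair:
    def __init__(self, volser):
        self.volser = volser
        self.suspended = False
        self.secondary = []          # writes applied at the remote site

    def replicate(self, write):
        if self.suspended:
            return False             # no further writes reach the secondary
        self.secondary.append(write)
        return True

class ConsistencyGroup:
    def __init__(self, pairs):
        self.pairs = pairs

    def write(self, volser, data):
        pair = next(p for p in self.pairs if p.volser == volser)
        if not pair.replicate(data):
            self.freeze()            # one failure suspends every pair
            return False
        return True

    def freeze(self):
        for p in self.pairs:
            p.suspended = True

group = ConsistencyGroup([MirroredPair("VOL001"), MirroredPair("VOL002")])
group.write("VOL001", "update-1")
group.pairs[1].suspended = True      # simulate a link failure on VOL002
group.write("VOL002", "update-2")    # the failed write freezes the group
```

After the simulated failure, neither volume accepts further mirrored writes, so the secondary copy of VOL001 cannot run ahead of the now-stale VOL002.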

Heterogeneous Systems

Almost every company has application transactions that span operating platforms. An application might receive data from the web, hand it off to legacy systems, and then generate work elsewhere. The ‘state’ of the work at the time of any failure is important: if a failure does occur, some, if not all, of the systems are required to return the application to service. Most installations have mission-critical data processing spread across NT, UNIX, and OS/390 platforms.

Maintaining the Remote Copy Environment

The storage subsystem, by nature, is a very fluid environment. Storage managers must have the capability to quickly add volumes to pools that are running low on space and to physically move volumes if performance bottlenecks are causing response-time slowdowns. Tools such as Amdahl’s TDMF make this process much more manageable in today’s environment.

DASD Remote Copy works at the level of physical UCBs within a controller; it is not aware of the application data that may reside on a volume. If a critical volume is moved from a UCB that is being remote copied to one that is not, an obvious data integrity exposure exists. Most hardware vendors simply state that to avoid this exposure, all UCBs in the environment should be mirrored.

In practical implementations of remote copy, however, this is not always possible. Some DASD volumes may need to be excluded from the remote copy process to manage the recovery, especially with GDPS. Also, is it cost effective to remote copy SPARE volumes continuously?
In short, once the initial remote copy layout and design has been determined and implemented, a process must exist to easily keep it up to date. Any ongoing task or activity that causes data to be missed by the remote copy process places the entire recovery in jeopardy.
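One way to keep the layout honest is a periodic coverage audit. The sketch below (volume names and thresholds invented for illustration) compares the volumes known to hold critical data against the volumes in the remote-copy configuration: anything critical but unmirrored is a data-integrity exposure, and any mirrored spare may be wasted bandwidth.

```python
# Hypothetical mirror-coverage audit (volume names are invented examples).
# "exposed"  = critical volumes missed by the remote copy process.
# "wasteful" = spare volumes being mirrored continuously.

def audit_mirror_coverage(critical_volumes, mirrored_volumes, spare_volumes=()):
    critical, mirrored = set(critical_volumes), set(mirrored_volumes)
    return {
        "exposed": sorted(critical - mirrored),
        "wasteful": sorted(mirrored & set(spare_volumes)),
    }

report = audit_mirror_coverage(
    critical_volumes=["PRD001", "PRD002", "PRD003"],
    mirrored_volumes=["PRD001", "PRD002", "SPR001"],
    spare_volumes=["SPR001", "SPR002"],
)
```

Run regularly (for example, after every volume move), a check like this turns the "keep it up to date" requirement into a routine report rather than a manual inspection.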

Storage Area Networking (SAN)

Storage Area Networking (SAN) represents a major challenge in the future for remote copy management. According to The Enterprise Storage Group “The extended distance capabilities of a SAN’s fibre channel aren’t a great help for disaster recovery or contingency planning. Extended-distance SANs will aid in high availability, but at this point, they do not allow users to copy data far enough geographically to be considered a rock-solid disaster recovery schema.” Three vendors currently working in this space are IBM (GeoRM), COMPAQ (Data Replication Manager), and Ark Research (in progress). The major hardware vendors are attacking the problem as a hardware issue to be solved in the controller.

What about Tape?

While the industry has made significant strides in addressing requirements with regards to DASD/Disk replication and system availability, critical application tape data must also be considered. Tape is still a critical component in most IT shops and needs to be accounted for in a true, all-encompassing, Continuous Availability strategy.

There are now several hardware and software vendor strategies and methodologies to address the need for tape availability.

Hardware Solutions

Some of the possible hardware-based tape copy solutions include:

1. IBM - Magstar Virtual Tape Server
2. Sutmyn - Scimitar/VTS & Scimitar/VTSE
3. StorageTek - Virtual Storage Manager
4. IBM - Peer-to-Peer Virtual Tape Server

The above hardware-based implementations all have similar functionality but vary slightly in their implementation and requirements. In general, the Virtual Tape Server technologies emulate tape devices on DASD, buffering virtual tapes and ultimately stacking the virtual volumes onto real physical tape at a later time. VTSs were originally used to gain economies of scale: virtualizing tape resources, reducing tape drive contention, improving drive utilization, increasing tape performance to DASD speeds, and using tape media more efficiently via stacking.

IBM Peer-to-Peer Virtual Tape Server

Unlike traditional VTS solutions, the IBM Peer-to-Peer Virtual Tape Server is specifically designed to enhance VTS recoverability and availability. The IBM Peer-to-Peer Virtual Tape Server is the only solution at this time to utilize Remote Copy technology for tape.

This is accomplished in a manner similar to DASD Remote Copy. The VTS’s Immediate Copy mode is similar to synchronous DASD Remote Copy in that the copy to the second VTS completes before “Rewind Unload”. The VTS’s Deferred Copy mode models asynchronous DASD Remote Copy and completes after receipt of “Rewind Unload”.
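The difference between the two copy modes can be sketched in a few lines. This is a simplified model, not the actual VTS interface: in "immediate" mode the copy to the second VTS must finish before the rewind/unload is acknowledged, while in "deferred" mode the acknowledgment returns at once and the copy is queued, leaving a window during which the volume exists only at the primary site.

```python
# Simplified model of VTS copy modes (not the real interface).
# immediate: copy to the remote VTS completes, then rewind/unload returns.
# deferred:  rewind/unload returns at once; the copy is queued for later.

def rewind_unload(volume, remote, mode, deferred_queue):
    if mode == "immediate":
        remote.append(volume)          # copy finishes first ...
        return "unloaded"              # ... then the unload is acknowledged
    elif mode == "deferred":
        deferred_queue.append(volume)  # copy happens after acknowledgment
        return "unloaded"
    raise ValueError(f"unknown mode: {mode}")

remote, queue = [], []
rewind_unload("VT0001", remote, "immediate", queue)
rewind_unload("VT0002", remote, "deferred", queue)
```

After these two calls, VT0001 is already safe at the second site, while VT0002 remains exposed until the deferred queue drains, which is exactly the synchronous/asynchronous trade-off the article describes for DASD.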

The IBM Peer-to-Peer VTS requires duplicate hardware at both sites. The Peer-to-Peer implementation couples two VTSs into one integrated solution via dual virtual-volume copy with remote function and automatic recovery/switch capabilities.

Tape data sizes sent to the Peer-to-Peer VTS should be scrutinized. Most installations currently limit the size of the VTS data to about five or ten gigabytes. Anything larger could start to slow down the overall VTS performance. If you have a significant number of large, critical tape files, this may be a challenge. If you are relying on the VTS to mirror all of your tapes, how will data outside of the VTS be addressed?

Software Implementations

Some of the possible ‘software product’ tape copy solutions include:

1. CA - Vtape
2. EMC - CopyCross
3. Tape Mount Management (TMM)
4. Teracloud - Remote Tape Copy (RTC)
5. Aggregate Backup and Recovery Support (ABARS)

CA-Vtape

CA-Vtape is a software-based virtual tape solution that performs the tactical work of buffering, stacking and copying virtual volumes to physical tape and recycling.
This solution emulates 3490E tape devices and utilizes your existing DASD and tape hardware resources.

Because it is software and not hardware, it easily scales to meet your business needs with minimal hardware costs. If and when additional hardware is required, CA-Vtape is vendor independent and supports mainframe tape and disk. Vtape does, however, require processor resources. Generally this need is relatively low (approximately 2-4 MIPS).
Continuous Availability is provided for the data under Vtape control by ensuring it is physically mirrored using a DASD remote copy solution (PPRC, XRC, SRDF, etc) and the back end physical tape is supported via duplexing and/or export functions.

EMC - CopyCross

The EMC CopyCross solution transparently redirects tape allocation to disk. This process is similar to the hardware VTS implementations and CA-Vtape with the biggest difference being that CopyCross dynamically reallocates tape to DASD without the back end processing needed to ultimately move virtual tape volumes to physical tape. The entire tape library can stay on disk. Each installation will need to determine the capacity needed to support this.

EMC CopyCross is also a proprietary implementation that supports only the EMC Symmetrix DASD line (so, in all fairness, it is a mix of a software and hardware solution). CopyCross dynamically redirects allocations to its virtual devices according to user-defined criteria and comes with a Wizard Planner tool to help identify redirection candidates.

Because its entire tape library resides on disk, you can leverage the mirroring capabilities of SRDF (Symmetrix Remote Data Facility) to achieve maximum data availability. However, any tape processes that are not redirected to DASD are not supported in this remote copy scenario.

Tape Mount Management (TMM)

Unlike the other software-based solutions, TMM is not a product, but rather a methodology that utilizes existing components of DFSMS ACS Routines to redirect tape allocation to DASD.
TMM requires tape analysis to identify potential candidates for redirection. The Volume Mount Analyzer (VMA) tool can assist with this task, or other products can also be used. A continuous allocation of people resources is usually required to implement and maintain TMM.
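The candidate analysis that VMA supports amounts to filtering the tape inventory against selection criteria. The sketch below illustrates the idea only; the data-set names, thresholds, and criteria are invented placeholders, not VMA's actual rules: small, frequently mounted data sets are the classic redirection candidates, while large or rarely mounted ones stay on tape.

```python
# Illustrative TMM candidate selection (criteria and thresholds are
# invented for this example, not VMA's actual rules): redirect small,
# frequently mounted tape data sets to the DASD buffer.

def tmm_candidates(datasets, max_mb=50, min_mounts=10):
    """Each dataset is a (name, size_mb, mounts_per_month) tuple."""
    return [name for name, size_mb, mounts in datasets
            if size_mb <= max_mb and mounts >= min_mounts]

inventory = [
    ("PROD.DAILY.EXTRACT", 20, 30),   # small and busy: good candidate
    ("PROD.FULL.BACKUP", 4000, 4),    # large and rare: leave on tape
    ("TEST.SCRATCH.FILE", 5, 2),      # small but rarely mounted
]
candidates = tmm_candidates(inventory)
```

Because the data mix changes over time, a selection like this has to be rerun regularly, which is precisely the ongoing people cost the next paragraph describes.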

Because TMM is static, it alone does not provide a long-term answer for continuous availability. A data center’s critical data mix continually changes over time and requires continuous re-analysis and implementation. TMM is a low-dollar-cost solution but requires a significant investment in people resources to meet the ongoing requirements of continuous availability. A window of exposure could exist during any time a critical data set is not being mirrored.

Also, critical data mirrored by TMM is only protected while it is in the DASD Buffer. Most TMM implementations require a migration or archive process to ultimately move the data to tape. Once this occurs, the data is no longer mirrored, unless a tape hardware mirroring solution is also utilized.

Teracloud - Remote Tape Copy (RTC)

The Teracloud Remote Tape Copy (RTC) product is a software solution that supports all tape hardware configurations, regardless of vendor.
RTC tracks critical tape data sets and mirrors them to a remote facility, either at rewind/unload time or in real time. In the event of a primary-site disaster, the product provides a function to switch all primary-site tape data logically to the secondary-site volsers and to update all catalog entries.


RTC also has functionality similar to EMC CopyCross in that it can intercept tape mounts and redirect them dynamically to DASD. One mode of RTC tape intercept occurs when a physical or virtual tape drive is not available. Rather than have the job ABEND with a 522, RTC redirects the data set to DASD and moves it to tape later, once a drive becomes free. A side benefit of this process is that data can be stacked on tape in real time, rather than initially created on tape and mounted again later during stacking.
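The drive-unavailable intercept reduces to a simple allocation decision. The sketch below is an invented illustration of the idea described above, not RTC's actual behavior or interface: if a drive is free, allocate it; otherwise divert the data set to a DASD buffer instead of failing the job, and drain the buffer to tape later.

```python
# Illustrative allocation intercept (names invented; not RTC's interface):
# divert to a DASD buffer when no tape drive is free, instead of abending.

def allocate_tape(dataset, free_drives, dasd_buffer):
    if free_drives:
        return ("tape", free_drives.pop())   # normal path: a drive is free
    dasd_buffer.append(dataset)              # no drive: buffer on DASD
    return ("dasd", None)                    # job continues, no ABEND

drives, buffer = ["T380"], []
first = allocate_tape("PROD.A", drives, buffer)    # gets the only drive
second = allocate_tape("PROD.B", drives, buffer)   # diverted to DASD
```

The buffered data set is the one that would otherwise have abended; a background task would later move it to tape when a drive frees up.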

Where VTS solutions support only tape that is under their immediate control, RTC supports all tape I/O and can copy between unlike devices. RTC also has the ability to selectively copy tapes, down to the data-set level, and to dynamically alter tape allocations based on user-defined criteria. RTC may be a less costly solution than VTS remote copy of DASD buffers, tape mirroring, and Peer-to-Peer implementations.

RTC is hardware independent and utilizes your existing tape infrastructure. This could eliminate additional hardware costs. However, if used in a Continuous Availability application, additional tape hardware is required at the second site with connectivity to the primary subsystems. RTC can be installed, implemented, and maintained with minimal effort and is user/application transparent.

Aggregate Backup and Recovery Support (ABARS)

Aggregate Backup and Recovery Support (ABARS) was initially designed to provide a synchronized, logical application Disaster Recovery process. While ABARS may still have a place in an overall DR plan, better solutions probably exist to address the needs of critical tape mirroring. With a proper amount of analysis of critical data and application synch points (using tools such as DR/VFI from 21stCentury or ABC from DTS Software), ABARS can be useful to provide batch application logical recovery.

However, if ABARS is considered for anything approaching a tape ‘mirroring’ solution, care must be taken to ensure that the time and tape drive resources are available to support the ABACKUP process. This solution would, at the minimum, require a remote electronic tape vault with robotics. Physical shipping of tapes would not provide the immediate data protection mirroring implies.

Conclusion

Admittedly, tape mirroring solutions, strategies, and methodologies lag behind their DASD counterparts. Most installations either didn’t have critical data on tape or missed it during the DR process. However, the realization that tape still plays a pivotal role in most production IT shops is now clear, and IT vendors are rapidly stepping up to deliver various solutions. New solutions to address data mirroring needs are continually being developed, refined, and brought to market. The solution that’s ultimately right for you will probably depend on several criteria, such as:

1. Is the solution vendor dependent?
2. Can your existing hardware resources be utilized?
3. What are the hardware, software, implementation, and ongoing management costs?
4. How easily does the solution scale to growing requirements?
5. Does the solution encompass support for all tape I/O?
6. Does the solution operate in real time?
7. Is the solution transparent to the operating system, or are JCL changes needed?
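One practical way to weigh the criteria above against each other is a simple weighted scoring matrix. Everything in this sketch is an invented placeholder: the weights, the scores, and the candidate names carry no recommendation; they only show the mechanics of comparing solutions against criteria your company has ranked.

```python
# Weighted scoring of candidate solutions against evaluation criteria.
# All weights, scores, and names below are invented placeholders.

def score(solution, weights):
    return sum(weights[c] * solution["scores"][c] for c in weights)

weights = {"cost": 3, "scalability": 2, "coverage": 3, "transparency": 2}

options = [
    {"name": "hardware-vts",
     "scores": {"cost": 2, "scalability": 3, "coverage": 2, "transparency": 5}},
    {"name": "software-vtape",
     "scores": {"cost": 4, "scalability": 4, "coverage": 3, "transparency": 4}},
]

best = max(options, key=lambda s: score(s, weights))
```

The value of the exercise is less the final number than the discipline of scoring every candidate against every criterion, so that a gap (say, no support for tape I/O outside the VTS) cannot be quietly overlooked.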

Each company will need to carefully craft an overall availability solution that addresses their unique business needs. In all probability, to completely cover the multitudes of outage scenarios, this solution will require integrating several software and hardware products from various vendors.

Finally, no solution should be viewed as simply a turnkey, one-time effort. Ongoing processes to ensure data availability is not compromised over time (for example, by allocation outside of a consistency group), together with automation and periodic testing of the solution, remain key factors in the success of any DR or Continuous Availability plan.


Bradley R. Bruhahn, CBCP, is with Sandpiper International, a storage management consulting firm based in San Diego, CA. He assists clients with ABARS, remote vaulting, remote copy, GDPS and SAN implementations.

 
