With this in mind, Dow recognized that it needed a recovery point objective (RPO) of zero, meaning no data loss, and a stringent recovery time objective (RTO) of just four hours. The next step was to find a solution that would meet these requirements.
Dow turned to Comdisco to design and implement an availability solution for its SAP environment. Dow had long relied on Comdisco for hot-site recovery and had recently outsourced the management of its continuity program as well.
Given the requirement for little or no data loss and a very short recovery window, the team decided to use E-Net’s Remote Recovery Data Facility (RRDF) to ensure the availability of Dow’s SAP applications.
The RRDF software enables recovery to the point of failure for Dow’s mission-critical SAP environment by using real-time remote journaling and database shadowing to a remote technology service center.
Implementing the Solution
Once the decision was made to proceed, the next step was to implement the solution.
Dow’s enterprise SAP applications are deployed on two mainframes. One mainframe runs two DB2 databases to support its operations in North America and the Pacific region; the second mainframe runs two additional DB2 databases to support Dow’s operations in Europe and Latin America.
To protect against a regional disaster, the recovery environment for Dow’s SAP applications is located at an off-site technology service center. An added benefit of the remote journaling solution is that it is insensitive to distance: the continuity site can be thousands of miles away with little or no additional impact on the production applications.
The remote journaling software was installed both on Dow’s production processors and on dedicated processors at the off-site technology service center. As each transaction is processed against any of Dow’s DB2 databases at the production site, a duplicate copy of the database log and journal data is captured in real time and transmitted immediately to the center over a relatively inexpensive network of three dedicated T1 lines. At the center, the log and journal data is immediately written to disk. Several times a day, the disks are archived to tape, so the disk space can be reused while the archived data remains available should an entire database need to be re-created.
“Send” and “receive” regions running on the Dow mainframes are monitored remotely from the service location. The remote journaling software buffers, filters and compresses DB2 log streams to make efficient use of the available bandwidth. Fully automated spilling (holding log data locally when it cannot be sent at once) and gap recovery also allow quick recovery from day-to-day link outages, spikes in the logging rate, and software or hardware failures.
In addition to live, real-time remote journaling, Dow takes daily backups, known in DB2 as “image copies”, of all four DB2 databases at its production site and ships them off-site to its tape storage provider.
In the event of a disaster, the off-site tapes would be shipped from the tape storage provider to the technology service center. There, technicians would perform the initial restoration from those tapes and then use standard DB2 recovery software to “roll forward” the databases, reapplying all the transactions logged between the last backup and the point of failure.
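For a sense of scale, the capacity of the three-T1 network can be worked out directly. A T1 (DS1) circuit runs at a nominal 1.544 Mbit/s, of which 1.536 Mbit/s is usable payload. The sketch below uses the nominal figure and ignores protocol overhead and any compression, so it gives a rough upper bound rather than a measured throughput for Dow’s network.

```python
# Rough capacity of three dedicated T1 lines (nominal DS1 line rate).
T1_BITS_PER_SEC = 1_544_000   # includes 8 kbit/s of framing overhead
SECONDS_PER_DAY = 86_400


def aggregate_bps(lines: int = 3) -> int:
    """Combined nominal line rate of the T1 links, in bits per second."""
    return lines * T1_BITS_PER_SEC


def max_bytes_per_day(lines: int = 3) -> int:
    """Upper bound on bytes carried in 24 hours, before overhead or compression."""
    return aggregate_bps(lines) // 8 * SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"Aggregate rate: {aggregate_bps() / 1e6:.3f} Mbit/s")  # 4.632 Mbit/s
    print(f"Daily ceiling:  {max_bytes_per_day() / 1e9:.1f} GB")   # 50.0 GB
```

Roughly 50 GB a day is therefore the ceiling for uncompressed log data on such a link; sustained logging above about 579 KB/s would have to be absorbed by buffering or compression.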
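The send/receive behavior described here can be modeled in a few dozen lines. This is not RRDF code, and every class, method, and parameter below is hypothetical: a sender filters out log data not needed remotely, buffers records into numbered batches, compresses each batch with zlib, and keeps every batch in a spill area until the receiver confirms it. The receiver journals batches strictly in order and always reports the next batch number it expects, so a lost batch shows up as a gap and the sender rewinds to resend it.

```python
import zlib
from collections import OrderedDict


class Receiver:
    """Hypothetical model of a 'receive' region at the service center."""

    def __init__(self):
        self.next_batch = 0   # lowest batch number not yet journaled
        self.journal = []     # decompressed log data, kept strictly in batch order

    def accept(self, batch_no, blob):
        """Journal the batch if it is the one expected; return the next batch wanted."""
        if batch_no == self.next_batch:
            self.journal.append(zlib.decompress(blob))
            self.next_batch += 1
        # A higher number means an earlier batch is missing (a gap) and is not
        # journaled; a lower number is a duplicate. Either way, report what is needed.
        return self.next_batch


class Sender:
    """Hypothetical model of a 'send' region: filter, buffer, compress, spill, resend."""

    def __init__(self, receiver, wanted_objects, batch_size=4):
        self.receiver = receiver
        self.wanted = set(wanted_objects)
        self.batch_size = batch_size
        self.buffer = []              # absorbs bursts until a batch is full
        self.next_batch = 0
        self.unacked = OrderedDict()  # spill area: sealed batches not yet confirmed
        self.link_up = True
        self.lose = set()             # batch numbers to drop once (simulated loss)

    def capture(self, obj, payload):
        """Accept one log record produced on the production system."""
        if obj not in self.wanted:    # filter out log data not needed remotely
            return
        self.buffer.append(payload)
        if len(self.buffer) >= self.batch_size:
            self.seal()

    def seal(self):
        """Compress buffered records into one numbered batch, then try to send."""
        if self.buffer:
            self.unacked[self.next_batch] = zlib.compress(b"".join(self.buffer))
            self.next_batch += 1
            self.buffer.clear()
        self.pump()

    def pump(self):
        """Send spilled batches oldest-first; while the link is down they just wait."""
        pending = list(self.unacked)
        i = 0
        while self.link_up and i < len(pending):
            n = pending[i]
            i += 1
            if n not in self.unacked:  # already confirmed earlier in this pass
                continue
            if n in self.lose:         # simulated transmission loss
                self.lose.discard(n)
                continue
            expected = self.receiver.accept(n, self.unacked[n])
            for done in [m for m in self.unacked if m < expected]:
                del self.unacked[done]  # confirmed, so its spill space can be reused
            # Gap recovery: rewind to the batch the receiver is still waiting for.
            if expected in self.unacked and pending.index(expected) < i:
                i = pending.index(expected)


if __name__ == "__main__":
    rx = Receiver()
    tx = Sender(rx, wanted_objects={"ORDERS"}, batch_size=2)
    for obj, data in [("ORDERS", b"a"), ("SCRATCH", b"x"), ("ORDERS", b"b")]:
        tx.capture(obj, data)
    tx.link_up = False                 # link outage: batches 1 and 2 spill
    for data in (b"c", b"d", b"e", b"f"):
        tx.capture("ORDERS", data)
    tx.lose = {1}                      # batch 1 will be lost once in transit
    tx.link_up = True
    tx.pump()
    print(b"".join(rx.journal))        # b'abcdef'
```

Because batches stay in the spill area until acknowledged, an outage simply lets them accumulate, and the same resend path handles both outages and individual lost batches.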
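The roll-forward step can be sketched with a simplified model in which a database is a key-value map and each log record carries an integer log sequence number (LSN). DB2 actually identifies log positions by RBA or LRSN, so the names and structures here are illustrative assumptions. The essential logic is to start from the image copy and reapply, in log order, every change written after the copy and up to the point of failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int     # log position (simplified stand-in for a DB2 RBA/LRSN)
    key: str     # row or object changed (illustrative)
    value: str   # value after the change


def roll_forward(image_copy, copy_lsn, log, stop_lsn):
    """Restore from an image copy taken at copy_lsn, then redo changes up to stop_lsn."""
    db = dict(image_copy)                     # initial restoration from tape
    for rec in sorted(log, key=lambda r: r.lsn):
        # Changes at or before copy_lsn are already in the copy; changes after
        # stop_lsn happened after the point of failure being recovered to.
        if copy_lsn < rec.lsn <= stop_lsn:
            db[rec.key] = rec.value
    return db


if __name__ == "__main__":
    log = [
        LogRecord(5, "acct1", "90"),
        LogRecord(12, "acct1", "80"),
        LogRecord(15, "acct2", "40"),
        LogRecord(20, "acct1", "70"),
    ]
    print(roll_forward({"acct1": "90"}, 10, log, 15))
```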
The Solution is Put to the Test - Under Unexpected and Potentially Difficult Circumstances
Over the past year, Dow has successfully tested the solution on several occasions. But it was during one of these tests that Dow discovered additional benefits of the remote journaling solution.
The operational procedure governing the DB2 log data needed for remote-site recovery had a minor flaw. The Pacific region database had already been backed up on the same day (“Day Two”) as the simulated disaster point, so a restore from that backup would need log data only from that point forward.
However, the Day Two backup tape was not available at the recovery site because it had not yet been ejected from the tape silo for shipment off-site. It is not unusual for backup tapes to remain at the production facility for some time, sometimes hours, before they are physically sent off-site. In this case, the simulated disaster point happened to fall at an inconvenient time.
The test situation illustrated a disaster recovery planner’s worst nightmare: sending the wrong backup tapes or receiving unusable ones. “In a traditional recovery scenario, if you don’t have the right backup tapes to restore from, you can’t fully recover and you can’t synchronize your data,” said Worsley. “We needed to have all the databases reflect their state as of the simulated disaster point.”
Using the remote journaling solution, Dow was able to avert a test failure. As part of the implementation, a process had been established to archive the journaled log data to tape and retain these archives for several days, just in case. As a result, Dow was able to use the older, available Day One backup tape for the initial restoration, then roll forward using two days’ worth of DB2 log data from the remote journals to reach the test’s simulated point of failure. All databases, including the one for the Pacific region, were recovered to a consistent point in time representing the simulated disaster point.
During the recovery test, Dow realized that something was wrong: it had Day One backups for all databases, but the log data needed to bring the Pacific region database forward to the disaster point had not been provided. Tom Rechsteiner, Dow’s database administrator, who managed the recovery process, recognized that there was a hole in the log stream, caused by the Pacific region database having been backed up early on Day Two.
To close the gap between the Pacific region’s Day One backup and the disaster point, and so avoid losing data, the testing team executed a special, unplanned reformat process to obtain the log data needed for complete recovery of all the databases. Fortunately for Dow, RRDF provides options for extracting specific ranges of log data, which enabled Dow to recover all of its databases.
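The fallback Dow relied on amounts to a simple planning rule, sketched below with hypothetical names and invented hour values (Day One starts at hour 0, Day Two at hour 24): for each database, take the newest image copy that is actually available at the recovery site and predates the target point, then confirm that the retained journal archive reaches back far enough to roll forward from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageCopy:
    database: str
    taken_at: int     # log position (here: hour) when the copy was taken
    available: bool   # is the tape physically at the recovery site?


def plan_recovery(copies, target, log_retained_from):
    """Map each database to the (start, end) log range to roll forward over."""
    plan = {}
    for db in sorted({c.database for c in copies}):
        usable = [c for c in copies
                  if c.database == db and c.available and c.taken_at <= target]
        if not usable:
            raise LookupError(f"no usable image copy for {db}")
        best = max(usable, key=lambda c: c.taken_at)   # newest usable copy
        # The journal archive must still hold every change made after the copy.
        if best.taken_at < log_retained_from:
            raise LookupError(f"log for {db} is not retained back to hour {best.taken_at}")
        plan[db] = (best.taken_at, target)
    return plan


if __name__ == "__main__":
    copies = [
        ImageCopy("Pacific", 2, True),     # Day One copy, already off-site
        ImageCopy("Pacific", 26, False),   # Day Two copy, still in the tape silo
        ImageCopy("Europe", 3, True),
    ]
    # Journal archives held for several days, so log is retained from hour 0.
    print(plan_recovery(copies, target=40, log_retained_from=0))
```

Had the journal archives been kept only from hour 24, the same call would fail, which is the situation the multi-day tape retention guarded against.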
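Extracting a specific range of log data from archived journal segments, and refusing to proceed if that range has a hole, can be sketched as follows. This is not RRDF’s interface: the Segment type, the integer log positions, and the assumption that positions are numbered consecutively are all simplifications for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    first: int       # first log position in this archived segment
    last: int        # last log position in this segment
    records: tuple   # (position, data) pairs, in position order


def extract_range(segments, start, end):
    """Return records with start < position <= end; fail if the archive has a hole."""
    covered = start   # every position up to here is accounted for
    out = []
    for seg in sorted(segments, key=lambda s: s.first):
        if seg.last <= covered:
            continue  # entirely before the requested range, or already covered
        if seg.first > covered + 1:
            raise ValueError(
                f"hole in log stream: positions {covered + 1}..{seg.first - 1} missing")
        out.extend(r for r in seg.records if covered < r[0] <= end)
        covered = seg.last
        if covered >= end:
            return out
    raise ValueError(f"log ends at position {covered}, before requested end {end}")


if __name__ == "__main__":
    archive = [
        Segment(1, 3, ((1, "a"), (2, "b"), (3, "c"))),
        Segment(4, 6, ((4, "d"), (5, "e"), (6, "f"))),
        Segment(7, 9, ((7, "g"), (8, "h"), (9, "i"))),
    ]
    print(extract_range(archive, 2, 8))
```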
“In a disaster situation, unforeseen issues can have a significant impact on your ability to recover,” said Worsley. “Knowing that we can still successfully recover - even if we have the wrong tape or a tape that has gotten corrupted shipped to the facility - is very reassuring to us in making certain that we have a true high-availability solution.”
Dow’s experience shows how versatile and forgiving recovery can be when log data is available. Many companies back up databases on a staggered schedule, and some take SHRLEVEL CHANGE, or ‘fuzzy’, backups. Using the log, all the databases can be recovered to a single consistent point in time: the disaster point.
The use of “fuzzy” backups means that the databases don’t have to be quiesced or taken offline to make the full daily backup tapes. Too often, backups for contingency or disaster recovery require outages, compromising high availability. With the remote journaling solution in place, Dow gains better availability in its day-to-day operations as well as complete recovery at the disaster recovery site.
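Why staggered and fuzzy backups still yield a consistent result can be shown with a small redo model. Its assumptions are labeled in the code: each page in a copy carries the log position of the last change it reflects (its page LSN), and a logged change is reapplied only if it is newer than the page. This is the general redo technique of ARIES-style recovery, shown as an illustration rather than a description of DB2’s internals. Because a fuzzy copy may or may not contain updates made while it was running, the log is replayed from the position at which the copy started, and the page-LSN check skips changes the copy already holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    value: str
    page_lsn: int   # log position of the last change this page reflects (assumed)


def recover_to_point(fuzzy_copy, copy_start_lsn, log, target_lsn):
    """Rebuild one database from a fuzzy copy plus log records, stopping at target_lsn."""
    pages = dict(fuzzy_copy)
    for lsn, page_id, value in sorted(log):
        if lsn <= copy_start_lsn or lsn > target_lsn:
            continue  # before the copy began, or past the common target point
        page = pages.get(page_id)
        # A fuzzy copy may already contain changes made while it was running;
        # the page-LSN check makes reapplying the log safe (idempotent).
        if page is None or lsn > page.page_lsn:
            pages[page_id] = Page(value, lsn)
    return {pid: p.value for pid, p in pages.items()}


if __name__ == "__main__":
    target = 120  # one recovery point shared by every database
    pacific = recover_to_point(
        {"p1": Page("A2", 105), "p2": Page("B1", 90)},  # copy started at 100
        100,
        [(103, "p2", "B2"), (105, "p1", "A2"), (110, "p1", "A3"), (130, "p2", "B3")],
        target,
    )
    europe = recover_to_point(
        {"e1": Page("X1", 40)},                          # copy started at 50
        50,
        [(60, "e1", "X2"), (125, "e1", "X3")],
        target,
    )
    print(pacific, europe)
```

Replaying every database to the same target position, regardless of when its copy was taken, is what brings them all to one consistent point in time.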
Marta Chevere is the Director of Advanced Recovery Services (ARS) for Comdisco’s Storage Services Group. She oversees the company’s development and integration of advanced recovery product offerings.
Chevere can be reached at email@example.com.