Disaster planning is widely acknowledged as essential to corporate survival. But unless disaster plans are tested thoroughly and periodically, they can lull companies into a state of inadequate semi-preparedness.
Fortunately, the Mead Corporation realized the importance of testing its data recovery process at a hot site before it was too late. In preparing for the company’s first hot-site test, Al Tokarsky, Senior Systems Programmer, realized that with Mead’s existing data recovery system, at least three days would be required in the event of a disaster to restore business-critical applications such as Electronic Data Interchange (EDI), spreadsheet applications, financial analysis packages, and an internal communications application.
“We had been using the same approach to data recovery for a number of years,” he says. “But as we started getting more concerned with disaster recovery, I looked more closely at how our backup and restore product worked. It soon became evident that we’d be in real trouble if we had to rely on that product in an actual outage.”
Tokarsky’s first step in remedying this situation was to identify recovery standards. “My original goal,” he recalls, “was to fully recover the entire VM system in a hot-site test in under five hours.”
That original target was beaten in a recent hot-site test, in which Mead completed a full base restoration of critical business data in just two hours and thirty-five minutes. With the knowledge gained through that test, Tokarsky now believes the recovery period can be cut even further.
“We’re still in the process of streamlining recovery procedures,” he says. “We found that by running a stand-alone module of our restoration system directly we can simplify the environment so that we won’t have to depend on any other tape management products in the recovery process.”
The base tapes used in recovery operations are created weekly and shipped offsite along with a listing of all the tapes that would be required for recovery, including the NSS (named saved system) tapes and the key restoration system tapes.
“Every week we take a complete base backup, essentially a snapshot of all of our data as it exists at the time,” Tokarsky explains. “This physical, cylinder-for-cylinder representation of the DASD can be restored faster than incremental backups because it is not dependent on the CMS file structure, and verification of each file is not required.”
In addition, Mead makes two copies of an incremental backup daily, sending the first copy offsite for secure storage and keeping the second copy onsite for ad hoc file restores. The daily incremental tapes are cumulative: each includes all data changed since the previous base backup was made. Each incremental backup typically incorporates 5,000 to 6,000 user IDs, while the full base generally has over 7,700.
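Because the incrementals are cumulative, a full recovery needs only the most recent base backup plus the single most recent incremental taken after it. The following is a minimal sketch of that tape-selection logic, not SYBACK’s actual interface: the `Backup` class, the `restore_set` function, and the dates are hypothetical names and values introduced for illustration, and the sketch assumes one incremental per day.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Backup:
    kind: str        # "base" or "incremental" (hypothetical labels)
    taken_on: date   # day the backup was made


def restore_set(backups, as_of):
    """Return the backups to restore, in order, to recover data as of `as_of`.

    Assumes incrementals are cumulative: each one holds every change made
    since the most recent base backup that precedes it.
    """
    usable = [b for b in backups if b.taken_on <= as_of]
    bases = [b for b in usable if b.kind == "base"]
    if not bases:
        raise ValueError("no base backup on or before the target date")

    # Start from the newest base backup.
    base = max(bases, key=lambda b: b.taken_on)

    # Only the newest incremental after that base is needed, because it
    # already contains the changes captured by the earlier incrementals.
    later = [b for b in usable
             if b.kind == "incremental" and b.taken_on > base.taken_on]
    if not later:
        return [base]
    return [base, max(later, key=lambda b: b.taken_on)]


# Hypothetical week: base on day 2, incrementals on days 3 to 5, new base on day 9.
history = [
    Backup("base", date(2000, 1, 2)),
    Backup("incremental", date(2000, 1, 3)),
    Backup("incremental", date(2000, 1, 4)),
    Backup("incremental", date(2000, 1, 5)),
    Backup("base", date(2000, 1, 9)),
]
midweek = restore_set(history, date(2000, 1, 5))
```

In this example `midweek` holds two entries, the base from day 2 and the incremental from day 5; the incrementals from days 3 and 4 are skipped. A non-cumulative (differential-per-day) scheme would instead have to replay every incremental in sequence.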
The problem with Mead’s previous backup and restore system was that it compressed the data when the base backups were made, so the data had to be decompressed before the backup tapes could be used in a recovery operation.
“This presented a Catch-22 situation,” Tokarsky says, “where we had to have a base system up in order to decompress the files, but we needed those files in order to get the base system up. If our hot site was down for any reason and we were forced to migrate to a cold site, restoration would be virtually impossible.”
Mead’s new system, called SYBACK, solves this problem because it can operate as a stand-alone module and does not need a running base system to decompress the backup files. “All we do,” Tokarsky says, “is enter the hot site, verify the tapes are there, and load the key tape containing the two catalogue files and the stand-alone module. Since the VOLSERs needed to run the job are all in these files, no full file catalogue product is needed. The system then uses one file as input for the base restore, and with all DASD virtually attached, the job proceeds automatically. All we do is mount tapes as prompted by the system. Less than three hours later we’re done.”
Once the base is fully restored, Mead restores the incrementals, a process that took just one hour and 40 minutes in the most recent hot-site test. Again, Tokarsky stresses, that figure is expected to fall to as little as one hour with additional hot-site tests.
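The timings reported above can be added up to compare against Tokarsky’s original five-hour goal. The sketch below is simple arithmetic on the article’s figures; it assumes the base and incremental restores run back to back with no gap between them, which the article does not state, and the variable and function names are illustrative only.

```python
from datetime import timedelta

# Figures reported from the most recent hot-site test.
base_restore = timedelta(hours=2, minutes=35)
incremental_restore = timedelta(hours=1, minutes=40)

# Targets quoted in the article.
incremental_target = timedelta(hours=1)
original_goal = timedelta(hours=5)


def total_recovery(base, incremental):
    """Total elapsed time, assuming the two phases run consecutively."""
    return base + incremental


measured_total = total_recovery(base_restore, incremental_restore)
projected_total = total_recovery(base_restore, incremental_target)

print(measured_total)                   # 4:15:00
print(original_goal - measured_total)   # 0:45:00
print(projected_total)                  # 3:35:00
```

Under that assumption the measured end-to-end recovery comes to four hours and 15 minutes, 45 minutes inside the original goal, and reaching the one-hour incremental target would bring it to three hours and 35 minutes.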
“Hot-site tests not only demonstrate that our disaster plans work, but also provide us the opportunity to improve the process and trim valuable minutes,” he says. “When dealing with business-critical applications, absolutely minimizing restore time is critical because every minute cut from the restoration process can translate into thousands of dollars saved.”
Ira Goodman is Software Services Manager at Syncsort, Inc., Woodcliff Lake, New Jersey, developers of SYBACK, Mead’s data backup and restoration system.
This article was adapted from Vol. 4, No. 3, p. 21.