Backup windows, requirements, processes, procedures, and equipment have changed drastically since the days of stand-alone mainframe environments. A large data center today may have mainframes attached to many UNIX servers, which in turn may be connected to a multitude of NT servers. All of those NT servers may feed information back into a data warehouse system that resides on the mainframe. A single transaction coming into such a data center will probably reside on multiple platforms at different points in time.

This type of data center architecture presents a multitude of problems for backup and recovery processing. Storage management no longer has the luxury of a large backup window in which to complete a snapshot of all the data; many, if not all, shops are heading toward a 24x7 processing schedule. Adding to the confusion are the different backup methodologies in use today. Mainframes are usually still backed up by full-volume dumps, with incremental backups occurring at specified intervals, while distributed systems mainly use the incremental-forever method employed by the major storage backup vendors. There are distributed backup systems on the market today that do have full-volume backup capability, but one of the major concerns with using full-volume dumps in the distributed environment is the lack of bandwidth on the networks. (This will probably be less of an issue in a SAN environment.)
As an example, say that a shop has 100 UNIX and NT servers to back up between the hours of 8 p.m. and 6 a.m. The distributed backups would not be scheduled to run all 100 backups concurrently. Instead, the person in charge of the backups would probably schedule five backups starting on the hour and five more on the half hour. This gives a backup rate of 10 systems per hour and completes the 100 backups by the 6 a.m. deadline.
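For illustration, here is a minimal sketch of that staggered schedule; the server names, date, and batch size are hypothetical stand-ins, not any particular product's scheduler:

```python
from datetime import datetime, timedelta

# Hypothetical staggered schedule: 100 servers in batches of five,
# one batch on the hour and one on the half hour, starting at 8 p.m.
servers = [f"server{n:03d}" for n in range(1, 101)]
window_start = datetime(2000, 1, 1, 20, 0)  # date is arbitrary; 8 p.m. start

schedule = {}
for i, server in enumerate(servers):
    batch = i // 5                          # 20 batches of 5 servers each
    schedule[server] = window_start + timedelta(minutes=30 * batch)

# The last batch starts at 5:30 a.m., completing all 100 backups by the
# 6 a.m. deadline at a rate of 10 systems per hour.
for server, start in sorted(schedule.items()):
    print(server, start.strftime("%I:%M %p"))
```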
Remember, most of the client/server systems will be active during this backup process. This is where the synchronicity issue needs to be considered. Transactions may be coming into some systems while other servers are being backed up. Depending on the backup schedules, a specific transaction may be backed up on more than one platform, resulting in the transaction being captured multiple times. It is also possible for a transaction to move through the application systems and miss the backups altogether.
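To make both failure modes concrete, here is a minimal sketch with hypothetical servers and timings (expressed as minutes after the backup window opens); it illustrates the timing problem only, not any real workload:

```python
# Minutes into the window at which each server's backup runs.
backup_times = {"serverA": 60, "serverB": 120}

def captured_by(path):
    """Return the servers whose backup catches the transaction.

    path: list of (server, arrival_minute, departure_minute) describing
    where the transaction resides as it moves through the systems.
    """
    return [s for s, arrive, depart in path
            if arrive <= backup_times[s] < depart]

# Captured twice: still on serverA at its 60-minute backup, and on
# serverB when its 120-minute backup runs.
print(captured_by([("serverA", 0, 90), ("serverB", 90, 600)]))  # ['serverA', 'serverB']

# Missed entirely: leaves each server before that server's backup runs.
print(captured_by([("serverA", 0, 30), ("serverB", 30, 100)]))  # []
```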
As you can see, this type of staggered backup scenario presents serious shortfalls, especially for financial institutions and other businesses that cannot afford to lose or duplicate any transactions. This is the problem of keeping data in sync during backup processing.
The problem is not limited to keeping data in sync between mainframe and distributed systems; out-of-sync conditions are just as likely to occur between different distributed systems. This is because most organizations back up multiple distributed servers during a particular backup window, such as the midnight to 6 a.m. timeframe. Some systems may have their data backed up at 1 a.m., while other servers are backed up closer to 6 a.m. In this type of environment, there is no way to guarantee that every transaction will be captured somewhere in the backup process.
In 24x7 environments, the storage administrator must have a firm understanding of the flow of data. Where does the data come from? Which systems is it passed on to? What are the interdependencies between the different systems? Where is the final resting point of a data transaction? All of these points need to be taken into consideration when developing a successful backup and recovery plan.
Backup schedules will probably need to be reordered to keep data in sync, and it may be worthwhile to consider quiescing particular systems for backups, if possible. Verifying the validity of the resulting backups can be a complicated and tedious procedure; in a large data center with hundreds of servers, it may be nearly impossible.
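One way to reason about the ordering is to derive it from the data flow itself. A minimal sketch, assuming a hypothetical dependency map (the system names are invented for illustration):

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical data flow: NT servers feed UNIX application servers,
# which feed the data warehouse on the mainframe.
feeds_into = {
    "nt_sales":  ["unix_app1"],
    "nt_orders": ["unix_app1", "unix_app2"],
    "unix_app1": ["mainframe_dw"],
    "unix_app2": ["mainframe_dw"],
}

# Invert to "depends on" form: each system lists the systems upstream
# of it, so upstream sources appear first in the backup order.
depends_on = {}
for src, dests in feeds_into.items():
    depends_on.setdefault(src, set())
    for dest in dests:
        depends_on.setdefault(dest, set()).add(src)

backup_order = list(TopologicalSorter(depends_on).static_order())
print(backup_order)
# e.g. ['nt_sales', 'nt_orders', 'unix_app1', 'unix_app2', 'mainframe_dw']
```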
The synchronicity problem can also be addressed with new technology. Several products now available are a great aid in producing synchronized backups through snapshot capability. The standard approach to backups in this type of environment is to stop all applications for a short period of time, take a snapshot of the data, and then start the application processes again. Downtime of critical applications due to data backups can now be minutes instead of hours. After the data has been captured in a snapshot, the backups and the applications can run concurrently: the backups collect data from the snapshot, not the live data that is now being updated by the applications.
This process works for both mainframe and distributed systems. Multiple systems can be quiesced, snapshots taken of their data, and the systems returned to active processing. This shortens the backup window from hours to minutes and, more importantly, produces a backup that is in sync across multiple systems. All of this becomes even more critical in a disaster recovery event: the data can be recovered with no duplicate records and with the assurance that every record is included.
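As a minimal sketch of that coordinated sequence (the quiesce, snapshot, resume, and backup calls are hypothetical stand-ins for whatever a given snapshot-capable product actually provides):

```python
def synchronized_backup(systems):
    """Quiesce all systems together, snapshot, resume, then back up.

    Each object in `systems` is assumed to expose quiesce(), snapshot(),
    resume(), and backup_from() methods wrapping the platform's tooling.
    """
    snapshots = {}
    try:
        # 1. Quiesce every application so all systems pause at the same
        #    consistent point; downtime begins here.
        for system in systems:
            system.quiesce()
        # 2. Take the snapshots -- minutes, not hours.
        for system in systems:
            snapshots[system] = system.snapshot()
    finally:
        # 3. Resume the applications; downtime ends. Updates from this
        #    point on do not touch the captured snapshots.
        for system in systems:
            system.resume()
    # 4. Run the real backups against the frozen snapshots while the
    #    applications process new transactions concurrently.
    for system, snap in snapshots.items():
        system.backup_from(snap)
```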
This type of backup scenario will produce a good base copy of data to start from. Then incremental backups from the different systems may be applied as necessary to bring the data back to a specific point in time.
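A minimal sketch of that restore sequence, modeling backups as simple dictionaries of records (the data shapes and timestamps are hypothetical):

```python
def restore_to_point_in_time(base_copy, incrementals, target_time):
    """Rebuild the data as of target_time.

    base_copy:    dict of record_id -> value from the synchronized base.
    incrementals: list of (timestamp, changes_dict) taken after the base.
    """
    data = dict(base_copy)                       # start from the base copy
    for taken_at, changes in sorted(incrementals, key=lambda inc: inc[0]):
        if taken_at > target_time:
            break                                # stop at the requested point
        data.update(changes)                     # apply this incremental
    return data

# Example: a base copy at hour 0 and incrementals taken at hours 1 and 2.
base = {"rec1": "v0", "rec2": "v0"}
incs = [(1, {"rec1": "v1"}), (2, {"rec2": "v2"})]
print(restore_to_point_in_time(base, incs, target_time=1))
# {'rec1': 'v1', 'rec2': 'v0'}
```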
In conclusion, data synchronicity may or may not be a concern in your particular environment. We no longer have the easy task of backing up a single system in a large stand-alone window; we are now looking at multiple systems being backed up on varying schedules. Today's data center environments present storage administrators with many new challenges in working with all the individual entities. Storage administrators should be aware of the problems that can arise when backing up multiple servers on multiple schedules. If your environment is small enough, you may be able to arrange the backup schedules for better coverage. If that will not work, it may be necessary to raise the issue of moving your data to newer storage servers that provide snapshot capability. Is your complete set of backup data in sync or not?

Jeffrey D. Blackmon, CBCP, is a senior systems consultant for Software Systems Consulting in San Diego. He has 18 years of experience in the field of disaster recovery and business continuity for both mainframe and distributed systems. He can be reached at jeffb@ssccorp.com.