Data protection requirements have moved on from the purely technical question of "did the backup work?" to the much more complex question of "is my business truly protected?" The view of the backup application’s success or failure is no longer relevant unless considered in the context of business policies.
Advanced back-up reporting and analysis software is required to bridge the gap between the technical and business definitions of success. Backup reporting needs to be provided in a way to which the business can relate, reporting on applications and business units rather than servers, databases and file systems. Meeting this need is the emerging field of data protection management (DPM). Following are five criteria that need to be addressed so that businesses can feel sure that they are protected from interruption through data loss in the event of any type of disaster.
Backups must contain required data
It is possible for a backup to complete successfully but fail to contain the required data. Some data on the server being backed up may be unavailable, such as Microsoft Outlook files that are in use. Other data may not be present, such as feeds from upstream systems or references to file systems that no longer contain files after a prior data migration. The back-up software will protect whatever data it can and report success on completion, unaware of the required presence or state of the critical data.
DPM products need to look for anomalies in the back-up process. A backup that shows a significant drop in the amount of data backed up may indicate an issue regardless of the fact that the backup software reports success. A backup that succeeds but reports large numbers of unavailable files, or misses specific business-critical files, needs to be flagged for further investigation.
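The anomaly check described above can be sketched in a few lines. This is a hypothetical illustration, not any particular DPM product's logic; the function name, threshold, and sizes are assumptions chosen for the example.

```python
# Hypothetical sketch: flag a backup whose size drops sharply against
# recent history, even though the backup software reported success.
from statistics import mean

def flag_anomaly(history_bytes, latest_bytes, drop_threshold=0.5):
    """Return True if the latest backup is suspiciously small.

    history_bytes: sizes of recent successful backups of the same client.
    drop_threshold: fraction of the historical mean below which we alert.
    """
    if not history_bytes:
        return False  # no baseline yet, nothing to compare against
    baseline = mean(history_bytes)
    return latest_bytes < baseline * drop_threshold

# A nightly backup that normally moves ~100 GB but suddenly moves 20 GB
# is flagged for investigation despite the "successful" completion.
print(flag_anomaly([98, 101, 99, 102], 20))  # True
print(flag_anomaly([98, 101, 99, 102], 95))  # False
```

A production check would also count unavailable files and watch for specific business-critical paths, but the principle is the same: compare each run against an expected baseline rather than trusting the completion status alone.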
Backups must complete within window
Even if a backup runs and completes successfully, it may not have run at a suitable time. For example, a backup of a trading database during the business day will contain intra-day data that may be inconsistent and of minimal, if any, use. Further, the act of backing up will degrade performance on the server being backed up and can cause significant business issues. To ensure the backup is consistent and does not impact the business, it must both start and end within the window.
It must be possible to generate back-up windows for each server, and those windows need to be flexible enough to take account of weekends and business holidays. Reports must be available for backups so users know if a backup was in window in addition to whether or not it was successful. Ideally, alerts should be generated when backups are in danger of going out of window so that work can be carried out to reschedule the backup and keep it within the required window.
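As a rough illustration of the in-window test, the sketch below checks that a backup both started and ended inside an overnight window that crosses midnight. The window times and function name are assumptions for the example; real windows would be per-server and calendar-aware, as described above.

```python
from datetime import datetime, time

# Hypothetical sketch: did the backup start AND end inside its window?
# Here the window is 22:00-06:00, i.e. it crosses midnight.
def in_window(start, end, win_open=time(22, 0), win_close=time(6, 0)):
    def inside(t):
        # A time is in a midnight-crossing window if it falls in
        # either the late-evening half or the early-morning half.
        return t >= win_open or t <= win_close
    return inside(start.time()) and inside(end.time())

ok = in_window(datetime(2007, 6, 1, 23, 15), datetime(2007, 6, 2, 4, 30))
late = in_window(datetime(2007, 6, 1, 23, 15), datetime(2007, 6, 2, 8, 5))
print(ok, late)  # True False
```

The second backup completed successfully but ran two hours past the window, so it would be flagged even though the back-up application itself reports no error.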
Backups must be at the right level
Backups are often run at different levels on different days, with "full" backups containing all of the data required for a restore and "incremental" backups relying on data from the previous full and subsequent incrementals for a restore. Incremental backups are popular due to the decreased time taken to complete and lower storage requirements. However, there is a subsequent cost on the recovery side as the time and number of tapes required to carry out a full restore increases with each incremental, as does the risk of a bad tape preventing a complete restore.
A policy covering back-up levels needs to be put in place that specifies either how often a full backup should run or the maximum number of incremental backups that may run between two full backups. Whether the policy is framed in terms of the number of tapes required for a restore or the length of time between full backups, it should be enforced through automated checks.
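One such automated check is easy to sketch. This is an illustrative example only; the level names and the limit of six incrementals are assumptions, not a recommended policy.

```python
# Hypothetical sketch: enforce a maximum number of incremental backups
# between two full backups, given the backup history oldest-first.
def violates_level_policy(levels, max_incrementals=6):
    """levels: e.g. ['full', 'incr', 'incr', ...], oldest first."""
    run = 0  # incrementals seen since the last full
    for level in levels:
        if level == "full":
            run = 0  # a full backup resets the chain
        else:
            run += 1
            if run > max_incrementals:
                return True  # restore would need too many tapes
    return False

print(violates_level_policy(["full"] + ["incr"] * 6))  # False
print(violates_level_policy(["full"] + ["incr"] * 7))  # True
```

The same loop effectively bounds restore cost: each extra incremental in the chain is another tape that must be read, and another tape that can fail.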
Backups must cover the entire application
Back-up systems work at the level of the filesystem or server rather than at the level of the application or business unit. An application that can be described in business terms as "the customer web portal," for example, may actually consist of multiple servers, databases, file systems, etc., that have no inherent relationship. Unless all of the pieces of each application have been backed up there is a risk that it cannot be restored if needed.
It is important to be able to display a consolidated application-level view of data protection. The restore point for the application is going to be further back in time than the last successful backup of any part of the application, but how much further back? If there is a site failure then from when will you be able to obtain a restore of the entire application? Equally important, how long will such a restore take? Due to the often manual nature of restores, it is hard to get a highly accurate answer to the latter question. But a good estimate is a very useful number to have.
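The application-level restore point follows directly from the component view: it is the oldest last-good backup among all of the application's pieces. The sketch below illustrates this with hypothetical component names; it is not taken from any real system.

```python
from datetime import datetime

# Hypothetical sketch: the restore point for a whole application is the
# OLDEST last-good backup among its components -- every component must
# be recoverable to at least that point in time.
def application_restore_point(last_good):
    """last_good maps component name -> timestamp of its last good backup."""
    return min(last_good.values())

portal = {  # "the customer web portal" as a set of components
    "web-server-1": datetime(2007, 6, 2, 1, 0),
    "web-server-2": datetime(2007, 6, 2, 1, 5),
    "orders-db":    datetime(2007, 5, 31, 23, 50),  # one stale backup drags
}                                                   # the whole portal back
print(application_restore_point(portal))
```

Here two components were backed up an hour ago, but a single stale database backup pushes the portal's restore point back more than a day, which is exactly the kind of gap a server-level report hides.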
Backups must be set to expire at the right time
Each backup that takes place has a built-in expiry date. Beyond this expiry date the details of the backup will be forgotten, and the data itself will often become unavailable. With the advent of legislation that requires data to be available for significant periods of time, commonly up to 10 years, expiry periods need to be set to retain the backed-up data for the appropriate length of time. Equally important, when the expiry time for the backup has been reached, the tape on which the backup resides should either be destroyed or recycled.
Businesses need to have a clear data expiration policy, based on both internal and external requirements and defined separately for different categories and types of data as required. Backups need to be classified against these categories and types, and checks must be made at the point of backup to ensure that expiry dates are set correctly. Checks also need to be made to ensure that any tapes containing expired backups are either destroyed or recycled.
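A check made at the point of backup might look like the following sketch. The category names and retention periods here are illustrative assumptions; real values would come from the business's own internal and external requirements.

```python
from datetime import date, timedelta

# Hypothetical retention policy: days each data category must be kept.
RETENTION_DAYS = {"financial": 365 * 10, "email": 365 * 7, "scratch": 30}

# Hypothetical sketch: was this backup's expiry date set far enough out
# to satisfy the retention policy for its data category?
def expiry_is_correct(category, taken, expires):
    required = taken + timedelta(days=RETENTION_DAYS[category])
    return expires >= required

ok = expiry_is_correct("financial", date(2007, 1, 1),
                       date(2007, 1, 1) + timedelta(days=3650))
bad = expiry_is_correct("financial", date(2007, 1, 1), date(2008, 1, 1))
print(ok, bad)  # True False
```

A complementary check would walk the tape library and flag any tape whose backups have all expired but which has been neither destroyed nor recycled.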
To be truly prepared in the event of a disaster, companies need to have full visibility into the success criteria of their data protection environments. A backup that is considered successful by the back-up application can no longer be said to be truly successful unless a number of extra criteria are met. The question of "Is my business truly protected?" cannot be answered by back-up applications alone. Advanced data protection management software is required to bridge the gap between the technical and business definitions of success. Each business, and often each department within the business, may have different success criteria depending on the internal and external regulations to which it is subject, and reporting needs to be flexible enough to allow for this. Finally, back-up reporting needs to be provided in a way to which the business can relate: reporting on applications and business units rather than servers, databases and file systems.
Jim McDonald is the chief technology officer and co-founder of WysDM Software, a provider of innovative data protection management solutions. McDonald spent five years prior at Goldman Sachs where he held international responsibilities in the systems management, scheduling, and primary and secondary storage areas. McDonald was also director at StorageNetworks and was the primary architect and systems administrator for one of the first public-access systems on the Internet while at the University of Edinburgh.
"Appeared in DRJ's Summer 2007 Issue"