When you think of data protection, what do you think of? I’ll bet that you think of daily backup windows, restore SLAs, tuning backup systems and off-site archiving. It is easy for those everyday tasks to dominate the data protection discussion. But maybe you should also think about data recovery – after all, when it comes right down to it, the only reason you back up all that data, night after night, week after week, is so you can recover it!
Whether it is a file, a folder, or, in the event of a disaster, your entire data center, successful recovery is the payoff for all that data protection investment.
Given the potential costs of loss of business continuity, every enterprise should apply the same attention to data recovery as it does to data protection.
Every CIO should ask these key questions:
Would our disaster recovery (DR) strategy enable us to recover completely from a disaster, such as an earthquake, hurricane, tornado, flood, fire, or other large-scale event that destroys both the original data source and the local backup copy?
Is our data protected from human error and theft?
Is our remote office data as well protected as our main data center data?
Too often, the answer to these questions is “No.” Even enterprises with an effective DR strategy in their data center may have unknown gaps in their protection and a significant volume of data at risk in their remote offices.
CIOs should also ask:
Do I know the actual risk posed to my data?
Is our DR testing likely to reveal actual problems before an issue or disaster arises?
Again, too often, the answer is “No.” Although many companies have made improvements in recent years by including DR requirements for remote sites and moving more data to faster, more powerful disk-based backup technologies, a significant volume of data in most enterprises is still at serious risk.
Today, many organizations have already made improvements in their backup strategies that facilitate improved recovery assurance. They are migrating away from tape as the sole backup medium and have begun using WAN and disk-based systems instead. They are taking advantage of disk-based systems that replicate and deduplicate backup sets and retain many generations of data on reliable media that can restore data many times faster than tape. They are adopting new technologies that move beyond the limitations of tape and even virtual tape. When coupled with an enterprise-class data protection platform, these technologies enable enterprises to back up, deduplicate, and replicate data throughout the enterprise in a fast, cost-efficient way. They also make it simpler and more cost-effective to perform more frequent and more thorough DR testing.
This article will illustrate the ways that today’s data protection architectures can enable more reliable and rapid recoveries, while offering significant cost savings over traditional approaches.
Specifically, I’ll outline the seven things I think are the most critical considerations involved in establishing enterprise-class disaster recovery assurance.
1. Plan for enterprise-class performance and capacity requirements
The explosion of information, coupled with the need for longer data retention periods, has made data deduplication a critical requirement for enterprise-class data protection strategies today. Some data deduplication solutions can handle massive backup volumes and store tens of petabytes of data or more in a single system without threatening backup windows. Such solutions enable the enterprise to streamline and consolidate its data protection strategy onto fully automated, massively scalable systems.
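The core idea behind deduplication can be illustrated with a minimal sketch: identical chunks of backup data are stored once and referenced thereafter. This is illustrative only, not any particular vendor's implementation, which would operate on variable-length blocks at far larger scale.

```python
import hashlib

def dedupe(chunks):
    """Store each unique chunk once; keep per-chunk hash references."""
    store = {}   # hash -> chunk data, stored exactly once
    refs = []    # one reference per original chunk
    for chunk in chunks:
        h = hashlib.sha256(chunk).hexdigest()
        if h not in store:
            store[h] = chunk
        refs.append(h)
    return store, refs

# Nightly backups overlap heavily, so most chunks are already in the store.
backups = [b"blockA", b"blockB", b"blockA", b"blockB", b"blockC"]
store, refs = dedupe(backups)
raw = sum(len(c) for c in backups)
kept = sum(len(c) for c in store.values())
print(f"raw {raw} bytes -> stored {kept} bytes")
```

Because successive backup generations repeat most of their content, the stored footprint grows far more slowly than the raw backup volume.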
A truly enterprise-class solution must also be enterprise-wide, covering all data centers and remote offices, where an increasing amount of data is being stored, including on PCs and personal portable devices. Gartner estimates that as much as 60 percent of all enterprise data now resides at remote offices, and reports that 69 percent of the respondents in a recent survey are unsatisfied with their current remote office backup strategy.
An enterprise-class DR topology is hub-and-spoke where data is replicated from remote offices (spokes) via their WAN to their main data center (hub) and across a wide geography to a DR site to enable recovery from widespread disasters. Below this top level may be some number of intermediate nodes at the larger remote facilities, such as major divisions or regional offices. These nodes become spokes for the central hub(s), and also serve as hubs for some number of smaller remote offices, hence the hierarchy. More layers are possible, but the basic hierarchical topology remains the same.
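The hierarchy described above can be modeled as a simple parent-link structure, where each site replicates upward toward the DR site. The site names here are purely illustrative.

```python
# Hypothetical hub-and-spoke replication hierarchy: each site points to
# the hub it replicates to; the DR site is the top of the tree.
topology = {
    "dr-site":        None,             # top-level DR target
    "hq-datacenter":  "dr-site",        # central hub replicates to DR
    "regional-east":  "hq-datacenter",  # intermediate node: spoke and hub
    "office-boston":  "regional-east",  # small-office spoke
    "office-miami":   "regional-east",
}

def replication_path(site):
    """Follow a site's hub links all the way up to the DR site."""
    path = [site]
    while topology[site] is not None:
        site = topology[site]
        path.append(site)
    return path

print(" -> ".join(replication_path("office-boston")))
```

Adding another layer of intermediate nodes only adds entries to the map; the replication logic stays the same, which is what makes the topology scale.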
The critical requirement of this topology is the ability to replicate massive volumes of data across the WAN fast enough to stay within short backup windows. Without a deduplication process integrated into the replication (see below), replicating large data volumes over a WAN would be far too slow, costly, and disruptive to business operations to be practical in a large, data-intensive organization. Enterprise-class backup systems can back up, deduplicate, replicate and/or restore data concurrently while maintaining high performance.
Perhaps the best thing about the hub-spoke replication is that it enables the use of technologies that automate data protection, ensuring that defined policies for data protection—including backup, retention, replication and secure erasure—are adhered to without manual intervention. The result: less time wasted on repetitive manual tasks, adherence to company policies for data protection, retention, and destruction, and less risk of data loss.
2. Ensure you can meet recovery time objectives
Meeting your recovery time objective (RTO) and recovery point objective (RPO) is becoming increasingly important, especially when financial transactions are involved.
The more aggressive the RTO and RPO, the more frequently an organization needs to perform backups. Both objectives can be difficult or impossible to meet with solutions that perform deduplication in a way that requires the more recent data to be rebuilt from deduplicated data before it can be restored.
Only enterprise-class deduplication technologies offer a capability that further accelerates restore times: they keep the most recent backup(s) constantly ready to restore, without any reassembly from deduplicated form or other pre-processing. Without this so-called “forward referencing” capability, the deduplication process experiences a gradual, and ultimately unacceptable, slowing in performance over time.
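The principle behind forward referencing can be sketched at a toy scale: the newest backup is always kept whole, and older generations are re-expressed as deltas against newer data. This is a simplified illustration under that assumption; real systems do this at the storage-block level, not per file.

```python
# Sketch of forward referencing: history[-1] is always a complete,
# restore-ready copy; older generations shrink to deltas.

def add_generation(history, new_backup):
    """Append a backup, demoting the previous full copy to a delta."""
    if history:
        previous = history.pop()
        # Keep only entries that differ from the new backup.
        delta = {k: v for k, v in previous.items() if new_backup.get(k) != v}
        history.append(delta)
    history.append(dict(new_backup))

history = []
add_generation(history, {"fileA": "v1", "fileB": "v1"})
add_generation(history, {"fileA": "v1", "fileB": "v2"})

latest = history[-1]                  # restores immediately, no reassembly
previous = {**latest, **history[-2]}  # older restore: overlay its delta
print(latest, previous)
```

The common case, restoring the most recent backup, touches no deltas at all, which is why this approach keeps restore performance from degrading as generations accumulate.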
3. Monitor changing requirements
Enterprise data centers need to be sufficiently agile to adapt to changing business requirements, technologies and market conditions. Enterprises need a data protection platform that enables the IT department to analyze capacity growth and performance efficiency over time, and thereby accurately predict future requirements. Today, enterprise data is growing at extraordinary rates – 60 percent annually according to leading analysts. Analyze your current data backup requirements, then assess how your current data protection strategy will cope when the amount of data doubles or triples – as it will, within a few years. Factor in the longer retention periods that you will need to manage for business policy and regulatory reasons. Deduplication will help you meet these requirements, but you will still need sufficient scalability to meet growing needs for the foreseeable future—preferably for the next five to seven years.
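It is worth running the compounding arithmetic on that analyst figure. The 100 TB starting point below is a hypothetical example; only the 60 percent growth rate comes from the text above.

```python
# Projecting capacity at 60 percent annual growth.
# Starting figure of 100 TB is hypothetical, for illustration.
capacity_tb = 100.0
for year in range(1, 8):
    capacity_tb *= 1.60
    print(f"year {year}: {capacity_tb:,.0f} TB")
```

At this rate, data more than doubles every two years and grows roughly tenfold in five, which is why a platform sized only for today's volumes will not survive a five-to-seven-year planning horizon.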
4. Understand how your data protection/disaster recovery platform will scale
Ideally the performance and capacity needed over time will also be available in a single modularly scalable system. This helps to control capital expenditures by enabling a pay-as-you-grow approach that allows for the addition of disk shelves for more capacity, or the addition of processing nodes for more performance. Scalable, enterprise-class systems can save significant costs by enabling enterprises to consolidate management and maintenance tasks in their main data center on a single automated system. They can also automate remote office backups, enabling IT managers to deliver the same high level of data protection throughout their organization. A scalable solution also eliminates the cost and complexity of installing and managing dozens of “box-by-box” solutions as data volumes grow.
5. Aim to reduce OpEx for IT administrators
In addition to the CapEx considerations just described, the total cost of ownership must take into account operating expenditures. This is where the “set and forget” approach really proves its worth. Fully automating the data protection procedures enhances operational efficiencies, and minimizes or eliminates human error. The fact is: there are high hidden costs in any backup strategy that relies on manual and error-prone operating procedures. Time is money and mistakes are costly, and both must be factored into the total cost of ownership equation.
These underlying problems and the resulting hidden costs are typically greatest in the remote offices. IDC explains why: “Historically, remote offices have been treated as standalone islands, without centralized management, policy, or visibility. Limited onsite IT resources, inadequate WAN bandwidth, and administrative-intensive and failure-prone tape hardware resulted in inadequate and infrequent backup operations, thus compromising operational recovery.”
IDC also notes another problem—this time at the data center—that increases the total cost of ownership when using tape: “Older data written in a legacy backup application format may need to be restored for legal or business reasons. This can require a firm to maintain the legacy backup product for restores while deploying the new data protection approach for backups on a go forward basis.”
6. Know your bandwidth requirements
Another ongoing operational expenditure involves the WAN services needed at the various sites throughout the hierarchical hub-and-spoke topology. The most important consideration here is how well the solution can reduce the bandwidth needed to replicate the backed up data. Robust deduplication can reduce data volumes by as much as 97 percent, enabling even very large daily data sets to be replicated over the WAN links currently used at data centers and larger facilities.
Determining the bandwidth requirement at smaller remote offices usually involves making a tradeoff between CapEx and OpEx. Having an on-site disk-based storage system enables you to back up data quickly, and then replicate it with relatively modest bandwidths during off-hours when other throughput demands are low or nonexistent. Backing up directly via the WAN, by contrast, may require a higher speed link at a higher ongoing cost.
For the smallest remote offices, a T1 or digital subscriber line link may be sufficient, depending on the volume of data involved daily and weekly.
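A back-of-envelope calculation shows why modest links can suffice once deduplication is in play. The 20 GB daily figure below is a hypothetical small-office workload; the 97 percent reduction is the figure cited above.

```python
# WAN sizing sketch: hours needed to replicate one day's backup data.

def replication_hours(daily_gb, reduction, link_mbps):
    """Hours to push one day's deduplicated backup over a WAN link."""
    remaining_bits = daily_gb * (1 - reduction) * 8e9  # GB -> bits
    return remaining_bits / (link_mbps * 1e6) / 3600

# Hypothetical office: 20 GB/day over a T1 (1.544 Mbps).
print(round(replication_hours(20, 0.97, 1.544), 2))  # under an hour
print(round(replication_hours(20, 0.00, 1.544), 1))  # undeduplicated: over a day
```

The same daily data set that fits comfortably in an off-hours window after deduplication would take more than a full day to replicate in raw form, which is the CapEx/OpEx tradeoff in concrete terms.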
7. Test early and often
While most companies conduct at least annual testing of their DR systems to meet applicable regulatory requirements, many have neither the time nor budget to perform tests frequently enough or in a way that accurately simulates a disaster scenario. For example, these annual tests are sometimes simplified by pre-loading tape libraries with tape cartridges or conducting only spot checks of data volumes chosen at random.
Unfortunately, without complete, realistic testing, weaknesses in the disaster recovery process may not be discovered until a disaster actually occurs. It is vitally important for the DR assurance program to include routine and thorough testing of all systems, procedures and processes in as realistic a manner as possible. And the replication of data makes such testing much easier. For example, replicated backup sets can be tested in a simulated recovery environment (independently from the actual hosts and servers so as not to disrupt normal operations) to ensure backups are being performed as expected.
An automated “set and forget” strategy is capable of satisfying the most demanding backup and recovery needs. By using disk-based systems, there are no tapes to load or unload, or to mislabel or misplace, thereby minimizing human error. Replication across the hierarchical hub-and-spoke topology is also automatic, affording constant protection, even against widespread disasters. And the ability to test rigorously and regularly exposes flaws so they can be fixed before that dreaded day when a disaster actually occurs.
Simply put: The best time to make sure you are ready for the unexpected is today. By investing in enterprise-class data recovery planning now, you’ll ensure that the time and effort you’ve spent on data backup pays off for you when you need it to.
About the Author
Dennis Rolland is the director of advanced technology for the office of the CTO at SEPATON, Inc. Rolland oversees the architecture and future direction of SEPATON’s data protection technologies. He brings to SEPATON more than 20 years of storage experience in the areas of hardware and software development, having held senior level engineering management and architecture positions at leading storage technology companies.