The Impact of Primary Storage Data Reduction on DR Strategies
- Published on October 22, 2010
- Written by NATHAN MOFFITT
Over the last few years, end users have begun to warm to the idea of running data reduction technologies like compression and deduplication on their primary storage. With data volumes growing and IT leadership looking for ways to control budgets, the ability of these technologies to reduce storage and operational costs makes it inevitable that they will be deployed on at least some production application workloads.
In the context of disaster recovery, data reduction on primary storage can be extremely beneficial: it reduces the DR footprint, lowers costs, and opens the door to providing DR for more of the overall IT environment. But for this to happen, architects will need to carefully consider how they adapt their strategies to accommodate primary data reduction.
Key to this is the relative position of reduction and replication in the I/O stack. Based on the major offerings available today, there are three typical deployment methods:
- Appliance-based data reduction with host-based replication
- Appliance-based data reduction with array-based replication
- Array-based data reduction and replication
In this article we will look at each methodology and its impact on DR strategies.
Data Reduction Appliances + Host Replication
In this model the data reduction appliance sits between the host and storage (Figure 1). Host-based replication continues as usual, and data is “reduced” downstream by the appliance.
Because replication occurs before data reduction, reduction benefits do not automatically carry over between sites. Instead, data reduction appliances must be deployed at both sites.
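One consequence worth quantifying: with replication upstream of the appliance, the full raw stream crosses the wire. A minimal sketch (Python, using `zlib` compression as a stand-in for data reduction; the data and sizes are illustrative, not vendor figures) compares bytes on the wire when replication happens before versus after reduction:

```python
import zlib

# Hypothetical ~4 MiB write stream with highly repetitive content
# (illustrative only).
raw_stream = (b"customer_record:" + b"A" * 48) * 65536

# Host-based replication before reduction: the full raw stream
# crosses the wire; an appliance at each site reduces it locally.
bytes_on_wire_host_repl = len(raw_stream)

# If reduction happened before replication, only the reduced
# form would need to be transmitted.
reduced = zlib.compress(raw_stream)
bytes_on_wire_after_reduction = len(reduced)

print(f"raw on wire: {bytes_on_wire_host_repl} bytes")
print(f"reduced form: {bytes_on_wire_after_reduction} bytes")
print(f"wire savings forgone: "
      f"{1 - bytes_on_wire_after_reduction / bytes_on_wire_host_repl:.1%}")
```

The storage savings still materialize at both sites once each appliance does its work, but the network between data centers sees no benefit in this model.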
Beyond these trade-offs, it is important to understand whether data reduction is performed inline or post-process. Appliances that use a post-process methodology typically must read back, reduce, and then re-write data to disk in an optimized format. This requires additional free space, and value-added functions like snapshots may need to wait until processing is complete.
Appliances with an inline process complete reduction before data is written to disk, enabling the array to perform value-added functions as normal. However, there may be a performance impact on the application workload because of the intermediary processing before data is committed to disk. The level of impact will vary based on the type of reduction, usually compression or deduplication, and its processing algorithms.
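The inline-versus-post-process distinction can be sketched as two write paths (Python, with `zlib` again standing in for any reduction algorithm; all function names are illustrative). Inline pays the reduction cost inside the write path; post-process acknowledges raw writes immediately and optimizes later:

```python
import time
import zlib

def write_inline(block: bytes, disk: list) -> float:
    """Inline: reduce in the write path; the host waits for the
    compression work before the write is acknowledged."""
    start = time.perf_counter()
    disk.append(zlib.compress(block))   # reduce, then commit
    return time.perf_counter() - start

def write_post_process(block: bytes, disk: list) -> float:
    """Post-process: commit raw data immediately; reduction is
    deferred to a later optimization pass."""
    start = time.perf_counter()
    disk.append(block)                  # commit raw, ack fast
    return time.perf_counter() - start

def post_process_pass(disk: list) -> None:
    """Deferred pass: read back, reduce, re-write in optimized
    format. Needs free space for both copies while it runs."""
    for i, block in enumerate(disk):
        disk[i] = zlib.compress(block)

block = b"log entry 42\n" * 4096
inline_disk, pp_disk = [], []
t_inline = write_inline(block, inline_disk)
t_pp = write_post_process(block, pp_disk)
post_process_pass(pp_disk)
# Both paths end with the same reduced data on disk; what differs
# is when the reduction cost is paid relative to the host write.
```

The end state on disk is identical; the choice is about where the latency and the temporary space requirements land.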
Data Reduction Appliances + Array-Based Replication
In this model replication occurs after the data is written to disk (Figure 2). If data reduction occurs inline, no change should be necessary to how synchronous and asynchronous replication processes are performed by the array.
Data will arrive at disk “reduced,” and the replication engine will have less data to transmit. As with all inline technologies, the performance impact should be considered. Since data will now be in a proprietary format, an appliance (or host-side software agent) must be running at the target site so the data can be read.
As mentioned above, post-process data reduction is more complicated. Because data will need to be read back and reduced, array-level replication should wait until the post-processing work is done. This will typically rule out the use of synchronous replication, and asynchronous replication may need to be delayed (when snapshot-based) or potentially avoided (when I/O-based).
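The ordering constraint can be made concrete with a toy model (Python; the class and method names are illustrative, not any vendor's API): snapshot-style asynchronous replication has to be gated on completion of the reduction pass, or it will ship the unreduced data.

```python
import zlib

class PostProcessArray:
    """Toy model of an array with post-process reduction and
    array-based replication (illustrative names only)."""

    def __init__(self):
        self.blocks = []        # on-disk data, raw until processed
        self.processed = False

    def write(self, block: bytes) -> None:
        self.blocks.append(block)   # committed raw; reduction deferred
        self.processed = False

    def post_process(self) -> None:
        # Read back, reduce, re-write in optimized format.
        self.blocks = [zlib.compress(b) for b in self.blocks]
        self.processed = True

    def replicate(self, target: list) -> None:
        # Replicating before post_process() would transmit the
        # full raw data (or catch blocks mid-rewrite), losing the
        # wire savings the reduction was meant to provide.
        if not self.processed:
            raise RuntimeError("defer replication until reduction completes")
        target.extend(self.blocks)

array, dr_site = PostProcessArray(), []
array.write(b"payload " * 1024)
try:
    array.replicate(dr_site)    # too early: raw data still on disk
except RuntimeError:
    pass
array.post_process()
array.replicate(dr_site)        # now ships only the reduced form
```

A synchronous scheme has no such "wait" to insert, which is why post-process reduction typically rules it out.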
Array-Based Data Reduction and Replication
Rather than leveraging external appliances, this approach uses the storage array to perform both data reduction and replication (Figure 3). This simplifies the implementation because a single device is deployed at each site. It should also enable increased automation of data reduction, replication, and value-added services by bringing them under a single management framework.
If data reduction is done inline on the primary system, all processes should occur as normal, with reduction benefits automatically carried over to the DR system. The only caution is the performance impact on the application workload during inline processing.
If data reduction is accomplished post-process, the processing impact can be deferred, but replication may need to wait until data reduction completes, or data reduction may need to be “bypassed.” Depending on the vendor implementation, these issues may be avoided by enabling each array to independently perform data reduction or by allowing the primary array to “own” the reduction process on both primary and DR storage.
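The trade-off between those two workarounds can be sketched numerically (Python, `zlib` standing in for reduction; everything here is illustrative, not a description of any specific product):

```python
import zlib

raw = b"sensor reading 7.3\n" * 8192   # illustrative write set

# Option A: each array reduces independently. Replication ships
# raw data, so there is no ordering problem, but also no wire
# savings; each site reduces on its own schedule.
wire_a = len(raw)
primary_a = zlib.compress(raw)
dr_a = zlib.compress(raw)

# Option B: the primary "owns" reduction for both sites. Only
# the reduced form crosses the wire, but replication must now
# coordinate with (i.e., wait for) the reduction pass.
primary_b = zlib.compress(raw)
wire_b = len(primary_b)
dr_b = primary_b

print(f"Option A wire bytes: {wire_a}")
print(f"Option B wire bytes: {wire_b}")
```

Both options end with reduced data at both sites; they differ in whether the network savings or the simpler replication ordering is sacrificed.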
The growth of primary storage data reduction solutions offers storage architects and DR planners the ability to extend primary storage and operational savings into DR, reducing:
- DR storage costs
- DR power, cooling, and space consumption
- Network costs between data centers
This provides budget relief as well as the opportunity to provide DR for a larger portion of the IT environment. That said, care should be taken to understand how these solutions will fit into the DR process, and whether they can extend data reduction benefits from the primary system to your DR environment without adding complexity.
Finally, one should always consider the “baseline” capabilities of solutions, including interface support, data reduction methodology and benefits, application support, and the total amount of gear that must be deployed to make the solution work acceptably. Without the proper baseline capabilities, even the most advanced solution may offer limited value.
Nathan Moffitt is senior manager of data protection solutions at NetApp. For the last 15 years he has worked in a variety of IT roles consulting on, deploying, and supporting tape and disk-based data protection solutions.