However, the distances between data centers can pose challenges to the techniques used to replicate data. The IT team must perform rigorous and realistic testing of DR/BC technologies to verify that the system will tolerate these distance effects.
For example, a major transportation manufacturer in the U.S. Midwest considered a backup site 300 miles from the primary manufacturing plant for data replication. During planning and implementation of the DR/BC process, a test lab in the primary manufacturing plant reproduced the planned system, including duplicating the effects of the WAN through the use of a network emulator. In the design stage the manufacturer was able to discover and correct configuration and equipment issues affecting data replication, avoiding delays in live testing and deployment of the DR/BC solution.
This article discusses the performance consequences of distance-related impairments and describes how real-world network DR testing authentically replicates the load, delays, bandwidth contention, errors and other impairments found in production networks. By emulating these conditions in the test lab, IT teams can predict how the production system will work under actual disaster conditions. The results can be used to tune and verify the performance of DR systems prior to deployment.
Delay: The Effects of Distance and Impairments
Local area networks don’t usually introduce a significant amount of delay or impairments. However, the data carried on the WAN that connects data centers can experience a variety of impairments.
- Delay. There are several sources of delay in a network. Propagation delay, the amount of time it takes to physically propagate the bits through the wire or fiber to the remote system, is fixed and predictable. Other types of delay tend to be unpredictable. Reliable transport protocols that require acknowledgement, such as TCP, can be a source of a significant amount of additional variable delay as distance increases. Buffers introduce a dynamic amount of delay as they fill and then empty. Various queuing methods, another source of dynamic delay, are used in a variety of applications, including congestion control, quality of service algorithms, and routing implementations.
- Errors and Packet Loss. Faulty connectors, components with intermittent problems or on the brink of failure, electromagnetic interference, and disruption of underground cables all contribute to errors in the network. Errors result in dropped packets, which force retransmission of packets and add still more application-level delay.
- Packet Reorder. On a packet-based network with diverse routes, packets may not arrive at the destination in the same order that they were transmitted. If a mis-ordered packet arrives in time, a receive buffer can correct order issues before transmitting, at the expense of introducing dynamic delay. If a mis-ordered packet doesn’t arrive in time, the packets after the gap are dropped, causing a re-transmission of packets, introducing even more delay.
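To get a rough sense of scale for the fixed propagation component described above, the delay can be estimated directly from distance. The sketch below assumes the 300-mile link from the earlier example and a typical signal velocity of about two-thirds the speed of light in fiber; both figures are illustrative assumptions, not measured values.

```python
# Estimate one-way and round-trip propagation delay over a fiber link.
# Assumptions: ~300 miles of fiber, signal velocity ~2/3 the speed of light.
SPEED_OF_LIGHT_KM_S = 299_792  # km/s in vacuum
FIBER_VELOCITY_FACTOR = 0.67   # typical for optical fiber (assumed)

def propagation_delay_ms(distance_km: float) -> float:
    """One-way propagation delay in milliseconds."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_VELOCITY_FACTOR) * 1000

distance_km = 300 * 1.609  # 300 miles in kilometers
one_way = propagation_delay_ms(distance_km)
print(f"one-way: {one_way:.2f} ms, round trip: {2 * one_way:.2f} ms")
```

For the 300-mile example this yields roughly 2.4 ms one way — small on its own, but it is only the floor; the variable delays described above stack on top of it.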
Delay and DR Applications
At the application level, the final effect of distance and all impairments is delay, which can have a significant effect on data replication.
Synchronous data replication confirms that the data is written to both the remote disk and the local disk before an acknowledgement is sent back to the application. Synchronous data replication solutions are used for high-availability applications such as financial transactions, and require minimal delay and packet re-order.
Synchronous data replication is very sensitive to delay. Thorough testing and tuning are required before deployment to assure a synchronous data replication solution will not affect performance and productivity.
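The delay sensitivity described above can be made concrete with a simplified model: if each synchronous write must be acknowledged by the remote site before the next dependent write is issued, the serialized write rate is bounded by roughly one write per round-trip time. This is a sketch under that serialization assumption, not a model of any particular replication product.

```python
# Upper bound on serialized synchronous write rate imposed by round-trip delay.
# Simplified model (an assumption): each write waits for a remote
# acknowledgement before the next dependent write is issued, so rate <= 1/RTT.

def max_serial_writes_per_sec(rtt_ms: float, service_ms: float = 0.0) -> float:
    """Writes/sec when each write costs one RTT plus local service time."""
    return 1000.0 / (rtt_ms + service_ms)

for rtt in (1, 5, 50, 100):  # round-trip times in milliseconds
    print(f"RTT {rtt:3d} ms -> at most {max_serial_writes_per_sec(rtt):7.1f} writes/s")
```

Even ignoring all other impairments, a jump from a 1 ms to a 100 ms round trip cuts the serialized write ceiling from 1,000 to 10 writes per second — which is why testing and tuning before deployment matters.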
Asynchronous data replication is more tolerant of, but not immune to, the effects of network delay. Long delays can result in timeouts and retransmissions, which can unacceptably increase synchronization windows and the possibility of failure to synchronize the data.
One sometimes-overlooked aspect of data replication is the disparity of delay between the local and remote storage systems after a failover event. How will the application react when delay suddenly jumps from single-digit to double- or triple-digit milliseconds? Testing critical enterprise applications under WAN conditions is an essential component of any disaster recovery plan.
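One way to quantify the failover disparity just described is the classic single-flow TCP ceiling, throughput ≤ window size / RTT. The sketch below assumes one flow with an untuned 64 KB window; both the window size and the before/after delays are illustrative assumptions.

```python
# Single-flow TCP throughput ceiling: throughput <= window_size / RTT.
# Illustrates how a failover from a 5 ms local path to a 100 ms remote path
# cuts per-flow throughput. The 64 KB window is an assumed, untuned default.

WINDOW_BYTES = 64 * 1024  # 64 KB receive window (assumption)

def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Best-case throughput in megabits per second for one TCP flow."""
    return (window_bytes * 8) / (rtt_ms / 1000.0) / 1_000_000

before = max_throughput_mbps(WINDOW_BYTES, 5)    # local storage path
after = max_throughput_mbps(WINDOW_BYTES, 100)   # remote path after failover
print(f"before failover: {before:.1f} Mbit/s, after: {after:.1f} Mbit/s")
```

A twenty-fold increase in RTT produces a twenty-fold drop in the per-flow ceiling — exactly the kind of surprise that lab testing under emulated WAN conditions is meant to surface before a real failover does.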
Simulating Distance Effects in the Test Lab
Data synchronization must work under normal and borderline conditions. After a failover, applications must work while running against the backup system, often in a remote location. Because of the importance of these systems to business continuity and financial viability, DR/BC planners must be able to demonstrate acceptable performance under normal and disaster conditions. This level of confidence is accomplished through network emulation.
A network emulator is used to create the effects of the WAN in a test lab. It can create a user-configurable amount of delay, from seconds to microseconds. It can also be configured to introduce packet loss, duplication, modification, reorder, and errors.
Software-based emulators are less expensive, but offer less throughput, precision and accuracy. Hardware-based emulators provide line-rate throughput and greater granularity in configuration.
A network emulator offers many advantages over using a live network to test DR/BC implementations:
- Cost-effective. Network emulation is the most cost-effective way to bring the effects of a WAN into a test lab. The alternatives are to schedule off-hours on the production network or to provision a separate test WAN. Using the production network limits the productivity of the DR/BC team by constraining them to small test windows during inconvenient hours. Provisioning a separate test WAN involves up-front non-recurring costs and monthly recurring costs.
- Deterministic. For testing to have meaningful results, the test conditions and stimuli must be known. Conditions on a test WAN are not configurable or repeatable. To know how your solution will behave under 1 percent packet loss, you must be able to consistently and accurately create 1 percent packet loss on the test network.
Network emulators can be configured to reproduce any typical or atypical network condition with accuracy and precision. A hardware-based emulator guarantees line-rate throughput under any condition, allowing emulation of even the most extreme conditions for testing corner cases and performance limits, where failures are more likely to emerge. Deterministic network emulation also allows a DR/BC team to precisely define bandwidth and network infrastructure requirements for delivering a specific level of quality.
- Automated and repeatable. Unlike a live network, which cannot be configured, much less automated, many network emulators can be scripted. Automated testing frees engineers from tedious, and consequently error-prone, tasks. Test automation enables unattended testing and round-the-clock use of test labs without requiring unpopular shift work schedules.
- Suitable for demos. A network emulator can provide the deterministic and repeatable network conditions required to stage demonstrations of the DR/BC implementation for management and auditors. To reduce the likelihood of problems during the demo, automation can be used to avoid configuration errors, reducing the risk of technical problems and keeping the focus on the DR/BC plan rather than the logistics of the demo.
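The deterministic property described above can be illustrated with a toy emulator stage: driving the drop decision from a seeded pseudo-random source reproduces exactly the same 1 percent loss pattern on every run. This is only a conceptual sketch — a real emulator implements the same idea in hardware at line rate.

```python
import random

# Toy illustration of deterministic impairment: a seeded RNG makes a 1%
# packet loss pattern exactly repeatable across test runs. Real emulators
# do this in hardware at line rate; this is only a conceptual sketch.

def apply_loss(packets, loss_rate=0.01, seed=42):
    """Return the packets that survive a repeatable random-loss stage."""
    rng = random.Random(seed)  # fixed seed -> identical drops every run
    return [p for p in packets if rng.random() >= loss_rate]

stream = list(range(10_000))
run1 = apply_loss(stream)
run2 = apply_loss(stream)
print(f"delivered: {len(run1)} of {len(stream)}, identical runs: {run1 == run2}")
```

Because the drop pattern is a function of the seed, a failure observed under "1 percent loss" can be reproduced bit-for-bit — the property that makes threshold tests and demos repeatable.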
Testing the Effects of Delay
A DR/BC test bed includes the system under test, a traffic generator/analyzer, and a network emulator. The test methodology employed depends on the capability under test. Components of a test include user profiles, traffic profiles and network profiles.
- User profiles. Users show up at about the same time and log in. Then they do their work in various ways. Data entry personnel create a continuous stream of transactions whose frequency depends on the time it takes to navigate a screen. Other users may generate transactions in bursts.
- Traffic profiles. Data applications produce traffic that varies in packet size and frequency. Email creates bursts of traffic. Real-time applications like VoIP or IP video produce time-sensitive traffic in a stream. The traffic mix can also change based on time-of-day or other schedule-dependent factors.
- Network profiles. The WAN creates time-varying conditions that are linked to a complex set of conditions, influenced by routing table updates, queuing algorithms, buffering, traffic management and policing policies, EMI and other environmental factors. Any given network can have a range of conditions, and therefore will have multiple profiles. The production and backup networks will have separate profiles due to the differences in distance and configuration.
Realistic configuration of network emulation profiles can be accomplished in two ways. Capture/playback involves capturing the end-to-end packet loss, delay and jitter for each profile and then playing back the conditions dynamically during testing, so that the emulator replicates actual conditions on a packet-by-packet basis. Standards-based emulation uses statistical models to create realistic network conditions.
IP networks don’t present deterministic, periodic disruptions to traffic. Instead, impairments vary over time, presenting problems in bursts as a result of various issues such as route flaps, queue discards, and buffer overruns. The Network Model for Evaluating Multimedia Transmission Performance over Internet Protocol, adopted by the TIA as TIA-921 and by the ITU-T as Recommendation G.1050, is a time-varying model that emulates the dynamic nature of impairments in an IP network. This model has been adopted by several standards organizations for testing real-time applications and protocols. It is a statistical model based on actual network information obtained from anonymous service providers. It uses an impulse-driven time series model to emulate impairments introduced by each leg of an end-to-end network. The dynamic nature of the emulated conditions reflects the time-varying conditions found on actual production networks.
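Statistical models of the kind described above often capture burstiness with a two-state Markov ("Gilbert") loss model rather than uniform random loss: a "good" state with rare drops and a "bad" state where drops cluster. The sketch below is a generic illustration of that idea — it is not the TIA-921/G.1050 model, and all of its probabilities are assumed for demonstration.

```python
import random

# Two-state Gilbert loss model: a "good" state with rare loss and a "bad"
# (burst) state with heavy loss. A generic illustration of bursty,
# time-varying impairments -- not the actual TIA-921/G.1050 model. All
# transition and loss probabilities below are assumed for demonstration.

def gilbert_losses(n_packets, p_good_to_bad=0.005, p_bad_to_good=0.2,
                   loss_good=0.0005, loss_bad=0.3, seed=1):
    """Return a per-packet list of booleans (True = packet lost)."""
    rng = random.Random(seed)
    state = "good"
    losses = []
    for _ in range(n_packets):
        if state == "good":
            if rng.random() < p_good_to_bad:
                state = "bad"
        elif rng.random() < p_bad_to_good:
            state = "good"
        loss_rate = loss_good if state == "good" else loss_bad
        losses.append(rng.random() < loss_rate)
    return losses

losses = gilbert_losses(100_000)
print(f"overall loss rate: {sum(losses) / len(losses):.3%}")
```

The point of such a model is that the *same* average loss rate produces very different application behavior when the drops arrive in bursts rather than uniformly — which is why time-varying emulation matters for DR testing.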
Tuning and Verification of Performance and Failover
Real-world testing combines user profiles, traffic profiles and network profiles in a variety of ways to simulate a real-world environment.
- Performance benchmark. Performance testing targets metrics for quantifying system performance. In a data replication performance test, transaction response time (TRT), the amount of time it takes to process a complete transaction, is a valuable metric. The acceptable level of performance, known as the service level objective (SLO), is identified during planning. Performance testing is executed across a range of user, traffic and network profiles that represent the range of user behaviors, traffic mix and network conditions found on the production and backup networks. TRT results are compared to the SLO.
- Performance threshold. Threshold testing ramps up one or more variables while holding the other variables constant to determine the failure threshold for a system. The failure threshold is defined as the point at which a metric of interest reaches an unacceptable level. For example, to establish a user threshold, traffic and network profiles are loaded and then the number of users is increased until the TRT exceeds the SLO. To establish an impairment threshold, user and traffic profiles are loaded and then the impairment of interest is increased until the TRT exceeds the SLO.
- Failover threshold. Failover testing verifies that the production system switches to the backup system when it is supposed to. Failover can be initiated manually or triggered automatically by an application monitoring a metric such as TRT, delay, error rate, or packet loss. Network emulation is used to produce the condition that initiates the failover. In addition, the test can measure the failover time.
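An impairment-threshold test of the kind described above reduces to a simple ramp loop. In the sketch below, the hypothetical measure_trt() function stands in for the real system (in a lab it would drive the emulator and read the traffic analyzer); its response-time model, the SLO value, and the step size are all assumptions for illustration.

```python
# Sketch of an impairment-threshold test: ramp emulated delay until the
# measured transaction response time (TRT) exceeds the service level
# objective (SLO). measure_trt() is a hypothetical stand-in model -- in a
# real lab it would configure the emulator and query the traffic analyzer.

SLO_MS = 500.0  # maximum acceptable TRT (assumed)

def measure_trt(delay_ms: float) -> float:
    """Toy model: TRT grows with emulated one-way delay."""
    base_processing_ms = 40.0
    return base_processing_ms + 4 * delay_ms  # assume 4 network round trips

def find_delay_threshold(step_ms: float = 5.0, max_ms: float = 1000.0) -> float:
    """Smallest emulated delay at which the TRT exceeds the SLO."""
    delay = 0.0
    while delay <= max_ms:
        if measure_trt(delay) > SLO_MS:
            return delay
        delay += step_ms
    raise RuntimeError("SLO never violated within tested range")

print(f"delay threshold: {find_delay_threshold():.0f} ms")
```

The same loop structure works for any ramped variable — user count, loss rate, or jitter — by swapping which parameter the emulator (or load generator) increments each pass.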
Recent technological advances have made it possible to authentically replicate a production network environment in a test lab. Real-world testing uses precision network emulators and a traffic generator/analyzer to create realistic and rigorous conditions for testing, tuning and verifying the performance of DR systems and subsystems. In addition to sparing the expense and disruption of testing on the corporate network, real-world network testing in the lab produces the reliable and repeatable results required for compliance with SOX and other industry-specific DR requirements.
Chip Webb is CTO and a co-founder of Anue Systems. Anue Systems is the global leader in network emulation with products and services that help companies evaluate real-world WAN conditions in a pre-deployment test environment. Prior to Anue, Webb was a distinguished member of technical staff in the advanced video and data networking department at Bell Labs. He was a member of the Emmy award-winning team that developed the first ATSC HDTV system and subsequently led the development of the first all-digital 8-VSB demodulator IC for HDTV broadcast. Webb is a recognized expert in the areas of signal integrity, video transmission and high-speed data networking. Webb holds a bachelor’s degree Cum Laude from Rensselaer Polytechnic Institute, a master’s degree from Columbia University in electrical engineering and is a member of Tau Beta Pi. He has co-authored five technical papers and has been granted 12 patents, with four more pending.
"Appeared in DRJ's Summer 2008 Issue"