Four Common Mistakes to Avoid When Moving Servers
- Published on January 31, 2008
- Written by Mike McClain, Senior Web Designer & Site Manager
The risk associated with moving servers farther away from your end-users should not be ignored. Users who were local to servers become remote users, and the interim stages of a data center relocation may introduce distance between back-end servers. These physical displacements can degrade application performance and result in significant business interruption.
In fact, when IT organizations plan server moves, they often focus exclusively on systems issues such as the right-sizing of new servers or virtualization of storage resources. As important as those issues are, it’s a big mistake to ignore the impact of adding distance across the network. If you don’t adequately understand and address the issues that arise when you put more physical distance between users and servers – or between servers and servers – you can set yourself up for serious pain and potential failure.
Here are four common mistakes you should be particularly careful to avoid:
1) Confusing network latency with application latency
When you move servers farther away from users, you introduce network latency. That is, the physical distance between users and servers causes a delay in the signal between the two. But adding 50 milliseconds of network delay doesn’t mean that your application response times will increase by only 50 milliseconds. On the contrary, most applications require many back-and-forth interactions between user and server (often referred to as application “turns”) to perform even the most basic tasks. Thus, the addition of just 50 milliseconds of network delay can cause an action that took only three seconds to complete locally to take a full 30 seconds after a server move.
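The arithmetic behind this compounding effect can be sketched in a few lines. The turn count and round-trip times below are illustrative assumptions chosen to match the article’s three-second example, not measurements:

```python
# Illustrative sketch: per-turn network delay compounds across every
# application "turn". All numbers below are assumptions for illustration.

def response_time(turns, rtt_seconds, server_time=0.0):
    """Total response time when each application turn costs one round trip."""
    return server_time + turns * rtt_seconds

turns = 600                           # back-and-forth turns per task (assumed)
local = response_time(turns, 0.005)   # 5 ms local round trip
remote = response_time(turns, 0.055)  # same path plus 50 ms of added delay

print(f"local:  {local:.1f} s")   # a ~3-second task...
print(f"remote: {remote:.1f} s")  # ...becomes a ~33-second task
```

The point of the sketch is that the added delay is paid once per turn, so a chatty application multiplies a 50-millisecond penalty by hundreds of turns.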
Unfortunately, this network-related latency is usually regarded as the network manager’s problem, even though the application (including the number of “turns” it requires) may be the real problem. After all, the network manager can’t change the speed of light, or make Tokyo closer to New York. So it doesn’t make sense to lay the problem entirely on him or her. In fact, because application design issues are often responsible for poor response times after a server move, additional investments in the network will be of little use whatsoever.
2) Failing to realize how network latency impacts server performance and scalability
Many IT organizations don’t fully grasp how the addition of network latency degrades – often substantially – the scalability and performance of application servers. This often-overlooked phenomenon has an adverse impact on the entire user population – not just remote users. It is almost never caught in the QA process, and is rarely diagnosed correctly even when it causes problems in the production environment.
How does network latency affect server performance? The answer is simple. A server allocates resources to each concurrent client session. Local clients complete these sessions quickly because their application turns are subject to minimal network-related delay. Remote sessions, on the other hand, take much longer to complete because each application turn takes so much longer.
It is important to note that servers lock up resources for the duration of the process, and only free them when the process is completed. Thus, when remote users communicate with a server, they keep its resources busy for a longer period of time. This prevents the server from releasing those resources for use by other clients – severely limiting its performance and ability to scale.
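This relationship between session duration and server capacity is essentially Little’s Law: the number of concurrent sessions equals arrival rate times session duration, so a fixed pool of session slots sustains less throughput as sessions lengthen. A minimal sketch, using assumed numbers rather than figures from the article:

```python
# Hypothetical sketch of Little's Law applied to server session capacity.
# If each request holds a session slot for its full duration, the slot
# pool caps sustainable throughput. Numbers are illustrative assumptions.

def max_throughput(session_slots, session_duration_s):
    """Requests/sec a server can sustain when each request holds one slot."""
    return session_slots / session_duration_s

slots = 200  # concurrent sessions the server can hold open (assumed)

local_rate = max_throughput(slots, 3.0)    # local users finish in ~3 s
remote_rate = max_throughput(slots, 30.0)  # remote users take ~30 s

print(f"local:  {local_rate:.1f} req/s")
print(f"remote: {remote_rate:.1f} req/s")
```

With the same 200 slots, tenfold-longer remote sessions cut sustainable throughput tenfold, which is why the entire user population, local users included, feels the degradation.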
Unfortunately, conventional testing and QA typically focus on back-end scalability, with little or no attention given to real-world network latencies. That is why IT organizations are so often surprised when server performance degrades after a data center move.
3) Ignoring business continuity best practices during interim stages of server relocation
Ideally, an enterprise could pack all of its servers in one weekend, load them on moving trucks, unpack in the new location, and be up and running by Monday. The reality is quite different. Enterprise data centers can consist of dozens or hundreds of servers. It can take weeks or months and multiple relocation steps to complete a move to a new location. Thus, during interim stages, some servers will operate from their original locations while others will operate from the new location. The introduction of this physical distance between servers can seriously impact both business continuity and application performance.
Most business continuity schemes depend on a contingency site, which is provisioned with replicated enterprise data. In most cases, all the data to be replicated comes from a single location: the data center. But, during a data center move, some data sources will reside in the old data center and some will have already moved to the new location. This distribution of data sources complicates disaster recovery and introduces new vulnerabilities to the IT environment.
The physical separation of servers can also have a dramatic and unexpected impact on application performance, because computing processes are almost never designed to accommodate significant inter-server latency.
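A quick sketch shows why chatty inter-server processes are so sensitive to this separation. Consider a batch job issuing sequential queries to a database server that used to sit in the same rack but now sits in the new data center; all figures here are assumptions for illustration:

```python
# Illustrative sketch: a sequential batch job pays the inter-server round
# trip on every query. All numbers are assumed for illustration.

def batch_runtime(queries, rtt_s, work_per_query_s):
    """Wall-clock time for a job that issues queries one at a time."""
    return queries * (rtt_s + work_per_query_s)

queries = 10_000
same_room = batch_runtime(queries, 0.0005, 0.001)  # sub-ms RTT: ~15 s
split_site = batch_runtime(queries, 0.040, 0.001)  # 40 ms RTT: ~410 s

print(f"co-located: {same_room:.0f} s")
print(f"separated:  {split_site:.0f} s")
```

A job that comfortably fit an overnight window when the servers were co-located can overrun it badly during the interim stages of a move.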
Any IT organization planning a data center move must therefore ask a variety of questions. Did I adjust my disaster recovery plan to cover interim relocation steps? What happens when servers with critical inter-dependencies are temporarily separated? Which servers must be moved with other servers? When should Active Directory servers be moved? Which servers will need to be replicated for the duration of the move?
4) Not dealing with users’ performance expectations until after the move
Sometimes, it simply doesn’t make sense to set a post-relocation service level objective (SLO) that is identical to what had previously been a local response time. If it took a local user three seconds to execute a task before a server move, it is very unlikely that the task will take the same amount of time after that server is moved across the country. An SLO of seven seconds, for example, may be more reasonable.
That’s why it is critical to directly address users’ service level expectations up front. If you wait until after the move and tell users they just have to live with what you can deliver, you’re setting yourself up for a battle. But if you can get buy-in beforehand as part of the planning process, you can avoid such hassles and ensure that no one has unrealistic expectations.
To achieve this pre-deployment acceptance, two elements are needed. First, IT must have a way of predicting what post-move performance will look like. Second, users must be given a way to experience post-move performance in advance. That is, IT must be able to simulate post-move performance. These predictive and simulation capabilities enable IT to set up “acceptance environments” where users can experience post-move performance first-hand before the move is actually executed.
Seven Steps for Project Success
To avoid making these mistakes, IT organizations must have full visibility into the subtle, complex interactions between applications, networks and infrastructure. Unfortunately, responsibility for these three areas has been split into separate operational “silos.” A siloed approach, however, reduces the likelihood that IT will successfully predict and address the performance problems that can result from a data center move. It is therefore essential to take a new collaborative approach that effectively blends the expertise of the application team, systems managers and network architects. These collaboration best practices are outlined in the following seven-step plan:
- Build a virtual model of the pre- and post-relocation enterprise environment, as well as all planned transitional phases. All participants in the planning process, including business users, need concrete information about how network infrastructure will impact application performance with the new data center.
- Establish an SLO baseline by measuring application performance before the move. Users’ needs and expectations don’t exist in a vacuum. Pre-move transaction response times provide essential context for determining reasonable SLOs for after the move.
- Measure post-move application performance in a virtual environment. The only way to accurately predict the impact of server moves on application performance is to run those applications in a fully simulated post-move environment. This will provide the specific data on potential performance degradations essential for proper planning.
- Identify applications that need special performance tuning. Rather than wasting time, effort, and money on beefing up all elements of your enterprise infrastructure, focus instead on specific applications and/or network components that may be particularly problematic.
- Analyze problems and validate potential fixes for failing applications. Before investing in and deploying a solution, it’s important to make sure that it actually works.
- Assess dependencies between back-end servers to establish a move plan and adjust the DR scheme. This, too, should be done by simulating each planned interim stage of the move – as well as the final post-move environment.
- Manage user expectations and get buy-in commitments through hands-on acceptance. Users who merely hear that a transaction response time will go from two seconds to five may object out of sheer reflex – or they may accede without realizing how long five seconds really is. Business users should therefore be given the opportunity to directly experience post-move application performance in advance so they can offer informed consent to the relocation plan.
By following this seven-step plan, IT organizations can substantially reduce risk, eliminate unnecessary infrastructure spending, accelerate time-to-benefit, and overcome a wide range of potential political pitfalls. The exclusion of any of these steps greatly increases the likelihood that unforeseen problems will sabotage the project. To ensure the success of any data center relocation or consolidation initiative, IT must pool its expertise in cross-disciplinary planning teams and fully leverage available simulation technologies.
Amichai Lesser is the director of product marketing at Shunra Software. Lesser is responsible for product marketing, market analysis, and field marketing programs and has extensive experience in real-time engineering, performance management, and security. Lesser can be contacted at firstname.lastname@example.org.
"Appeared in DRJ's Summer 2006 Issue"