For the first few years, server consolidation was the key driver of the virtualization boom, as companies could immediately see the cost benefits of reducing the amount of IT hardware. Today, disaster recovery represents an innovative, yet practical application of virtualization technology’s greatest feature – portability.
Traditionally, disaster recovery is defined as the process of regaining access to the data, hardware and software necessary to resume critical business operations after a natural or human-caused disaster or disruption. In many instances, companies rely on redundant infrastructure and systems housed in remote locations, away from their primary datacenters – the “two of everything” model. While this is an effective strategy and a necessity for many large organizations needing high availability, there is an increasing movement away from this approach toward other solutions, such as partnering with a third party or implementing new technologies, including virtualization.
Enter a relatively new approach called virtualized DR. This practice facilitates failover from one partition to another in the same server, or in another server located in the same datacenter. Virtualized DR purports to reduce the requirement of “two of everything” by delivering a level of redundancy that can be used for disaster recovery purposes – but this could not be further from the truth. Virtualized DR, as it is described and discussed by the market today, enhances fault tolerance but does not satisfy most companies’ complex disaster recovery needs.
There is momentum in the industry pointing to virtualized DR as a panacea to resolve every DR challenge, but it is not, particularly when considering the heightened availability implications of virtualized servers and data centers.
As companies move toward virtualization to improve the flexibility and responsiveness of their IT environments, there are several critical challenges that need to be addressed related to the recoverability of these systems in the case of a disruption, including:
- Ensuring proper back-up and protection for virtual machines;
- Properly planning and implementing the infrastructure migration from a physical to a virtual environment; and
- Offering sufficient administration, management and physical datacenter support for virtual machines.
Another challenge, common with many new technologies, is the tendency for people to believe that this latest technology innovation will “save the day” during a potential disaster. The equally important focus on people and planning may get blurred – which can be very dangerous, since the best technology in the world will not save you if your people aren’t prepared.
Ensuring Proper Back-Up and Protection for Virtual Machines
Virtual machines are inherently more critical than their non-virtual counterparts, due to the sheer number of applications and production-level systems they support. This makes virtual machines more vulnerable to individual “choke-points.” A single malfunction of a cooling mechanism, or even a regional brownout caused by a misplaced backhoe, could result in a crash with costly consequences – bringing down dozens of applications.
It is therefore as critical, if not more critical, to ensure proper back-up and protection for virtual machines. The process for doing so is quite different from the more traditional methods of backing up non-virtual machines. For instance, the constantly changing nature of business applications and the IT infrastructures supporting them requires the regular capture and storage of virtualized IT blueprints, also known as images. Images are often set up in-house and undocumented, and the challenge of re-staging an image can be so great that in the event of a disaster, an IT department may have to start from scratch – thereby hurting recovery point and recovery time objectives.
A common mistake is to believe that virtualized disaster recovery environments existing within a single datacenter have disaster recovery mechanisms “built in” and therefore do not require remote back-up – whether tape-based or, more commonly, disk-based replication stored in a separate facility. These single-location environments are vulnerable to the same threats of damage and destruction as non-virtualized environments, and in fact, the re-staging of images requires even greater consideration of remote back-up.
Organizations considering a move to virtualized DR need to closely consider the issue of images and ask themselves:
- How often do our virtual images need to be backed up?
- Where will they be backed up to?
- If that resource were to become unavailable due to a localized disaster – do we need a remote resource to pick up the job?
- Is it in our best interest to enlist third-party support, who would have access to our images and be able to remotely install and activate our systems upon notification?
- Finally, so as not to forget the necessary emphasis on people and planning – who in our organization can give this authorization? What is the process for him/her doing so?
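One way to keep the answers to these questions from living only in people's heads is to record them per image and check them regularly. The following is a minimal sketch, not a prescribed tool: the `ImageBackupPolicy` record, its field names and the `policy_gaps` function are all hypothetical and exist only to illustrate the questions above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ImageBackupPolicy:
    """Hypothetical record of the answers above for one virtual image."""
    image_name: str
    backup_interval: timedelta      # how often the image needs to be backed up
    primary_target: str             # where it is backed up to
    remote_target: Optional[str]    # separate facility, if one exists
    authorizer: Optional[str]       # who may authorize a recovery
    last_backup: Optional[datetime] = None


def policy_gaps(policy: ImageBackupPolicy, now: datetime) -> List[str]:
    """Return a list of gaps found; an empty list means none were found."""
    gaps = []
    if policy.last_backup is None:
        gaps.append("image has never been backed up")
    elif now - policy.last_backup > policy.backup_interval:
        gaps.append("last back-up is older than the required interval")
    if not policy.remote_target:
        gaps.append("no remote back-up target outside the primary datacenter")
    if not policy.authorizer:
        gaps.append("no named person authorized to trigger recovery")
    return gaps


# Illustrative values only (assumed, not taken from any real environment).
example = ImageBackupPolicy(
    image_name="billing-app",
    backup_interval=timedelta(days=1),
    primary_target="local-disk-array",
    remote_target=None,
    authorizer="IT operations manager",
    last_backup=datetime(2008, 1, 1, 2, 0),
)
for gap in policy_gaps(example, now=datetime(2008, 1, 3, 9, 0)):
    print(gap)
```

The check is deliberately simple. Its value lies in forcing each image to have an explicit interval, a remote target and a named authorizer, rather than in any particular tooling.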
The process of educating IT staffs on the proper way to ensure back-up and support for virtual machines can take months and require outside support. Yet organizations that fail to devote adequate time, training and support resources may find themselves learning some hard and expensive lessons during either the normal course of business or in the event of a disaster.
Properly Planning and Implementing the Infrastructure Migration from a Physical to a Virtual Environment
Planning and implementing an infrastructure migration from a physical to a virtual environment is no easy task. For example, virtualization places greater emphasis on a comprehensive understanding of interdependencies – that is, how a change to a particular hardware element may impact another infrastructure element and the applications and operating system instances that element supports. With virtualization, the task of taking and maintaining inventory of servers and their application workload roles or identities (including systems and application software tied to each specific server model) becomes more difficult and critical, while the interdependencies multiply and become more complex. Most virtualization initiatives consider the basics – overlapping input/output or scheduling around peaks – but do not consider the importance of availability requirements in the overall design.
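The interdependency inventory described above can be kept as a simple dependency map and queried for the impact of a single failure. The sketch below assumes a hypothetical inventory (`DEPENDS_ON`, with made-up application, virtual machine and host names) and walks it to list everything that would be affected if one element went down.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical inventory: each item depends on the items in its list.
DEPENDS_ON: Dict[str, List[str]] = {
    "order-entry": ["vm-app-01", "customer-db"],
    "customer-db": ["vm-db-01"],
    "nightly-reports": ["vm-batch-01", "customer-db"],
    "vm-app-01": ["host-a"],
    "vm-db-01": ["host-a"],
    "vm-batch-01": ["host-b"],
}


def affected_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every item that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from an element to its dependents.
    dependents: Dict[str, Set[str]] = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(item)

    affected: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


print(sorted(affected_by("host-a", DEPENDS_ON)))
print(sorted(affected_by("host-b", DEPENDS_ON)))
```

In this made-up inventory, losing `host-a` reaches every application, including the batch reports that run on a different host, because they share the database. That kind of indirect dependency is easy to overlook when the inventory is not written down.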
Consider two applications running on separate virtual servers housed on the same physical server. One is a critical server which needs a higher level of availability during business hours, while the other is a small application that runs batch jobs overnight infrequently, and only while the first server is not in use. From a workload matching perspective, this may appear to be a good pairing; however, the lesser criticality of the second virtual server places the first at risk. For example, what happens when the infrequent batch exceeds its time window or impacts the back-up cycle on the first? It is easy to concentrate on the return on investment of server consolidation while ignoring the potential impact of not only too many eggs in one basket, but also the wrong mix of eggs.
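The pairing risk in this example can be checked mechanically once each virtual server's host, criticality and usage window are recorded. The following sketch rests on stated assumptions: the `VirtualServer` record, the `risky_pairings` function and the two-hour default buffer are hypothetical and chosen only for illustration. It flags any non-critical workload whose window overlaps, or ends too close to the start of, a critical server's window on the same physical host.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import time
from typing import List, Tuple

DAY = 24 * 60  # minutes in a day


@dataclass
class VirtualServer:
    name: str
    host: str           # physical server the virtual server runs on
    critical: bool
    window_start: time  # when the workload normally starts
    window_end: time    # when the workload normally ends


def _minutes(t: time) -> int:
    return t.hour * 60 + t.minute


def _within(t: int, start: int, end: int) -> bool:
    """True if minute-of-day t lies in [start, end); windows may cross midnight."""
    return (t - start) % DAY < (end - start) % DAY


def risky_pairings(servers: List[VirtualServer],
                   min_buffer_minutes: int = 120) -> List[Tuple[str, str, str, int]]:
    """Return (host, critical, other, margin) where the margin is too thin."""
    by_host = defaultdict(list)
    for server in servers:
        by_host[server.host].append(server)

    findings = []
    for host, group in by_host.items():
        for crit in (s for s in group if s.critical):
            c_start, c_end = _minutes(crit.window_start), _minutes(crit.window_end)
            for other in (s for s in group if not s.critical):
                o_start, o_end = _minutes(other.window_start), _minutes(other.window_end)
                overlaps = (_within(o_start, c_start, c_end)
                            or _within(c_start, o_start, o_end))
                # Margin: minutes between the other workload ending and the
                # critical window opening; zero if the windows already overlap.
                margin = 0 if overlaps else (c_start - o_end) % DAY
                if margin < min_buffer_minutes:
                    findings.append((host, crit.name, other.name, margin))
    return findings


# Illustrative values only.
servers = [
    VirtualServer("order-entry", "host-a", True, time(8, 0), time(18, 0)),
    VirtualServer("batch-reports", "host-a", False, time(22, 0), time(7, 0)),
    VirtualServer("archive-job", "host-b", False, time(22, 0), time(7, 0)),
]
for finding in risky_pairings(servers):
    print(finding)
```

A check like this covers only scheduled windows. It does not replace judgment about how badly a batch job might overrun or how its back-up cycle interacts with the critical server.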
In a virtualized environment, there is a higher likelihood of failure to recognize all of these interdependencies – including availability and changes over time, not just I/O, CPU and memory – and as a result, a higher risk for unplanned downtime, which can affect multiple applications and become much more costly. Disaster recovery strategies can mitigate these risks by backing up and ensuring availability for virtual machines. As one respondent in an SQL Magazine survey noted, “Interdependence oversights fall into the ‘oops, I forgot’ class of problems and is a reminder that high availability is equal parts technology and human policies and procedures.”
Virtualization requires IT staffs to consider and implement changes to their infrastructures in a more rigorous, exhaustive and well thought-out manner than ever before. And because unrecognized interdependencies may actually increase the impact of unplanned downtime, IT staffs should spend more time understanding and addressing proper back-up and recovery mechanisms when developing implementation plans. Because organizations are often faced with balancing the tactical with the strategic, they find themselves in the middle of a difficult decision – to free up staff for more specialized virtualization training, or to seek support of outside experts.
Offering Sufficient Administration, Management and Physical Datacenter Support for Virtual Machines
Many organizations view server virtualization as a way to lower costs, and believe reducing staff levels and lowering server support ratios will be part of that. However, this is not always the case. Some are finding that with initial installations of virtualization products there is actually an increased need for support because the environment is now more complex.
Virtualization often drives improved total cost of ownership and better use of hardware resources. However, when looking at potential staff savings, hardware support tasks such as installation and configuration represent just a small percentage of support staff time. The rest goes to software installation, configuration management, patching, capacity and performance monitoring, and problem analysis and resolution, so any labor reduction is minimal.
Virtualization can also make these routine software-related tasks more complex, requiring higher investments of time and attention from IT staff. Coupled with greater operating system choice enabled through virtualization – which often drives up the number of instances, middleware and applications in need of support – the result may be a distracted and stretched IT staff that fails to pay due attention to important disaster recovery considerations.
Finally, virtual machines can require a certain amount of datacenter fortification, and the associated costs can quickly add up. Simply reducing the number of servers does not necessarily translate into datacenter cost savings when moving from a more traditional set-up to a virtual environment. For example, virtual servers are being deployed in greater densities, which places greater infrastructure loads on datacenters. The physical servers, now running multiple applications, often draw more power than their non-virtualized counterparts. Many datacenters were simply not designed to support these power requirements – especially considering that the average datacenter is 17 years old, according to IDG Research. Where there is an increased power draw, virtualized servers also require advanced (and often expensive) cooling mechanisms.
In summary, organizations that fail to devote proper administrative, managerial and physical datacenter support to virtual machines may not have full confidence in the availability, reliability and security of these systems.
Virtualization promises to extend to disaster recovery the same benefits it brings to the datacenter – increased capacity utilization, speed, agility and flexibility. However, the process for getting there can be tricky. Virtualized DR does not provide true disaster recovery, but can help enable fault tolerance in the case of a disruption – one that doesn’t impact the function of the entire local datacenter.
Fortunately, new market developments are helping organizations to identify opportunities for virtualization as part of larger disaster recovery strategies and link virtualization initiatives to disaster recovery planning – thereby helping to mitigate risk and optimize infrastructure performance. These developments include:
- Back-up and recovery services designed specifically to support virtual environments, combining back-up and recovery for virtual images, standby operating systems for Windows environments and hot-site support. This development has evolved in lockstep with a “back to basics” approach to disaster recovery, meaning that organizations recognize that virtualization technology alone – no matter how advanced – will not be enough to save them in the event of a disaster. Instead, organizations are wrapping more traditional yet essential DR program components, such as people, planning, communication and testing, into their virtualized DR strategies. In doing so, organizations are ensuring that relevant employees thoroughly understand their roles and responsibilities in protecting and preserving corporate data residing on virtual machines, and ultimately re-instituting the business.
- Consulting services which help organizations to strategize and move to virtual environments in the most effective manner, and once there, manage and maintain the environment through a comprehensive understanding of interdependencies.
- Managed services and hosting offerings, which allow organizations to experience the full benefits of applications supported by a virtual environment, while the environment itself is maintained at a remote facility. This approach helps to avoid the heavy up-front costs associated with data center investments as well as ongoing administration and management costs, while leveraging economies of scale.
If implemented and managed properly, virtual machines and virtualized DR infrastructures can dramatically shrink recovery point and recovery time objectives. However, if not properly managed, virtualization can introduce significant availability risks that can hamper business resiliency and overall competitive position. By comprehensively addressing the challenges raised in this article, organizations can position themselves to get the most from their virtualization initiatives and ultimately help protect and strengthen their businesses.
"Appeared in DRJ's Winter 2008 Issue"