One of the hottest topics in today’s IT corridors is the uses and benefits of virtualization technologies. IT professionals everywhere are implementing virtualization for a variety of business needs, driven by opportunities to improve server usage and flexibility, address business continuity, and reduce operational costs.
Server rooms across the world are full of machines designed to deliver just one application or service to the business. The ability to consolidate multiple physical servers onto a single server running multiple virtual machines is an attractive way of eliminating physical server sprawl and making more efficient use of a much smaller number of physical servers.
Whether it’s a set of printers to support a new business department or a test system for change control, every new requirement has historically spawned a new physical server. These servers typically ran at very low load levels and may have been used only sporadically. Virtualizing such servers delivers immediate benefits by decreasing pressures on energy consumption, server room space, and other environmental and management issues. This is a no-brainer.
In addition to these “traditional” drivers for virtualization, business continuity is bubbling up as another driver. Historically, business continuity has been seen as a big-ticket (and lengthy) project, encompassing the entire organization. Virtualization presents an opportunity to break that belief and deliver real and immediate value as part of server consolidation projects.
Putting Disasters in Context: Where is the Real Risk?
Let’s face it: although the prospect of a real disaster clearly exists, a facilities or IT failure is far more likely to be the source of a business disaster than a flood, hurricane, earthquake, or terrorist attack.
Take e-mail: one day of e-mail downtime caused by a server shutdown after the failure of a cooling unit in a datacenter can spell disaster for a company in terms of revenue, productivity and, of course, reputation. Such failures are all too common, yet their impact is often overlooked when decisions are made about business continuity initiatives. The right consideration has to balance high availability of applications with disaster recovery.
Historically, granular solutions to IT continuity have foundered because of the sheer cost of addressing individual components. Virtualization changes this, but it’s still important to consider risks, and that means not just the risk of downtime but also the risk of choosing the wrong technology to protect applications.
Clearly not all applications are equal in importance. Some may tolerate extended periods of downtime; others may require 24x7 availability. The big advantage of virtualization is that it allows solutions to be put in place that address risks at the right cost and with the right operational resource requirements, provided those requirements are factored in from the start.
Looking at virtualization as a panacea for business continuity would be a mistake; using it to address specific needs is much more likely to be successful.
Blending Physical and Virtual Deployments: Eliminating the Barrier to Entry
Most virtualization business continuity stories start by discussing virtual machine failover between multiple virtual hosts. An aspect that’s not often discussed is the new or refreshed infrastructure required to achieve this. Host-to-host failover relies on shared storage, which can be vulnerable to failures as well. The model also assumes that the protected applications can tolerate running in a virtual world. Many administrators still have doubts about memory, CPU, and I/O requirements in a virtual environment. Following the old adage of “if it isn’t broken, don’t fix it,” it’s always worth weighing the risk versus reward of virtualizing applications such as Exchange, BlackBerry servers, and heavily loaded database servers.
Although infrastructure costs and application risks are important, that doesn’t mean that a virtual host cannot immediately provide business continuity for an application like Exchange – quite the opposite in fact.
A single virtual host running less critical systems can be used to provide an Exchange failover server without risking the existing e-mail service in any way. In the event of a crisis with Exchange, it’s just a question of making the failover server available so e-mail users can connect and carry on working. Finding a combination of replication, monitoring, and seamless failover software that can manage the process is all that is required. This architecture can work locally for high availability and remotely for disaster recovery. It can even be extended so the virtual host becomes an availability hub supporting multiple mission-critical applications, perhaps on a dedicated virtual host.
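The monitoring half of that combination can be made concrete with a minimal sketch. This is not any vendor’s actual mechanism; the hostname, port, interval, and failover action below are all hypothetical placeholders for illustration — a real product would also replicate data and repoint clients automatically.

```python
import socket
import time

# Hypothetical primary mail server and probe settings (illustrative only)
PRIMARY = ("mail-primary.example.com", 25)   # SMTP endpoint of the live server
CHECK_INTERVAL = 10                          # seconds between probes
FAILURE_THRESHOLD = 3                        # consecutive failures before failover

def port_reachable(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def trigger_failover():
    """Placeholder: a real deployment would repoint DNS or a virtual IP
    at the standby VM and start its mail services."""
    print("Failing over to standby virtual machine")

def monitor():
    """Probe the primary repeatedly; fail over after sustained failure."""
    failures = 0
    while True:
        if port_reachable(*PRIMARY):
            failures = 0
        else:
            failures += 1
            if failures >= FAILURE_THRESHOLD:
                trigger_failover()
                return
        time.sleep(CHECK_INTERVAL)
```

The threshold-of-consecutive-failures design matters: failing over on a single missed probe would turn every transient network blip into an unnecessary (and disruptive) switchover.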
De-Risking the Virtual Lifecycle
At some point, organizations will become confident that their virtual infrastructures are ready for even the most demanding applications. The heavyweight servers are in place, the IT admins are fully skilled, and server room constraints mean it’s vital to eliminate the last few standalone servers.
But there are still a few business continuity challenges to face:
- The possibility of downtime during the migration process must be addressed, as 24/7 availability is a must
- Unforeseen issues post migration must be planned for; a rollback plan is essential
- Application availability must be top-of-mind, because applications now share resources
Fortunately, by following this approach, at least one of the challenges fades away. Because the application is already running in a VM, it will have been through extensive testing prior to migration. The failover mechanisms in place can be used to switch over with no interruption of service, and if, after all that planning and precaution, there are still issues, the option is always there to fail back to the original system.
One of the clear benefits of virtualization is that it allows companies to increase their hardware utilization, but let’s not ignore some inherent risks in virtual management infrastructures.
The fundamental design approach of running multiple virtual servers within a single physical host system inherently introduces new IT risks. By consolidating multiple servers that perform a variety of business-critical functions onto a single host system, the availability of that system becomes a significant risk point. Admittedly, there have been huge steps to mitigate these risks, with high availability and business continuity products available to protect the physical platforms on which virtualized applications run.
High availability products ensure that all VMs can be failed over if an entire physical server fails. More granular tools can detect an individual VM failure so it can be restarted, and recent moves toward fault tolerance will be able to keep multiple virtual machines in step.
These are good solutions to specific physical and operating system issues, but at the end of the day it should all be about the application.
Business Continuity Planning: It’s All About the User
Avoiding angry calls from users who suffer disruption because of virtualization is always a good strategy. The best way for IT to ensure consistent business performance is through the implementation of a solution that focuses on the business need and end user experience. Meeting that need should be part of the virtual deployment planning and drive the selection of virtual infrastructure and extended management tools. This means looking at all aspects of the virtual deployment and the source of outage threats.
The reasons for outages vary, from data loss, server failure, application failure, or network failure to planned downtime, application performance degradation and corruption, or a complete site outage (disaster). It’s a fact of life that IT outages will happen; therefore, a critical goal should be that when an outage occurs, it does not result in business disruption and downtime. End users should be able to continue operating as if nothing has happened, thus delivering on the promise of consistent business performance.
During virtualization projects, a critical look should be taken at possible failure points and the ability of the management tools to detect such failures.
Virtual Infrastructures: Machine or Application Availability?
Understanding failure points makes it much more straightforward to check whether the implementation architectures and tools meet the demand of true business continuity. Even today, simple tools delivered to complement the hypervisor may take a pretty limited view of the world. For example, tools may categorize a VM as available even if its operating system has blue-screened: the VM responds to a ping and consumes resources, yet clearly the application isn’t working.
Resource schedulers may see excessive demand on physical resources and move a VM to another physical resource, yet fail to spot that the problem is a runaway thread in an application. The problem has simply been moved, not addressed. No matter how you look at it, a sick application that is vMotioned, migrated, or kept in step between VMs is still sick wherever it is.
So, when looking at how extended infrastructure tools work, keep an eye on what these really mean for business continuity, application availability, and the user experience.
Consolidation: A Step Too Far?
Having suffered from server sprawl for years, organizations can use virtualization to bring all departmental servers under control in a central location. Life is much easier to manage, and end users can forget about any server management responsibilities. But now, instead of the resilience that came precisely from that physical sprawl, many eggs are in one basket.
Virtualization means applications that were stored on many different direct-attached storage devices all now rely on a single SAN. Business continuity best practice demands no single point of failure, yet the virtualization high availability story relies on exactly that. There is no choice but to implement a disaster recovery site. With 24x7 availability still on the agenda, another consideration must be whether SAN, or indeed site, recovery can deliver immediate failover.
Bridging the Business Continuity Gap
Without a doubt, virtualization brings many advantages to the organization. There are tremendous savings to be had in many different areas. It is also without question that delivering high availability or disaster recovery to meet the demands of the modern 24/7 operation means understanding complete risk points and architecting around them. This means combining virtual management tools with experience that comes from disciplines that are as old as the idea of virtualization itself.
Andrew Barnes is senior vice president of corporate development for Neverfail (www.neverfailgroup.com), which provides business continuity and disaster recovery solutions for the mid-market. He joined Neverfail in 2007, bringing extensive experience through his 25 years in the software industry.
"Appeared in DRJ's Spring 2009 Issue"