Virtualization platforms such as those from VMware, Microsoft and Citrix can improve high availability (HA) for most enterprises. By extending this concept to non-company-owned server systems, cloud solutions can add even more options for server uptime in the event of either a single-server or multi-system disaster. However, HA solutions that focus on immediate recovery of the system state of a failed virtual machine (VM) can lead to problems when the VM definition is intact but its data has been lost, whatever the reason.
Focus on Server Availability
Problems associated with virtual HA solutions can be traced to the reduction of complex physical datacenters into a series of virtual machines running on a far smaller and more manageable number of physical hosts—an ironic and unintended result of the virtual HA system. This leads to a server-centric focus for HA, where the idea of the VM being able to move from host to host becomes the primary objective of the HA solution. While protection of the system state and vital system information are critical to successful HA, the shift of focus from a combined set of physical server and disk hardware to a series of virtual machines unfortunately removes the focus on protection of both system and data devices.
Too many organizations, relying on complex VM failover systems that can flip an instance of a virtualized workload from one physical server to another in seconds, do not fully realize the impact when both physical hosts use the same storage area network (SAN) or another shared data-storage platform. Frequently, the two virtual system hosts will access the same single copy of the data stored on the disk system. Redundancy and resiliency in modern SAN solutions do reduce the overall risk of the disk platform failing, but these technologies don’t eliminate it entirely. In addition, site-wide disaster scenarios such as fire, flood or long-term power failure can render the entire redundant SAN inoperable.
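The shared-storage risk described here can be surfaced with a simple inventory audit. The following is a minimal sketch, not tied to any vendor API: the `VirtualMachine` record, its fields, the `shared_storage_risks` function and the sample inventory are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VirtualMachine:
    # Hypothetical inventory record; not a vendor data model.
    name: str
    failover_hosts: Tuple[str, ...]  # physical hosts that can run the VM
    data_copies: Tuple[str, ...]     # storage systems holding a copy of its data


def shared_storage_risks(vms: List[VirtualMachine]) -> List[str]:
    """Names of VMs that can move between hosts but have only one data copy."""
    return [
        vm.name
        for vm in vms
        if len(set(vm.failover_hosts)) > 1 and len(set(vm.data_copies)) < 2
    ]


# Illustrative inventory: both VMs can fail over between two hosts,
# but only "erp" has its data on a second storage system.
inventory = [
    VirtualMachine("mail", ("host-a", "host-b"), ("san-1",)),
    VirtualMachine("erp", ("host-a", "host-b"), ("san-1", "san-2")),
]
at_risk = shared_storage_risks(inventory)
print(at_risk)  # ['mail']
```

A VM flagged by this check has server redundancy but no data redundancy, which is exactly the gap that a host-centric HA plan tends to hide.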
Many organizations also focus a great deal of effort on disk-based replication within a single site or between sites as the sole method of data protection. This works well when network infrastructure is sufficient to properly protect the data using these tools and when compatible disk platforms exist at both sites for the purposes of multi-site recovery. The problem comes when these disk-replication systems either do not communicate with the virtualization platform or are incompatible with the virtual platform operations that provide server HA. This leads to data protection systems that fall out of sync with the virtual machines or, worse yet, data systems that cannot be re-attached to the virtual machines at all.
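One way to picture the out-of-sync problem is to compare the point in time captured by the VM copy with the point in time captured by the data replica. The sketch below assumes both timestamps are available from whatever tools are in use; the function name `replica_in_sync`, the five-minute tolerance and the sample timestamps are illustrative assumptions, not values from any product.

```python
from datetime import datetime, timedelta


def replica_in_sync(
    vm_state_time: datetime,
    data_replica_time: datetime,
    tolerance: timedelta = timedelta(minutes=5),  # assumed tolerance, tune per service
) -> bool:
    """True when the data replica is within `tolerance` of the VM state copy."""
    return abs(vm_state_time - data_replica_time) <= tolerance


# Made-up timestamps for illustration only.
vm_copy = datetime(2024, 1, 1, 12, 0)
fresh_replica = datetime(2024, 1, 1, 11, 58)  # two minutes behind the VM copy
stale_replica = datetime(2024, 1, 1, 9, 0)    # three hours behind the VM copy

print(replica_in_sync(vm_copy, fresh_replica))  # True
print(replica_in_sync(vm_copy, stale_replica))  # False
```

A failover that brings up the VM against the stale replica would succeed at the server level while presenting applications with data that is hours old.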
A secondary issue arises when hardware changes or budget constraints do not allow for either the network infrastructure or multiple compatible disk platforms to be put into place. While the virtual systems may be made redundant by backup/restore procedures or by other options, the absence of technology required for data redundancy renders VMs basically incapable of fulfilling their objective.
In both cases, enterprises have found themselves with perfectly functional VM systems, but without data for either applications or end-users to utilize after failover. The end result is that their HA solutions and plans fail, but through no fault of the VM platform or its HA solution set. Often, there are properly configured and maintained SAN replication systems in place, but they cannot and should not support the entire VM infrastructure for the organization. Failure to plan data protection for the VMs that sit outside SAN replication, in addition to the SAN-attached VMs, can lead to faults that are just as disastrous as having improper SAN protection.
The cost of putting systems that have no need for SAN-class disk onto a SAN can be staggering, and more than one large organization has begun looking into alternative replication technologies in order to stem the tide of an ever-increasing SAN disk cost structure. Lower-utilization applications and systems can easily be run on less expensive disk, opening up the budget for more important tasks and minimizing the amount of SAN that needs to be maintained or restructured.
The Solution: Visualize Services, Not Servers
VM platforms have made tremendous strides in providing multi-path HA for the virtual servers and the system-state information they contain, but there are few, if any, native solutions to protect the same data relied upon by these systems. A shift in focus from server availability to service availability can both aid in reducing downtime and increase the usability of the virtual platforms themselves. Shifting focus to services allows information technology (IT) staff to visualize the protected platforms as a combination of moving parts instead of fixating on either the tools or data of an operating system (OS). It also allows the business to see the solutions it needs for day-to-day activities not as a single machine, but a combination of front-end and back-end platforms linked to disk systems and other forms of data storage.
This shift in mindset enables the IT staff to seek out ways to protect the critical data that each business requires. They can research and implement both native SAN replication technologies and third-party platforms to provide for data protection in much the same way as is done with VM protection. In some cases, the same set of tools can be used to protect both the VM and the data, but this capability is generally not native to the virtualization platform; for example, Site Recovery Manager (SRM) from VMware requires third-party SAN replication to be brought online in order for the solution to function properly.
With today’s focus on both system protection and data protection, a single-server disaster can be averted via multiple HA pathways, such as the native VM tools for host failure and a combination of native and external tools for disk failure. Multi-system and site-wide failures can be mitigated by failing over both disk and VM systems to another site or to a co-location facility and/or cloud provider. The key concept is that both system and data must be protected—in some cases independently—in order for multi-site failover to be successful.
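The key concept can be expressed as a readiness check at the service level rather than the server level. This is a minimal sketch under stated assumptions: the `Service` model, the `missing_at_site` and `can_fail_over` functions, and the sample service are hypothetical, and in practice the sets of recoverable VMs and replicated volumes would come from the virtualization and storage tools in use.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Service:
    # Hypothetical model: a business service is its VMs plus its data volumes.
    name: str
    vms: List[str]
    volumes: List[str]


def missing_at_site(service: Service, site_vms: Set[str], site_volumes: Set[str]) -> List[str]:
    """List every VM or data volume the recovery site cannot provide."""
    missing = [f"vm:{vm}" for vm in service.vms if vm not in site_vms]
    missing += [f"data:{vol}" for vol in service.volumes if vol not in site_volumes]
    return missing


def can_fail_over(service: Service, site_vms: Set[str], site_volumes: Set[str]) -> bool:
    """A service can fail over only if both its systems and its data are present."""
    return not missing_at_site(service, site_vms, site_volumes)


# Illustrative service: both VMs are recoverable at the second site,
# but the data volume was never replicated there.
orders = Service("order-entry", vms=["web", "db"], volumes=["orders-db"])
print(missing_at_site(orders, {"web", "db"}, set()))  # ['data:orders-db']
```

Checking VMs and volumes separately mirrors the point that system and data may need to be protected independently, and the list of missing items shows which half of the plan is incomplete.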
Business Case for Service-level Visualization
The shift in logic from server protection to service protection can also have benefits in the non-IT aspects of the business. As end-users and management begin to recognize the systems they use as services composed of applications and data resources, they can more easily visualize how virtualization can be implemented for these services. Business units that formerly insisted on using only physical hardware can transition to the virtual solution once they see their resources as collections of applications and disk space instead of just servers. The more resources virtualized, the easier HA becomes.
Another benefit is the reduction of physical hardware that has to be maintained in production and disaster recovery locations. This becomes especially useful when considering single-use servers, because the average enterprise has many physical servers that each run nothing but a single application. Usually this occurs when either security boundaries or application conflicts require that the system be segregated to its own instance of the operating system. Once the business units understand that these boundaries can still be observed within virtualization technologies, they can see how a virtual platform is valid for their needs. Now, instead of a series of physical servers, each hosting one application or system, you can have a smaller set of physical servers, each running multiple instances of the operating system for these segregated application platforms.
A Complete Disaster Recovery Plan
Protection of the systems that exist in the virtual world is critical and, for the most part, can be handled by native tools within the platform chosen for virtualization. The problem is that these native tools protect only the virtual server information and not the data that the applications running on those servers require for their operations. Combinations of native and third-party tools can protect both system information and data across multiple failover pathways, locally and remotely.
By visualizing systems as services instead of discrete servers, companies can build a much more complete disaster recovery plan that offers more than just benefits to the IT staff. Business units that see their systems as services (collections of various types of software and disk) are more likely to allow virtualization of their solution sets. This minimizes the physical footprint of the datacenter, allows for economies of scale for protecting systems and data, and provides for an easier pathway to complete disaster recovery planning.
About the Author:
Mike DeNapoli is a solution architect for Vision Solutions, Inc. Vision Solutions is the world’s leading provider of information availability software and services for Windows, Linux, IBM Power Systems and Cloud Computing markets. For more information, please call 718 726-3322 or email firstname.lastname@example.org.