Most enterprises, and even many SMBs, have overcome the initial challenges of virtual technology adoption: determining how to plan and deploy virtual servers to consolidate physical infrastructure and reduce costs. These organizations are now faced with what have been called “Step 2” problems – the challenges that appear when attempting to operate virtual environments efficiently. These problems are not so simple to solve, requiring newly defined best practices and altered operating procedures to accommodate real differences in the virtual environment.
This article will focus on the data protection “Step 2” challenge. Most organizations protect data in virtual machines (VMs) as if they were physical systems: they deploy back-up agents into the VMs and use them to back up and recover the many individual files from each VM, while often doing nothing to protect the VM image itself. This approach is less than ideal. Not only is the VM image completely unprotected, and therefore unrecoverable as a unit, but the spare capacity that physical systems once reserved for running back-up jobs no longer exists in a consolidated virtual deployment. Back-up jobs routinely meet and exceed the virtual server’s load capacity, slowing the backup and sometimes putting the back-up process itself at risk.
This article will show IT teams how to re-think their data protection and disaster recovery (DR) best practices for virtual servers and start adopting VM image-based methods. By proactively understanding these “Step 2” challenges and implementing the best practices discussed, IT organizations can reap the full benefits of their expanding virtualization environments.
In talking with enterprise and SMB administration teams over the last six months, it’s clear that small and large virtual deployments alike have moved past virtualization’s “Step 1” problems: adopting virtual technology for physical server consolidation and cost reduction.
At Step 1, administrators are immersed in determining how to plan and deploy virtual servers to consolidate physical infrastructure and reduce costs. This stage was fairly straightforward, and most organizations realized the consolidation and footprint benefits they expected.
Now, however, the virtual administrator is deep into a second stage of virtualization maturity, a “Step 2” if you will. This phase finds enterprises and SMBs faced with the challenge of protecting the data in their virtual environments and developing strong, sustainable strategies for disaster recovery (DR). The problems that arise from data protection and recovery are not so simple to solve.
The primary Step 2 problem is the one that affects the availability of any organization’s business-critical systems: data protection. It’s no surprise that most organizations still protect data in VMs as if they were physical systems. They deploy back-up agents into the VMs and use them to back up and recover the files from the VM, while typically doing nothing to protect the VM image itself. This approach is less than ideal for two reasons. First, history has proven that performing complete system recovery for DR scenarios using file agents is difficult and often quite time consuming. Second, the shared resource model of virtualization makes scheduling back-up jobs tedious, and in some cases simply impossible. Back-up administrators now need to track which host servers VMs reside on – assignments that can change in real time thanks to live-migration technologies – and ensure that only a limited number of back-up jobs run at any one time on a given host server. If they do not, they risk degrading performance for, in some cases, 30 or more VMs.
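The per-host concurrency constraint described above can be sketched as a simple scheduler. The following Python sketch is purely illustrative (the function name and data structures are hypothetical, not taken from any back-up product): it groups VM back-up jobs into sequential “waves” so that no host ever runs more than a fixed number of jobs at once.

```python
from collections import defaultdict

def schedule_backup_waves(vm_to_host, max_jobs_per_host=2):
    """Group VM back-up jobs into sequential waves so that no more than
    max_jobs_per_host jobs run concurrently on any single host server.

    vm_to_host maps each VM name to the host it currently resides on
    (in practice this mapping must be refreshed, since live migration
    can move VMs between hosts at any time).
    """
    per_host = defaultdict(list)
    for vm, host in vm_to_host.items():
        per_host[host].append(vm)

    waves = []
    while any(per_host.values()):
        wave = []
        for host, vms in per_host.items():
            # Take at most max_jobs_per_host jobs from this host per wave.
            batch, per_host[host] = vms[:max_jobs_per_host], vms[max_jobs_per_host:]
            wave.extend(batch)
        waves.append(wave)
    return waves
```

Each wave can then be dispatched in parallel, with the next wave starting only after the previous one completes, keeping every host under its concurrency cap.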
Think about how the back-up load impacts virtual machine performance. When back-up agents are individually deployed in every guest OS, then for the entire time that a traditional back-up job is running, the underlying system resources of the virtual server are being tied up. This not only slows down the VM being protected, but slows down every other VM on the virtual server for the entire length of the job. Forget about simultaneous backup jobs running together on different VMs on the same virtual server.
Then there is the question of how back-up data is moved from the virtual server to storage. Typically, in a physical environment, a separate back-up server is attached to each client to move that data. This can still work for virtual servers – but it means that all back-up data must be sent over the business LAN, which disrupts the use of that network. To combat this, some organizations use a separate back-up network, which increases the cost and complexity of the infrastructure. In the end, back-up data sent over business networks can cause slow response times for business users. Another option has been to use a consolidated back-up tool, but this requires a SAN and has had some reliability concerns.
Image-based backup is really the only method that makes sense for virtual servers, when it comes to data protection. It reduces the impact of capturing data by making it faster with less business interruption during the process. It fills a critical gap of protecting the full VM image as well as all of the individual files in the image. It speeds recovery and makes recovery more reliable, for individual files as well as for the entire image. It also assists in speeding data transmission; it’s far faster to transmit the whole image than the many individual files that comprise the image.
Backup 2.0: Image-Based Backup and Disaster Recovery Solutions Take Hold
Datacenters have entered the age of virtualization. Server virtualization is mainstream and accelerating. The resulting change in data management is profound and demands new methods for data protection to replace existing processes that cannot keep up with the size and requirements of the new environment.
There are several problems in applying traditional data protection methods to virtual environments.
- Back-up agents are expensive and place a heavy load on virtual server systems, slowing every VM on the system for the duration of the backup.
- Multiple solutions are required to back up each layer of the traditional system architecture.
- Disaster recovery systems and solutions have been too expensive for practical deployment.
- Methods for complete protection of physical and virtual assets are inflexible.
Back-up 2.0 solutions provide next-generation data protection technology, designed specifically to manage the unique challenges presented by virtual server environments. By leveraging virtualization technology and image-based data management, organizations can implement backup and disaster recovery that blends simplicity with ease of use.
Image-Based Backup Proves to be a Critical Success Factor in DR Strategy
Here are some typical IT scenarios in which an image-based backup and recovery approach can make all of the difference between success and failure:
- Rapid, efficient backup and recovery of large VM images, such as those exceeding 10 GB in size.
- Local high availability of VMs by preserving them and restarting them on alternate virtual servers in the environment.
- Disaster recovery through the transmission of VM images to an offsite recovery location. In this case, recovering both the data and applications occur in the single step of launching the replica VMs on new servers.
- Rapid restore of entire servers, especially those with business-critical applications such as CRM, payroll, and accounting systems – in particular when the server crashes just before a critical business reporting event such as payroll transmission. The ease of restore of these systems without requiring separate steps for OS and application recovery makes image-based approaches invaluable in these cases.
- Providing a safety net for rollback of database systems to a point in time just prior to drive re-sizing and other optimization tasks (just in case).
- Insulating end-users and customers from disk flakiness, data corruption, and the troubleshooting steps required to repair, replace, and bring storage environments back to operational readiness.
- Enabling rapid recovery from human errors, such as mistakenly deleting LUNs that contain critical data during storage provisioning and reallocation. This situation occurs frequently when virtual servers are under-provisioned and junior operators attempt to “find” space for backups; it can also happen during infrastructure upgrades and cut-overs in which steps get out of sequence.
Image-Based Backup is More Reliable for Data Handling than Traditional Methods
Image-based backup, when designed well, is a more reliable method for handling data than traditional methods. What IT teams worry about is the integrity of the back-up data copies held in the archive. Creating those copies depends on a continuous, uninterrupted write of the backup data copy. Another concern is the integrity of the application data captured, which must be consistent at a given point in time to be recoverable and usable from that point.
Both problems are alleviated with a well-designed image-based data protection solution. In the first place, a well-designed image-based solution is extremely fast. Overhead is minimized by bypassing the file system to read the disk directly, and empty blocks in the image are skipped. This also creates a smaller backup copy – in a single file – than would be created by the traditional method of backing up the thousands or millions of individual files that make up the image. Faster means there is less time for something to go wrong. Smaller means the backup copy can be transmitted more quickly and stored using fewer blocks, again reducing the exposure to failure.
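The block-skipping idea can be made concrete with a minimal Python sketch. This is illustrative only, assuming a raw disk-image file on disk; real products operate at the hypervisor or storage layer, and the function name and block size here are hypothetical.

```python
BLOCK_SIZE = 64 * 1024  # illustrative block size, not from any product

def backup_image(src_path, dst_path):
    """Copy a raw disk image block by block, writing only blocks that
    contain non-zero data. The destination is written sparsely via
    seek(), so skipped (all-zero) regions consume no space on file
    systems that support sparse files.

    Returns (blocks_read, blocks_written).
    """
    blocks_read = blocks_written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(BLOCK_SIZE)
            if not block:
                break
            blocks_read += 1
            # any() over bytes is True if at least one byte is non-zero.
            if any(block):
                dst.seek((blocks_read - 1) * BLOCK_SIZE)
                dst.write(block)
                blocks_written += 1
    return blocks_read, blocks_written
```

Because only occupied blocks are read into the copy, both the elapsed backup time and the size of the resulting single-file archive shrink in proportion to the empty space in the image.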
In terms of application data consistency, capturing an image which includes all of the application’s data from a single moment in time is the definition of an image-based backup. Therefore, the odds of grabbing data at an inconsistent point are much reduced. Even in the unlikely event that the data is not consistent, it is still possible to recover and restart the application by rolling back to the nearest previous consistency point, which is captured in the backup image.
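Choosing that rollback target amounts to selecting the most recent image captured at or before a known-good time. A hypothetical Python sketch (names are illustrative, not from any product’s API):

```python
from datetime import datetime

def nearest_restore_point(snapshots, target):
    """Return the name of the most recent snapshot taken at or before
    `target`, or None if no snapshot precedes it.

    `snapshots` maps snapshot names to the datetime at which each
    backup image was captured.
    """
    candidates = [(taken, name) for name, taken in snapshots.items() if taken <= target]
    return max(candidates)[1] if candidates else None
```

An administrator recovering from an inconsistent capture would pass the last time the application was known to be consistent and restore the image this function selects.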
According to Enterprise Strategy Group’s 2010 IT Spending Survey, in which more than 500 respondents reported on their budgets for investing in virtual technology, image-based data protection is the technology most likely to help improve backup and recovery in virtual environments.
“ESG research found that implementing data protection processes for server virtualization environments is a big pain point for many end-user respondents. These organizations have committed to fixing the issue this year with increased investments in solutions that improve backup and recovery of virtual machines, improving disaster recovery processes, and improving application backup and recovery,” reports Lauren Whitehouse, senior analyst at ESG. “For x86 server virtualization environments, rapid image-level backup with flexible image- or item-level restore can address all of these challenges. It can provide non-disruptive and optimized backup, enable efficient disaster recovery strategies and facilitate improvements in application-specific backup and recovery.”
The Bottom Line
The sooner that IT teams re-think their data protection strategy for virtual servers – and start to adopt VM image-based methods – the sooner they will overcome the critical Step 2 challenges of protecting the data in their virtual environments and developing strong, sustainable strategies for disaster recovery. Overcoming these challenges will enable IT organizations to fully realize the benefits of their expanding virtualization environments.
Jason Mattox is the CTO of the Server Virtualization Group for Quest Software. He has 12 years of experience in IT consulting, focused on consolidation and virtualization from the desktop to the enterprise level. A hands-on technologist, his background led him to design many of the features incorporated into vRanger Pro as well as key features of other virtualization enhancing products offered by Quest Software.