Disaster recovery (DR) has been a challenge for IT organizations for decades, with much of that challenge stemming from the cost and complexity of legacy solutions. As IT operations have evolved (and will continue to evolve), these legacy solutions often cause more problems than they solve. But there’s good news: the maturity of virtualization and the emergence of the hyper-scale public cloud bring new ways to look at DR that did not exist before.
Before we dive into the key considerations for modern, effective disaster recovery, let’s define the type of DR we are focusing on. After all, DR is talked about in several different ways. On one hand, organizations employing backup technologies, such as a backup application with tape or disk media, will refer to the act of simply moving their backups offsite as a form of disaster recovery. While doing this does mean the data is available in the event of a disastrous event, the only way to “recover” an application is to restore it to local or remote hardware, which can take days or weeks if that hardware is not readily available.
But the fact is that for many applications, the desired recovery time is measured in minutes or hours, not days or weeks. That type of disaster recovery has historically required a full-blown secondary data center with standby redundant infrastructure, which is extremely costly and complex. As a result, many companies stop at backup with off-site data and call it DR.
Moving beyond legacy
Now, for the first time, the megatrends of virtualization and hyper-scale public cloud can help solve these long-term DR challenges – provided IT pros within these organizations do their homework. Here are four key tips to achieving true DR vs. backup:
Tip #1: Don’t assume all disasters begin with a capital “D”
When many of us think of disaster recovery planning, we tend to focus a great deal on the idea of losing an entire data center to a natural disaster or other catastrophic event. But in fact, most outages are partial outages, or little “d”s. They are due to a storage array or disk failure, a failed application upgrade, a facility issue (e.g., power loss), leaking pipes, or some other contained outage. And while we need to plan for the worst-case scenario, we can’t do so at the expense of the far more common one.
As a result, it is critical to have a DR plan that allows your primary data center to work in tandem with a secondary site that is running only some of your applications. This means that networking and core services such as DNS and Active Directory have to be included in the plan, so that you can seamlessly fail over a portion of the environment and have it plug into your existing infrastructure. The plan must avoid problems such as IP address conflicts and split-brain syndrome.
In this case, you need to ensure that the technology you deploy, or the partner you choose, has the technical ability to accommodate this more common DR scenario and not just the complete loss of a data center.
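One concrete way to catch the IP-conflict problem before a partial failover is to check that the address ranges you plan to bring up at the secondary site do not overlap with what is still live at the primary site. The sketch below uses Python's standard `ipaddress` module; the subnet names and ranges are purely illustrative, not drawn from any real plan.

```python
import ipaddress

# Hypothetical subnet assignments; names and ranges are illustrative only.
primary_subnets = {
    "app-tier": ipaddress.ip_network("10.0.10.0/24"),
    "db-tier": ipaddress.ip_network("10.0.20.0/24"),
}
secondary_subnets = {
    "app-tier": ipaddress.ip_network("10.0.10.0/24"),  # conflict: same range as primary
    "db-tier": ipaddress.ip_network("10.1.20.0/24"),
}

def find_conflicts(primary, secondary):
    """Return (primary_name, secondary_name) pairs whose ranges overlap."""
    return [
        (p_name, s_name)
        for p_name, p_net in primary.items()
        for s_name, s_net in secondary.items()
        if p_net.overlaps(s_net)
    ]

conflicts = find_conflicts(primary_subnets, secondary_subnets)
# Any entries here mean a partial failover would collide with the still-running
# primary environment and must be re-addressed (or NAT'd) in the plan.
```

A check like this is cheap to run as part of routine DR plan validation, long before a real little-“d” event forces the issue.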
Tip #2: Set your goals and objectives
It is very likely that you have applications and data with varying levels of priority within your organization. Before assuming technical and cost limitations of any solution you plan to deploy, determine the business requirements. For each key application, how soon do you need it back up and running? Instantly? Within hours? Or are days acceptable? How much data can you afford to lose? None, less than a few hours, or a full day? By doing this exercise first, you will be sure to explore all possible solutions before making unnecessary compromises. You can always make those painful budget concessions later.
Also, by clearly setting goals, you will force your business partners within the organization to understand the cost and complexity of their requirements. This creates a more realistic view of what having an application back instantly really means.
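The questions above are really about recovery time objectives (RTO) and recovery point objectives (RPO). A simple way to make the exercise concrete is to record each application's objectives and map them to a coarse DR tier. The sketch below is a minimal illustration; the application names, hour values, and tier thresholds are all assumptions for the example, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class AppObjectives:
    name: str
    rto_hours: float  # how soon it must be running again
    rpo_hours: float  # how much data loss is tolerable

# Illustrative inventory; real values come from the business exercise above.
apps = [
    AppObjectives("order-processing", rto_hours=1, rpo_hours=0.25),
    AppObjectives("email-archive", rto_hours=72, rpo_hours=24),
    AppObjectives("crm", rto_hours=8, rpo_hours=4),
]

def tier(app):
    """Map objectives to a coarse DR tier (thresholds are assumptions)."""
    if app.rto_hours <= 4:
        return "tier-1: replicated, near-instant failover"
    if app.rto_hours <= 24:
        return "tier-2: warm standby"
    return "tier-3: restore from backup"

plan = {a.name: tier(a) for a in apps}
```

Even a rough tiering like this forces the conversation about which applications genuinely justify the cost of near-instant recovery.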
Tip #3: Ensure appropriate geographic separation for your application/data instances
Though not always possible, it is important to look at options that ensure your primary and secondary data center footprints are far enough apart that your business can survive a capital “D.” Should something catastrophic occur, having a secondary data center across the street from your primary data center may not provide adequate protection, especially considering the cost. By having footprints geographically dispersed, you can ensure that your critical applications are recoverable, regardless of the scenario.
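"Far enough apart" can be made measurable: given the coordinates of two candidate sites, the great-circle distance between them tells you whether a single regional event could plausibly take out both. The sketch below uses the standard haversine formula; the example coordinates (a New York–area primary and a Chicago-area secondary) and the 400 km threshold are assumptions for illustration, not an industry standard.

```python
from math import radians, sin, cos, asin, sqrt

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points on Earth, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))  # 6371 km = mean Earth radius

# Hypothetical sites: New York-area primary, Chicago-area secondary.
distance = great_circle_km(40.71, -74.01, 41.88, -87.63)
MIN_SEPARATION_KM = 400  # assumed policy threshold for regional-event survival
safe = distance >= MIN_SEPARATION_KM
```

Your own threshold should reflect the regional risks you actually face (hurricane paths, seismic zones, shared power grids), not a single number.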
Tip #4: Do not ignore the megatrends of virtualization and the hyper-scale public cloud
This may be the most important point of all. Virtualization has transformed our data center management, and the hyper-scale public cloud providers are poised to do the same. Many of the advantages provided by both trends lend themselves to solving the DR problem.
In the case of virtualization, it is no longer required to have identical hardware in two places to achieve true disaster recovery. Because virtualization abstracts us from the underlying hardware, it is possible to intelligently leverage infrastructure that may reduce complexity and cost.
Further, many of the tools built into virtualized environments allow solutions to deploy much more extensive automation than was possible with disparate systems on varied hardware. Through the use of APIs and other automation tools, the right DR management solutions can create a seamless management interface to tie together a variety of different underlying servers, networking, and storage into a software-defined data center that can be replicated into a like or even unlike set of technologies.
And now, for the first time, we can pay for IT infrastructure resources as we use them in the hyper-scale public cloud. This is a perfect fit for disaster recovery, which is most often in the “unused” category.
Contrast pay-as-you-go with the decades-old form of disaster recovery that required a dedicated secondary site filled with redundant storage, servers, networking, and software. These assets most often sat idle until a failure at the primary site, all the while consuming massive amounts of IT budget and time. A drive failure in your DR site was only slightly less important than one in your primary site, since you could never be sure when you would need that secondary array to be at its best.
In some cases, IT departments developed creative ways to try to get some value out of these secondary assets, such as using them to test upgrades. But what if the need to use the site for its actual purpose (DR) arrives right in the middle of rolling bleeding-edge software onto the systems at the DR site?
Now, with the hyper-scale public cloud, well-designed DR management solutions can incur significant cost only when it is actually needed. Because hyper-scale public cloud costs are dominated by running compute and by storage operations, a smart DR solution can take full advantage of the fact that your DR environment should actually be turned off more than 95 percent of the time. As a result, for the first time, if you currently have a DR site, you can cut your DR costs dramatically. Or, if true DR has been out of reach for budget and complexity reasons, there may now be a sensible way to strengthen your business’s ability to withstand service interruptions.
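The economics of "off more than 95 percent of the time" are easy to sketch with back-of-the-envelope arithmetic. All of the dollar figures below are assumptions invented for illustration, not quotes from any provider; the point is the structure of the comparison, not the numbers.

```python
# Illustrative annual cost comparison; all figures are assumptions, not quotes.
HOURS_PER_YEAR = 8760

# Always-on dedicated secondary site: redundant hardware running continuously.
dedicated_hourly_cost = 50.0  # assumed blended infrastructure cost per hour
dedicated_annual = dedicated_hourly_cost * HOURS_PER_YEAR

# Pay-as-you-go cloud DR: compute off ~95% of the time, storage always present.
cloud_compute_hourly = 40.0   # assumed, billed only while running (tests, failover)
cloud_storage_hourly = 2.0    # assumed, replicated data must always be stored
active_fraction = 0.05        # environment powered on less than 5% of the year
cloud_annual = (cloud_compute_hourly * HOURS_PER_YEAR * active_fraction
                + cloud_storage_hourly * HOURS_PER_YEAR)

savings_pct = 100 * (1 - cloud_annual / dedicated_annual)
```

Even with generous assumptions about cloud compute rates, the comparison is dominated by the fact that the dedicated site bills every hour of the year while the cloud environment bills compute only for the small fraction of hours it is actually on.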
Marc Crespi is the CEO and co-founder of OneCloud Software where he is responsible for the company’s overall strategy and execution. He has more than 20 years of experience driving product execution and revenue in high growth organizations.