Not a week goes by without news of an airline system crisis, a data backup failure or a power outage taking down a big-brand website. In the first two months of 2017 alone, we saw network outages at United and Delta airlines, and an employee at DevOps startup GitLab accidentally deleted an important database. These incidents bring to mind the catastrophic public cloud hack at Code Spaces a few years earlier, which put the company out of business. These unfortunate data disasters are yet more examples of archaic approaches and ill-prepared strategies for IT resilience, disaster recovery and compliance. In all these cases there was no shortage of data protection solutions in place, yet the IT strategy had major flaws. The first was not testing the recovery process on an ongoing basis. Another was relying on data backups that are 6 to 24 hours old instead of continuous replication, which means user data is lost even in a successful recovery. Lastly, leaving all data with a single provider, proverbially putting all your eggs in one basket, only multiplies the risk. And what these companies risk is more than just business information.
When outages occur, companies put themselves in a very damaging position, with losses to revenue, customer loyalty and the corporate brand. According to a 2016 Harris Poll, the most damaging scenario for a company's reputation, after lying, was a security or data breach. Whether they are creating a DR strategy for the first time or updating an existing one, C-level leaders are now realizing the need for IT resilience, meaning the ability to keep the business moving forward through any IT disaster, whether from human error, criminal activity such as ransomware or a true natural calamity. The reality that GitLab, Code Spaces, the major airlines and many other organizations face is that they are overconfident in what they believe to be IT resilience but have underinvested in disaster recovery planning and preparation. Following key lessons from these well-documented IT failures can help business leaders ensure their DR plan is robust and allows true recovery of their data, and their brand, to take place.
1. Complete recovery requires testing, testing, testing
Virtualization and cloud-based advancements have made DR simpler and more affordable. But it doesn't stop there: organizations need to test their disaster recovery plans consistently, or else the entire strategy is useless.
This is why the FBI, in its guidance "Ransomware Prevention and Response for CISOs," urged organizations to "verify the integrity of those backups and test the restoration process to ensure it is working." The strategy must include being able to recover critical data quickly, and as completely as possible, using proper tools and processes. Before performing a live failover on a production environment, IT admins should run a test failover to confirm that user access is set up and configured correctly, and to surface possible issues before bringing down the production environment. It may also be useful to perform a live failover on test servers or environments to get a good handle on the process.
Essentially, the DR site at this point is a separate copy of your live production environment, running in a "sandbox" test network that prevents any communication with the public network or your production environment.
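As a rough illustration, a sandboxed test failover might be orchestrated along the following lines. This is a minimal sketch with entirely hypothetical names (no vendor's actual API); in practice a replication platform would bring up real replica VMs on an isolated network.

```python
# Minimal sketch of a non-disruptive DR test failover.
# All names (run_test_failover, health checks, the "-test" suffix)
# are hypothetical illustrations, not a real product API.

def run_test_failover(replica_vms, health_checks):
    """Bring up replica copies on an isolated test network, run health
    checks against them, and tear everything down. Production VMs are
    never touched."""
    report = {}
    sandbox = [f"{vm}-test" for vm in replica_vms]  # isolated copies only
    try:
        for vm, check in zip(sandbox, health_checks):
            report[vm] = "PASS" if check(vm) else "FAIL"
    finally:
        sandbox.clear()  # always tear down the sandbox, even on error
    return report

# Example: verify user access and app connectivity ahead of a real failover.
checks = [lambda vm: True, lambda vm: vm.endswith("-test")]
result = run_test_failover(["web01", "db01"], checks)
print(result)  # {'web01-test': 'PASS', 'db01-test': 'PASS'}
```

The point of the structure is the `finally` block: a dry run must clean up after itself in every case so that repeated, scheduled testing stays non-disruptive.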
Traditional backup is fine as far as it goes, but enterprises don't want to restore operations to where they were yesterday; that isn't good enough, and it results in significant revenue loss. It is critical to implement, and successfully test, a rigorous business continuity and disaster recovery strategy that does not rely on the tribal knowledge of a few individuals and that can support multiple virtualization, hardware and cloud platforms for flexibility. The C-suite needs to adopt automated failover and recovery technology with minimal data loss for true IT resilience. In the failures described above, non-disruptive DR testing would have allowed a full dry run of DR preparedness.
2. Backup is not disaster recovery
Some companies believe the easiest way to protect data in a virtual environment is to back up the virtual machines using tools like snapshots or agents. However, this can slow down the production environment and is difficult to scale. The most effective approach to a business continuity/disaster recovery solution is continuous, hypervisor-based replication. Enterprises can get long-term data retention and archiving out of their DR solutions, which may render some backup solutions obsolete. Many DR solutions, for example, have backup-like features, including recovering a single file from a point in time seconds, not hours, ago, which is more granular than traditional backup. File-level recovery and point-in-time checkpoints could have helped bring back the GitLab database much sooner. If you can recover data from seconds before an accidental deletion, for up to 30 days, why would you fall back to a 12-hour-old backup, or in worse cases an even older one?
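The difference in granularity can be seen in a toy model of a replication journal versus a periodic backup. This is a simplified sketch for illustration only, not any vendor's implementation: every write is journaled with a timestamp, so state can be rebuilt at any point in time, including seconds before an accidental deletion.

```python
# Toy model contrasting continuous, journal-based replication with a
# periodic backup. All names and structure are illustrative only.

class ReplicationJournal:
    """Records every write with a timestamp so state can be rebuilt
    at an arbitrary point in time."""

    def __init__(self):
        self.entries = []  # (timestamp, key, value) ; value=None is a delete

    def record(self, ts, key, value):
        self.entries.append((ts, key, value))

    def recover(self, at_time):
        """Replay the journal up to at_time to rebuild state."""
        state = {}
        for ts, key, value in self.entries:
            if ts > at_time:
                break
            if value is None:
                state.pop(key, None)  # apply the delete
            else:
                state[key] = value
        return state

journal = ReplicationJournal()
journal.record(100, "users.db", "v1")      # early-morning write
journal.record(43200, "users.db", "v2")    # midday update
journal.record(43205, "users.db", None)    # accidental deletion

# Recover to seconds before the deletion: the midday data survives.
print(journal.recover(43204))   # {'users.db': 'v2'}
# A 12-hour-old backup would only hold the stale early-morning copy.
print(journal.recover(100))     # {'users.db': 'v1'}
```

The same replay mechanism is what makes file-level recovery possible: restoring one key from a checkpoint seconds back, rather than rolling the whole system to last night's backup.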
3. Hybrid cloud is a safety net
CIOs should consider a hybrid-cloud strategy that gives the business another firebreak: a secondary, break-glass-in-case-of-emergency location for its data. Instead of storing all their data on premises or with only one cloud provider, more companies are realizing that adopting a hybrid or multi-cloud approach for something like disaster recovery, with the right partners in place, can actually be simple and affordable while also serving as a great entry point to the cloud. The perceived complexity and expense of transitioning to the cloud, which previously held many IT organizations back, is now going away.
IT teams working in the cloud find themselves anticipating issues and moving their data and applications before the damage hits. This sort of proactive movement of data is impossible with a traditional datacenter, of course, but for organizations embracing a virtual, cloud-ready IT environment, it is a reality. When a hack or outage strikes without warning, those organizations can still react within minutes. Freed of the infrastructure dependencies that prevent easy movement, critical applications can securely live in, and move between, multiple on-premises and cloud environments.
For the companies hit by these disasters, what was lost ranged from obscure metadata to mission-critical databases. The relative importance of each loss may be disputed, but what cannot be argued is the need to take stock of the IT strategy as a means to support revenue goals, deliver great service and protect the corporate image. Each time a data center or IT disaster takes over the headlines, CIOs and IT professionals everywhere wince. The IT industry cannot continue with manual systems and legacy backup approaches; hoping for the best is not a strategy. The key to ensuring uninterrupted operations is improving the flexibility and accessibility of the data and applications that run the business. Putting more focus on business continuity and disaster recovery capabilities that use, and rigorously test, cloud-based infrastructures can make companies in any sector safer, more profitable and more reliable.
Gil Levonai is the chief marketing officer for Zerto.