Exponential data growth is nothing new to today’s large enterprise data centers. For the past several years, data has grown at unprecedented rates, pushing IT managers to find new, more efficient ways to move, manage, and protect their data. However, with the increased adoption of very large databases and the advent of Big Data technologies, this already extraordinary growth rate is pushing backup and disaster recovery systems to a critical point. We are beginning to see the effects of this trend in the dramatic increase in data center “sprawl,” increased difficulty in meeting backup and replication windows, and greater emphasis on driving data center cost savings and efficiency. A recent vendor survey, the Enterprise Data Protection Index 2012, reveals the challenges and priorities for disaster recovery in large enterprises and big data environments.
According to the survey, data growth is unabated. Thirty-three percent of respondents reported that their data was growing at 20-30 percent annually, and an additional 20 percent reported even higher annual growth rates. Respondents also reported a marked acceleration compared to last year: nearly one-quarter reported a growth rate 25 percent higher than the year before.
More Options for DR
In the past few years, disaster recovery strategies have moved from an almost universal reliance on physical tape libraries to a much more mixed use of technology. While making copies to physical tape and shipping them off-site is still in use at 18 percent of companies, more and more companies are also using disk-based backup and electronic replication for DR. Nearly half (47 percent) of respondents are replicating more than 50 percent of their data to a remote location for DR protection.
With this move to disk-based disaster recovery, companies are also adopting active-active strategies for improved recovery time and recovery point objectives (RTO/RPO). According to the survey, 21 percent have an active-active remote replication strategy in place and 41 percent have an active-passive replication strategy.
Solutions Lead to New Problems
In many cases, enterprises have tried to solve the challenges of disaster recovery by replacing physical tape with single-node disk-based backup appliances. These systems, which were designed for medium-sized companies, solved some of the problems of physical tape: they sped up backup performance, improved reliability by eliminating tape drive failures, and reduced capacity requirements through inline deduplication. Unfortunately, large enterprises and big data environments have data volumes that these systems cannot handle efficiently. They force companies to buy a new, independent system every time they need more capacity or performance, resulting in costly data center sprawl. Inline, hash-based deduplication in these systems is also problematic for large data volumes because it slows backup and restore performance and typically provides poor capacity reduction in large database environments.
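To make the mechanism concrete, here is a minimal sketch of hash-based deduplication using fixed-size blocks. It is illustrative only; production appliances use variable-size chunking, persistent indexes, and much stronger scaling, and the function names here are hypothetical.

```python
import hashlib


def dedupe(stream: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and store each unique block once,
    keyed by its SHA-256 hash. Returns the block store (the dedup index)
    and a 'recipe' of hashes needed to reconstruct the original stream."""
    store = {}   # hash -> block contents
    recipe = []  # ordered hashes to rebuild the stream
    for i in range(0, len(stream), block_size):
        block = stream[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Reassemble the original stream from the block store and recipe."""
    return b"".join(store[d] for d in recipe)
```

Highly repetitive backup streams dedupe well under this scheme, but database data whose records shift by even a few bytes between backups produces blocks that no longer hash identically, which is one reason hash-based appliances can deliver poor capacity reduction on large databases.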
With data divided among multiple “silos” of storage, the disaster recovery scheme for many large enterprises quickly becomes complex, costly, and prone to human error. Fifty percent of survey respondents characterized their environments as having “moderate” or “severe” sprawl, requiring them to routinely add data protection systems to scale performance or capacity.
Scalable data backup and disaster protection solutions handle large, complex environments more efficiently and cost-effectively in a centralized way. These systems allow IT managers to add capacity and performance as their needs grow, enabling companies to protect petabytes of data in a single system. They are also built to deduplicate massive volumes of file and database data (a problematic data type for hash-based deduplication) without slowing backup or replication performance.
Remote Office Disaster Protection
Enterprises still struggle to find efficient ways to protect data in remote offices and branch locations, and DR strategies for these locations vary widely. At the high end of the spectrum are hub-spoke topologies in which data in remote offices is backed up to a small disk-based virtual tape library, replicated to a centralized backup system in a main data center, and then replicated again to a remote data center for DR. At the low end, there is no formal DR strategy for remote offices and data is left unprotected. In fact, 15 percent of data in remote offices and 11 percent of data in main data centers are currently not backed up or protected. In addition, a full 17 percent of respondents are still either working without a disaster recovery strategy or are in the process of implementing one.
While many smaller organizations have made marked improvements in their disaster protection strategies, disaster recovery is still evolving in large enterprise data centers. As data volumes continue to climb, these organizations will need to move to scalable, enterprise-class technologies that can meet their needs for efficient backup, restore, and replication without causing costly sprawl.
Peter Quirk is a director of product management at Sepaton. He has spent most of his career working for vendors in systems engineering, product marketing, product management and project management roles, with responsibilities in operating systems, databases, languages, hardware platforms, storage, and social media. In his spare time he likes to code and explore the world of Big Data and all things related to Hadoop.