It should come as no surprise that the explosion of digital information is real and is growing at a much faster pace than our ability to capture, preserve and use it effectively. Analyst firm IDC predicts that by 2015, the world will generate and store 8,000 exabytes of digital information (see Figure 1). Growing at a rate of 10 times every five years, this figure will reach 80,000 exabytes by the end of the decade.
Figure 1: IDC's Digital Universe Study, June 2011
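To put the "10 times every five years" figure in yearly terms, here is a minimal sketch of the arithmetic. It assumes growth compounds at a constant annual rate, which is a simplification and not something the IDC study states; the function names are illustrative.

```python
def annual_growth_factor(multiple: float, period_years: float) -> float:
    """Constant yearly factor that compounds to `multiple` over `period_years`."""
    return multiple ** (1.0 / period_years)


def project_exabytes(start_eb: float, years_elapsed: float,
                     multiple: float = 10.0, period_years: float = 5.0) -> float:
    """Project a data volume forward, assuming steady compound growth."""
    return start_eb * multiple ** (years_elapsed / period_years)


# Tenfold growth over five years implies a yearly factor of about 1.585.
factor = annual_growth_factor(10.0, 5.0)
print(f"Implied annual growth: {(factor - 1) * 100:.1f}%")

# Starting from the 8,000 exabytes predicted for 2015, five years later:
print(f"2020 projection: {project_exabytes(8000, 5):,.0f} EB")
```

Under that assumption, tenfold growth every five years works out to roughly 58 percent growth per year, which is the pace storage and disaster recovery budgets would need to absorb.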
What's fueling this explosion? Most of it is driven by individuals creating content with mobile devices such as smart phones, tablets and cameras. In 2010, an estimated five billion or more mobile phones were in use. Twelve percent of them were smart phones, a segment that is capable of generating increasing amounts of digital content and is growing by 20 percent each year (McKinsey Global Institute, Big Data: The Next Frontier for Innovation, Competition and Productivity, May 2011). The explosion is also being driven by information created around individuals as they go about their daily lives. Examples include surveillance videos and motion sensors tracking location and traffic or inventory movement.
Although 75 percent of the digital information created in the world is generated by individuals, enterprises have liability and responsibility for 80 percent of this information (IDC Extracting Value from Chaos, June 2011). This means that organizations are responsible for archiving and delivering this information, and for maintaining the information technology systems, data storage systems and disaster recovery plans behind it. The major challenge is that most organizations are asked to accommodate these ever-growing amounts of information with the same or even decreased budgets and resources. Technology advances are helping to some degree. For instance, computing is getting faster and cheaper. Virtualization is driving up efficiency and utilization. Storage devices are growing in capacity while falling in price (more bits per device at a lower cost) and getting faster with the advent of solid-state technologies (although not currently at a suitable price point for all workloads). Then there's the cloud, which is shaping up to be a key option for helping lower costs and drive efficiencies.
Challenges of Big Data
Now that we have established that Big Data is a present and growing part of many organizations, the question is, "How do we handle this vast amount of data?" More specifically, "How should this data be protected and made available in the event of a disaster?"
Consider an example in which a company has one petabyte of data that is used for production business applications. This company has a few options to ensure that its data remains available in the event of a local disaster:
- They could make multiple copies and store these at an offsite location.
- They could make multiple copies and store these at several physical locations.
- They could use cloud storage to store another copy of the primary data.
The key idea to consider is that cloud storage is the only one of these options that offers the potential to safeguard Big Data and dramatically reduce the restoration time, while minimizing the overall expense and management burden of disaster recovery.
Is the Cloud Viable for Big Data Disaster Recovery?
The cloud offers a great benefit of elasticity since companies can use and pay only for what is needed at any given time. Tying elasticity in with disaster recovery plans presents an ideal option to store additional copies of business critical data for recovery and restoration in the event of a catastrophic occurrence.
According to many recent surveys, including one conducted by Aberdeen Group in 2010, disaster recovery using cloud storage is actually the number one use case (see Figure 2).
Figure 2: Aberdeen Group Survey, 2010
A separate survey conducted by CIO Market Pulse in early 2011 adds context: 42 percent of CIOs describe their organization's data management practices, which include disaster recovery, as "outdated" and 31 percent describe them as "chaotic" (see Figure 3).
Figure 3: CIO Market Pulse, 2011
These statistics don't paint a great picture, considering that the intent of a disaster recovery plan is to keep a debilitating business or financial event from happening. What's interesting, however, is the trend of businesses using the cloud as "part" of their disaster recovery strategy. According to the same 2010 Aberdeen Group survey, mid-sized businesses are the greatest adopters of cloud storage as part of their disaster recovery plans at 48 percent, compared with 38 percent for small businesses and 26 percent for large businesses (see Figure 4).
Figure 4: Aberdeen Group Survey, 2010
The survey goes on to state that, along with the highest adoption of cloud storage for disaster recovery, mid-sized businesses also experience the shortest recovery time: four hours, compared with 7.7 hours for large businesses (see Figure 5).
Figure 5: Aberdeen Group Survey, 2010
So is the takeaway here that all companies should use cloud storage as part of their disaster recovery plan? Unfortunately, the answer is the ever-dreaded, "It depends."
Pros of Big Data Disaster Recovery in the Cloud
Now that we have an understanding of Big Data and how it relates to cloud storage, let's explore how cloud storage can be both more advantageous and more cost-effective for disaster recovery than local storage. Two key advantages are faster recovery times and multi-site availability, which help ensure that a business's Big Data is not lost, compromised or unavailable. Both are top requirements for an effective disaster recovery solution, and the cloud delivers them at a fraction of the cost of more traditional disaster recovery techniques. Cloud storage can provide these advantages because of its inherent scalability, elasticity and rapid deployment. Furthermore, cloud storage makes it possible to dynamically fine-tune the cost and performance of the disaster recovery plan according to the business criticality of each application. This is extremely challenging with conventional disaster recovery methods, which tend to treat all applications as business critical and thereby increase costs unnecessarily. Cloud storage has effectively shifted the disaster recovery tradeoff curve to the left, as represented in Figure 6.
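The idea of tuning protection to business criticality can be sketched as a small policy calculation. Everything specific here is hypothetical and chosen only for illustration: the tier names, the number of copies per tier and the three sample datasets, which together add up to the one petabyte from the earlier example.

```python
from dataclasses import dataclass

# Hypothetical policy: how many protected copies each tier receives.
# These counts are illustrative assumptions, not recommendations.
COPIES_BY_TIER = {"critical": 3, "important": 2, "archival": 1}


@dataclass(frozen=True)
class Dataset:
    name: str
    size_tb: float
    tier: str  # must be a key of the copies-by-tier mapping


def tiered_capacity_tb(datasets, copies_by_tier=COPIES_BY_TIER):
    """Protected capacity when each dataset is copied according to its tier."""
    return sum(d.size_tb * copies_by_tier[d.tier] for d in datasets)


def uniform_capacity_tb(datasets, copies):
    """Protected capacity when every dataset is treated the same way."""
    return sum(d.size_tb for d in datasets) * copies


# Hypothetical one-petabyte estate (1,000 TB) split across three tiers.
estate = [
    Dataset("order-processing", 100, "critical"),
    Dataset("analytics", 300, "important"),
    Dataset("logs-and-history", 600, "archival"),
]

print("Tiered protection: ", tiered_capacity_tb(estate), "TB")
print("Everything critical:", uniform_capacity_tb(estate, copies=3), "TB")
```

With these made-up numbers, the tiered policy protects 1,500 TB, whereas treating every application as business critical protects 3,000 TB. The actual savings depend entirely on how an organization's data divides across tiers.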
Cons of Big Data Disaster Recovery in the Cloud
There are important considerations to take into account before creating a disaster recovery strategy for Big Data in the cloud. One key element in designing a strategy that uses the cloud as the repository is ensuring that adequate network bandwidth is available to support replication. This can be challenging, especially when working with a cloud storage provider that may not support high-speed connectivity such as 10 GbE. Another consideration is the data itself: not all Big Data should be treated the same, and there should be a discipline for deciding which data is truly business critical and which is less significant. Because the cloud promises better scalability, elasticity and availability than other storage techniques, the recommendation is to place the business-critical data in the cloud, as counterintuitive as this may seem. Finally, a business needs to pay close attention to its cloud storage provider and the associated service-level agreements (SLAs) to protect itself against changing terms and conditions or a more calamitous event such as the provider going out of business.
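To see why bandwidth matters, consider a rough estimate of how long it takes to move one petabyte over a 10 GbE link. This is a back-of-the-envelope sketch only: it assumes a decimal petabyte (10^15 bytes), a dedicated link and a single `efficiency` factor standing in for protocol overhead and contention. The 50 percent figure below is a hypothetical value, not a measurement.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes: float, link_bits_per_sec: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move `data_bytes` over a link at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    seconds = (data_bytes * 8) / (link_bits_per_sec * efficiency)
    return seconds / SECONDS_PER_DAY


ONE_PETABYTE = 1e15   # bytes (decimal petabyte, an assumption)
TEN_GBE = 10e9        # bits per second

print(f"{transfer_days(ONE_PETABYTE, TEN_GBE):.1f} days at full line rate")
print(f"{transfer_days(ONE_PETABYTE, TEN_GBE, efficiency=0.5):.1f} days "
      "at 50% effective throughput (hypothetical)")
```

Even at full line rate, the initial copy takes a little over nine days, and a full restore over the same link would take just as long. This is why the replication path and the provider's connectivity options deserve attention early in the design.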
Can Big Data, Cloud Storage and Disaster Recovery Work?
As we come to the conclusion of this article, we should ask ourselves, "Is it really possible to use cloud storage as part of a disaster recovery strategy for Big Data?" The answer, quite simply, is yes, as long as the requirements are adequately quantified, the challenges are effectively addressed and the business is willing to adopt the latest technologies. Cloud storage has the capability to truly revolutionize disaster recovery for businesses ranging from SMBs to large enterprises. With Big Data, it all comes down to aligning the value of the data with the cost of protecting it. With the advent of Big Data and cloud storage come big opportunities, and businesses simply need to adapt their disaster recovery strategies to take advantage of these innovations.
Chad Thibodeau is the director of product management and alliances at Cleversafe, Inc. (www.cleversafe.com), a company that has created a breakthrough technology to solve Big Data challenges within the storage industry. Thibodeau is also a co-chair for the Cloud Archive and Long Term Preservation group within SNIA (Storage Networking Industry Association) that focuses on the challenges of archiving and preserving digital data within public cloud storage providers.