High Performance Storage in the Cloud
- Published on May 2, 2011
- Written by ERIC THACKER
Until recently, the only alternative has been to buy a bank of disks and use those as your storage target. Unfortunately, even with deduplication, this can be an expensive proposition, both from a purchasing standpoint (with capital expenditure budgets squeezed) and operationally (extensive infrastructure management, including power, cooling, and space consumption). An alternative that has gained considerable traction with IT management, both for backup and archiving and for hosting applications, is leveraging the public cloud. While the market for cloud storage is still maturing, it’s rapidly reaching the point where many IT managers may routinely consider it for any new or retooled application or data protection scheme.
How could they not, considering the economic advantages? Vendors typically charge on the order of 15 cents per gigabyte per month, whereas the comparable cost of owning and operating your own storage network can be anywhere from $1 to $25 per gigabyte per month. Internet services enjoy huge economies of scale from their buying power and from the operational processes and redundancies they have developed for managing data reliably on commodity hardware.
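To make the size of that gap concrete, here is a small sketch that turns the quoted per-gigabyte rates into a monthly bill. It assumes the $1 to $25 range is also per gigabyte per month, and the 10 TB capacity is a hypothetical example; it ignores any data transfer or request charges a provider may bill separately.

```python
# Rates quoted in the article; the owned-storage range is assumed to be per GB per month.
CLOUD_RATE = 0.15                 # dollars per GB per month
OWNED_RATE_RANGE = (1.00, 25.00)  # dollars per GB per month

def monthly_cost(capacity_gb: float, rate_per_gb: float) -> float:
    """Monthly cost in dollars of storing capacity_gb at rate_per_gb."""
    return capacity_gb * rate_per_gb

capacity_gb = 10_000  # hypothetical 10 TB (decimal units) of data
cloud = monthly_cost(capacity_gb, CLOUD_RATE)
low, high = (monthly_cost(capacity_gb, rate) for rate in OWNED_RATE_RANGE)

print(f"Cloud: ${cloud:,.0f} per month")
print(f"Owned: ${low:,.0f} to ${high:,.0f} per month")
```

For 10 TB this works out to about $1,500 a month in the cloud versus $10,000 to $250,000 a month in-house, which is the kind of spread that makes the economic argument hard to ignore.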
However, while the advantages are remarkable, there are plenty of reasons to hesitate before jumping into the public cloud.
Security is first on everyone’s list. If you are a technology manager who needs to store customer data, or data that is sensitive in any way, you will want answers to a long list of questions: how data is encrypted in transit and on disk, how it is kept separate from the data of other customers using the same storage service, and how internal access to it is controlled so that it is not exposed to employees who have no need to know. You’ll also have questions about availability, service-level guarantees, how enforceable those guarantees are, and how to integrate the cloud into your disaster recovery (DR) strategies. Cheap storage is no good to you if you can’t get to your data when you need it.
Despite these obstacles, the economic argument is likely to win out over time. Forrester Research analyst Andrew Reichman recently noted that although 91 percent of enterprises currently have no tangible plans to move forward with cloud storage, they’re likely to change their minds.
“I don’t want to look back in five years and say that I was the analyst that said cloud was never going to happen,” Reichman wrote.
Storage represents one of the most rapidly growing costs and biggest headaches for IT departments, so any cloud service that convincingly demonstrates it can provide storage more cheaply and efficiently on an outsourced basis “resonates strongly with IT buyers,” according to Reichman.
But along with security and availability, organizations that move forward with cloud storage will have to overcome a third obstacle: performance. Without high performance, the usefulness of cloud storage will be severely limited.
Yet achieving high performance when transmitting large volumes of data over the public Internet is problematic. With any public cloud service, you are likely to be storing data on servers that are geographically distant. In some cases, that may be part of the attraction for disaster recovery and business continuity: having data available from a remote location if some calamity befalls your business locally. But geographic distance also means data must traverse longer paths and more network hops, adding latency that slows both transmission and retrieval.
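One reason distance hurts so much is that a single TCP connection can have at most one receive window of unacknowledged data in flight per round trip, so its throughput is capped at roughly window size divided by round-trip time (RTT), however fast the link is. The sketch below applies that bound using the 65,535-byte maximum window available without the TCP window-scale option; the three RTT values are hypothetical stand-ins for nearby, regional, and distant data centers.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP stream's throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds

WINDOW = 65_535  # largest TCP window without the window-scale option, in bytes

for rtt_ms in (5, 20, 80):  # hypothetical round-trip times
    mbps = max_tcp_throughput_bps(WINDOW, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms:>3} ms -> at most {mbps:6.1f} Mbit/s per connection")
```

At an 80 ms round trip the ceiling is about 6.6 Mbit/s per connection, even on a link rated far higher, which is why latency compensation matters as much as raw bandwidth.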
One way to deal with this issue is to categorize your data as active/inactive or performance sensitive/insensitive, as many companies already do when deciding which information to store on tier-one equipment or lower-tiered, less expensive forms of enterprise storage. In general, only about 20 to 30 percent of corporate data is active, performance-sensitive data, meaning that it’s constantly being accessed and modified in real time. The rest is inactive data, stored for occasional use and eventually archived or deleted. So just as this inactive data is a good candidate for less expensive enterprise storage, it’s a good candidate for storage on cloud services.
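A tiering policy like this can start very simply. The sketch below uses time since last access as a stand-in for "active" versus "inactive"; the 30-day cutoff, the `DataSet` record, and the dataset names are all hypothetical, and a real classification would also weigh performance sensitivity, not just recency.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical cutoff: data untouched for more than 30 days counts as inactive.
ACTIVE_WINDOW = timedelta(days=30)

@dataclass
class DataSet:
    name: str
    last_accessed: datetime

def classify(datasets, now):
    """Split datasets into active (keep on tier-one storage) and inactive (cloud candidates)."""
    active, inactive = [], []
    for ds in datasets:
        if now - ds.last_accessed <= ACTIVE_WINDOW:
            active.append(ds)
        else:
            inactive.append(ds)
    return active, inactive

now = datetime(2011, 5, 2)
datasets = [
    DataSet("orders_db", now - timedelta(hours=2)),
    DataSet("q4_2010_reports", now - timedelta(days=120)),
    DataSet("email_archive_2008", now - timedelta(days=900)),
]
active, inactive = classify(datasets, now)
print("keep local:", [d.name for d in active])
print("cloud candidates:", [d.name for d in inactive])
```

Here only `orders_db` stays local; the two older datasets become candidates for cloud storage.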
However, even for data backup and archiving, performance remains an issue. You still don’t want transfers of archival data to clog the corporate Internet connection so badly that other applications can’t get through. Yes, you can run backup jobs overnight or in some other window of opportunity. But you might prefer continuous replication, so you always have a complete copy of all important records off-site. And if you ever need to run an emergency restore from that off-site archive back to your datacenter or to a DR site, you will want it to complete as quickly as possible.
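A back-of-the-envelope calculation shows why both concerns bite. This sketch estimates transfer time from data volume, link speed, and the share of the link a job is allowed to use. The 100 Mbit/s link, the 200 GB daily change set, the 25 percent cap, and the 2 TB archive are hypothetical figures, and the estimate ignores protocol overhead and latency, so real transfers will take longer.

```python
def transfer_hours(data_bytes: float, link_bps: float, link_share: float = 1.0) -> float:
    """Hours to move data_bytes using link_share (0 < share <= 1) of a link_bps link.

    Ignores protocol overhead and latency effects, so this is a best case.
    """
    if not 0 < link_share <= 1:
        raise ValueError("link_share must be in (0, 1]")
    return data_bytes * 8 / (link_bps * link_share) / 3600

TB = 10**12
LINK = 100 * 10**6  # hypothetical 100 Mbit/s Internet connection

# Continuous replication capped at 25% of the link, leaving the rest for other traffic.
print(f"Replicate 200 GB at 25% of link: {transfer_hours(0.2 * TB, LINK, 0.25):.1f} h")
# Emergency restore of a 2 TB archive using the whole link.
print(f"Restore 2 TB at full link speed: {transfer_hours(2 * TB, LINK):.1f} h")
```

Even in this best case, the capped replication job needs almost 18 hours and the full restore about 44 hours, which is exactly the gap that optimization aims to close.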
To gain the benefits of cloud storage for more active data processing applications, you may want to move both the processing and the data to the cloud. In other words, instead of retrieving all the information needed to perform a data analysis, you may be better off running the analysis on the cloud service and retrieving only the results. Even then, you will want the download of the resulting report to complete as quickly as possible.
In an Enterprise Strategy Group technology brief (“Accelerating Cloud Performance with WAN Optimization,” August 2010), principal analyst Jon Oltsik concludes that high performance will be a key success factor for cloud storage. “Private or public cloud ROI won’t matter if users experience reduced productivity due to unacceptable response time to access files, applications, or their virtual desktops,” writes Oltsik.
WAN (wide area network) optimization solutions have been refined to speed transmission between geographically dispersed corporate locations over private network, Internet, or satellite links. They work by deduplicating and compressing data, optimizing network protocols, and compensating for latency (the delay introduced by network transmission). Usually, this involves placing a network appliance that delivers these services at each end of the connection. Replication and remote file access protocols often have some of the greatest potential for optimization; for that kind of traffic, bandwidth consumption can be reduced by as much as 95 percent.
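The deduplication and compression steps can be illustrated with a toy model of the two appliances. This is a simplified sketch, not any vendor’s implementation: it splits the stream into fixed 4 KiB chunks (production systems commonly use variable-size, content-defined chunking), identifies chunks by SHA-256 digest, sends each unique chunk once compressed with zlib, and sends only a 32-byte reference for chunks the far end has already cached. The class and function names are hypothetical.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for simplicity

def wire_bytes(records) -> int:
    """Approximate bytes sent: a 32-byte digest per record plus any compressed payload."""
    return sum(len(rec[1]) + (len(rec[2]) if rec[0] == "raw" else 0) for rec in records)

class DedupSender:
    """Toy sending appliance: each unique chunk crosses the WAN once, compressed;
    repeats are replaced by a SHA-256 reference."""

    def __init__(self):
        self.seen = set()  # digests the receiver is assumed to have cached

    def encode(self, data: bytes):
        records = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.seen:
                records.append(("ref", digest))
            else:
                self.seen.add(digest)
                records.append(("raw", digest, zlib.compress(chunk)))
        return records

class DedupReceiver:
    """Toy receiving appliance: rebuilds the original stream from its chunk cache."""

    def __init__(self):
        self.cache = {}  # digest -> original chunk bytes

    def decode(self, records) -> bytes:
        parts = []
        for rec in records:
            if rec[0] == "raw":
                self.cache[rec[1]] = zlib.decompress(rec[2])
            parts.append(self.cache[rec[1]])
        return b"".join(parts)

# Usage: replicate the same 16 KiB payload twice.
sender, receiver = DedupSender(), DedupReceiver()
payload = bytes(range(256)) * 64
first = sender.encode(payload)
second = sender.encode(payload)  # every chunk is now just a reference
assert receiver.decode(first) == payload
assert receiver.decode(second) == payload
print(len(payload), wire_bytes(first), wire_bytes(second))
```

Because the sample payload repeats every 4 KiB, even the first pass sends only one compressed chunk plus three references, and the second pass sends just four 32-byte references. Real traffic deduplicates less neatly, but the mechanism is the same.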
But Oltsik sees the need for “a new type of WAN optimization built with the cloud in mind.”
When you place data in a public cloud service, you no longer control both ends of the connection and may not be able to require that the service provider install a particular WAN optimization device. So cloud storage will tend to favor the use of “virtual appliances” that can be deployed as software only on the cloud side of the network.
Even in private cloud environments (virtualized pools of computing resources, maintained within the firewall, that mimic the cloud style of computing), we’re starting to see this trend. The point of delivering a computing service in the form of an appliance is to simplify deployment and enhance performance. Yet some organizations that have been aggressively virtualizing and consolidating their computing infrastructure may look at an appliance as just another “box” to manage. And so they may also prefer loading a virtual appliance as just another application on their private cloud.
The challenges of public cloud storage are more fundamental and will require more innovation to deliver the promised flexibility and reduced costs while maintaining high performance and data security. Eventually, I expect to see the development of hybrid models that combine public and private cloud storage, where data can be moved between the two as easily as it is transferred between enterprise storage arrays today. Such a model allows for tiering of applications and of storage, giving IT management the freedom to deploy resources in ways that maximize flexibility and scalability while minimizing bandwidth and storage costs. This structuring of IT infrastructure taps into the benefits of public and private cloud storage while addressing many of the concerns currently associated with each.
If the cloud style of computing becomes as pervasive as some experts think, it will put new stresses on corporate budgets, processes, networks, and the Internet that we can’t even imagine today. But there is little doubt that these challenges can be overcome, just as those of corporate computing on WANs have been. As the cloud wave continues to break over traditional IT processes, customers will demand solutions from IT optimization vendors to capitalize on the compelling advantages of cloud applications and storage.
Eric Thacker is product marketing director at Riverbed Technology, the IT performance company.