
The Next Level of Disaster Recovery

Written by John Lindeman, Wednesday, 07 November 2007 13:15
According to a recent Harris Interactive survey of both business and IT executives, tolerance for IT system downtime is rapidly declining – and is now down to five hours or less. Across industries – from manufacturers running extended supply chains and tracking real-time inventory levels, to healthcare enterprises validating patient records, to financial services firms executing trades based on real-time, split-second pricing fluctuations – recovery time objectives (RTOs) and recovery point objectives (RPOs) are shrinking.

This is especially true for systems perceived as having the most significant impact on revenue: e-mail, back-office applications, customer service, and telecommunications. Some survey respondents even argued that there simply isn’t a single customer- or revenue-focused application out there that can afford more than five hours of downtime, period.

The ability to guarantee non-stop connections to information is critical, for several reasons:

  • The sheer cost of downtime: Forrester Research estimates the cost of one hour of downtime at $89,000 for airline reservations; $113,000 for home shopping services; and $150,000 for pay-per-view. Of course the costs can be far greater, including loss of repeat business and tarnished company reputation.
  • Legal ramifications: New federal regulations require organizations to ensure the currency, accessibility, and searchability of their data at any point in time. Lawsuits, audits, and SEC fines may be a harsh reality check for organizations that can’t meet these standards.
  • Strategic competitive advantage: The ability to guarantee a low level of operational risk – including the risk of losses arising from IT system disruptions – enables more productive use of capital resources, and consequently, competitive advantage. 

When asked to give their companies a letter grade in terms of ability to access business-critical information quickly after an unplanned interruption or disaster, more than half of all respondents chose a grade of "B," indicating a perception of good – although not great – levels of preparedness.

What is it going to take to get a better grade? Today’s organizations are taking disaster recovery to the next level by supplementing traditional recovery techniques such as tape recovery and end-user recovery seats (contracting for shared or dedicated, remote workspace where employees can recover operations in the event of a disaster) with new approaches. 

The Next Level of Disaster Recovery

Tape-based recoveries can be a core component of traditional disaster recovery programs. But the time to recover can be anywhere from 12-48 hours depending on the recovery location and how many critical systems need to be rebuilt before applications and data can even be loaded.

End-user recovery seats are another core component, because organizations need to provide a safe place where employees can report in order to get systems, applications, and data back up and running. This will always be a critical program element. But the problem is, one never knows how long it will take for employees to get to these locations, especially given disaster-induced travel difficulties and delays.

Because many types of data and applications can ill afford hours – or even worse, days – of inaccessibility, organizations are supplementing these more traditional modes with advanced techniques that enhance information availability and ultimately protect the business. When planning for the next level of disaster recovery, organizations should consider moving toward a solution that supplements tape back-up and end-user recovery seats with electronic back-up, including vaulting, server replication, and storage replication. 

Vaulting

In February 2007, Johns Hopkins – a Maryland-based organization comprising Johns Hopkins University and Johns Hopkins Hospital – disclosed that it had lost personal data on roughly 52,000 employees and 83,000 patients in a tape mishap. More specifically, nine tapes containing sensitive information – which were dispatched to a contractor for back-up – were never returned to Johns Hopkins. Both the contractor and Johns Hopkins investigated the incident and reportedly determined that the tapes never reached the facility. "It is highly likely that the tapes were mistakenly left by a courier company hired by the contractor at another stop," noted a statement on the Johns Hopkins Web site.

While the tapes were thought to have been incinerated, Johns Hopkins was forced to notify all employees – current and former – as well as all patients, and undertake an exhaustive review of processes and procedures. The Johns Hopkins example illustrates the danger and high costs of manual intervention often associated with tapes.

Vaulting does away with this manual intervention, taking human error out of the equation to enable a more secure and highly reliable form of data back-up and protection. The vaulting technique is based on electronic vaulting software installed on systems, which automatically backs up selected files at scheduled frequencies and times. The software captures changed information in files or databases, which is compressed, encrypted, and transmitted, usually via an IP connection, to a secure remote vaulting facility. While data is typically transferred over the Internet, the vaulting service may utilize a dedicated communications circuit for a higher bandwidth connection to accommodate higher data volumes.
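The vaulting cycle just described — capture changed files, compress, encrypt, transmit to one or more remote vaults — can be sketched roughly as follows. This is an illustrative outline only, not any vendor's product: the file names and key handling are assumptions, and the XOR "encryption" is a stand-in for real encryption such as AES.

```python
import hashlib
import zlib

def changed_files(files, last_backup_hashes):
    """Return only the files whose content has changed since the last cycle."""
    changed = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        if last_backup_hashes.get(name) != digest:
            changed[name] = (data, digest)
    return changed

def encrypt(blob, key):
    """Placeholder cipher (XOR) standing in for real encryption; illustrative only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))

def vault_backup(files, last_backup_hashes, key, vaults):
    """One scheduled vaulting cycle: capture changes, compress, encrypt,
    and send a copy to each subscribed vault location."""
    for name, (data, digest) in changed_files(files, last_backup_hashes).items():
        payload = encrypt(zlib.compress(data), key)
        for vault in vaults:              # two vaults -> two protected copies
            vault[name] = payload
        last_backup_hashes[name] = digest

# Example: two vault locations, one file changes between cycles
vault_a, vault_b = {}, {}
hashes = {}
files = {"patients.db": b"record-1", "notes.txt": b"hello"}
vault_backup(files, hashes, key=b"secret", vaults=[vault_a, vault_b])
files["patients.db"] = b"record-1 record-2"   # data changes after first cycle
vault_backup(files, hashes, key=b"secret", vaults=[vault_a, vault_b])
restored = zlib.decompress(encrypt(vault_a["patients.db"], b"secret"))
print(restored)  # b'record-1 record-2'
```

Because only changed files are shipped each cycle, back-ups can run frequently with little labor, which is what drives the shorter RPOs the article describes.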

Customers who require additional assurance that their data is safe and wish to have two (or more) copies of backup data can subscribe to multiple vault locations and have their data sent to each. The result is multiple encrypted copies protected at separate locations. The stored information is compressed to minimize storage requirements while maintaining safety and integrity.

If and when a disaster or another event threatens a production environment, employees at end-user recovery seats have access to accurate information, updated to the point of the last back-up, readily available to be retrieved, unencrypted, and restored – and to get the business back up and running. The electronic nature of vaulting typically makes for shorter recovery timeframes while also enabling more frequent, less labor-intensive back-ups and therefore shorter RPOs. 

Server Replication

Server replication enables IT staffs to protect and maintain operational continuity for today’s most widely used applications. Most IT staffs are responsible for servers running one or more Windows-based applications, such as Microsoft Exchange, SQL Server, and Oracle, all essential for day-to-day business activities.

Server replication ensures continuity for these applications and data by providing a reliable secondary infrastructure, plus fully automated replication, fail-over, and fail-back processes should an event, planned or unplanned, interrupt a production environment. Operating over any shared or private standard IP network connection, server replication provides seamless configuration, control, and administration of continuous data availability and protection for servers running Microsoft Windows.

Server replication services can monitor requests between designated source and target machines. When a specified number of network requests is missed, the service initiates an automatic or manual fail-over, and a designated target machine assumes the identity of the source server. Additionally, server replication can be configured in a many-to-one manner that synchronizes restoration points for critical data and helps avoid the complications associated with tape-based recovery.
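The monitoring logic above can be sketched as a simple counter of consecutive missed heartbeats that promotes the target once a threshold is crossed. The threshold value and the identity-swap mechanics here are illustrative assumptions, not a description of any particular replication product.

```python
class ReplicationMonitor:
    """Toy model: after `threshold` consecutive missed heartbeats from the
    source server, the target machine assumes the source's identity."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.missed = 0
        self.active = "source"

    def record_heartbeat(self, responded):
        if responded:
            self.missed = 0               # a reply resets the counter
        else:
            self.missed += 1
            if self.missed >= self.threshold and self.active == "source":
                self.fail_over()
        return self.active

    def fail_over(self):
        # In a real service this step re-points a virtual IP or DNS entry
        # so the target answers under the source server's identity.
        self.active = "target"

    def fail_back(self):
        # Invoked once the source is repaired and resynchronized.
        self.active = "source"
        self.missed = 0

monitor = ReplicationMonitor(threshold=3)
for ok in [True, False, False, False]:    # three misses in a row
    serving = monitor.record_heartbeat(ok)
print(serving)  # target
```

The fail-back step is symmetric: once the original source is restored and resynchronized, traffic is handed back and the counter resets.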

Overall benefits of server replication include:

  •   Minimized data loss for critical applications;
  •   Minimized recovery windows for certain applications (for example, databases, e-mail, and file servers) where a 24-hour recovery window may be unacceptable;
  •   Reduced RTO of less than 30 minutes when using fail-over;
  •   Support for data replication over extended distances; and
  •   Reduced chance of regulatory non-compliance and the associated financial penalties.  

Osler, Hoskin & Harcourt LLP, a full-service corporate law firm, leverages an advanced recovery solution that combines server replication and end-user recovery. Server replication helps Osler, Hoskin & Harcourt LLP ensure superior reliability for communications systems with RTOs of less than four hours. In fact, server replication has enabled the firm to achieve failover time of less than eight minutes, meaning that this is the maximum timeframe for any delay in communications traffic. Osler, Hoskin & Harcourt views guaranteed availability of its communications systems as a cornerstone of exceptional client service and relationships, and ultimately a driver of client success. Server replication is the key, not just in the event of a potential disaster but during more frequent and mundane events like server upgrades and building moves that have the potential to disrupt communications services. 

Storage Replication

In today’s data-dependent age, storage replication represents another quicker, more reliable means of keeping data safe, sound, and up to the minute. Specifically, storage replication entails the host-independent mirroring of critical data, in real-time or near real-time, between source and target storage systems at a secure, remote location.

Back-up storage systems at a remote facility can be linked to remote processors, ensuring fast resumption of processing – without the time delays, complexities, and lost data often associated with more traditional recovery procedures. Storage replication offers considerable benefits including:

  •   Minimizing data loss for mission-critical applications;
  •   Improving data accuracy to within seconds of the last transaction posted;
  •   Enabling synchronous connectivity for distances up to 60 miles and asynchronous for unlimited distances; and
  •   Avoiding unplanned demands on staff for administering and managing data replication.
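The key trade-off in the list above — synchronous mirroring for zero data loss over limited distances versus asynchronous mirroring for unlimited distances with seconds of exposure — can be illustrated with a toy model. The class and block layout here are illustrative assumptions, not a real storage controller.

```python
from collections import deque

class MirroredVolume:
    """Toy model of storage replication: every write to the source volume
    is copied to a remote target, synchronously or asynchronously."""

    def __init__(self, synchronous=True):
        self.synchronous = synchronous
        self.source = {}
        self.target = {}
        self.pending = deque()            # async writes awaiting transfer

    def write(self, block, data):
        self.source[block] = data
        if self.synchronous:
            # Sync mode: the host waits for the remote copy to acknowledge,
            # so the replica is never behind -- but latency limits distance.
            self.target[block] = data
        else:
            # Async mode: acknowledge at once and ship the block later,
            # trading seconds of potential data loss for unlimited distance.
            self.pending.append((block, data))

    def drain(self):
        """Transfer queued async writes to the remote target."""
        while self.pending:
            block, data = self.pending.popleft()
            self.target[block] = data

sync_vol = MirroredVolume(synchronous=True)
sync_vol.write(0, b"order-123")
async_vol = MirroredVolume(synchronous=False)
async_vol.write(0, b"order-123")
print(sync_vol.target.get(0), async_vol.target.get(0))  # b'order-123' None
async_vol.drain()
```

If a disaster struck the async volume before `drain` ran, the queued write would be lost — which is exactly the few-seconds exposure that "accuracy to within seconds of the last transaction posted" refers to.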

OpSource, an expert in software as a service (SaaS) delivery, relies on storage replication to help ensure rapid, immediate off-site back-up for its clients’ mission-critical database system files and application data. OpSource clients include Web companies and independent software vendors (ISVs) who leverage the OpSource platform to support and deliver their on-demand, Web-based, software-as-a-service applications. With client reputations and end-users’ businesses on the line, storage replication helps OpSource meet a 100 percent uptime SLA, translating to superior information protection and availability for the extended value chain. 

Around the Corner: Virtualized Disaster Recovery

Virtualization is defined as the pooling of various IT resources in a way that masks the physical nature and boundaries of those resources from resource users. The most obvious benefit of virtualization is higher utilization of servers and the overall computing infrastructure, but it also delivers improved responsiveness and flexibility, since virtual resources can be moved or modified dynamically to reflect changing business needs.

Virtualization promises to extend to disaster recovery the same benefits it brings to the data center – increased capacity utilization, speed, agility, and flexibility. Through the virtual partitioning of hardware to support multiple operating systems, the technique delivers a level of redundancy that can be used for disaster recovery purposes – either on the same hardware or on a different piece of equipment. Known as virtualized disaster recovery, this practice enables local availability within a single data center. If implemented and managed properly, virtualization can be a highly effective form of advanced recovery, dramatically shrinking RPOs and RTOs.

In spite of these potential benefits, virtualized disaster recovery remains a complex initiative requiring careful, ongoing consideration of numerous challenges which, if left unaddressed, can result in some hard and expensive lessons learned.

For example, the criticality of any given server increases in direct proportion to the number of applications it supports. In addition, virtualized servers require meticulous back-up and recovery of blueprints known as images, which clearly show which servers support which applications and data, as well as any interdependencies. Otherwise, a disaster or other disruption may force IT staffs to re-stage the entire virtualization infrastructure from scratch – an arduous, lengthy process which can severely impact recovery time.
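One way to make the "blueprint" idea above concrete is a recovery manifest that records, for each virtual server image, the applications it hosts and the servers it depends on, so images can be restored in dependency order rather than re-staged from scratch. The manifest layout and server names below are hypothetical, purely for illustration.

```python
# Hypothetical recovery manifest: image -> hosted apps and dependencies
manifest = {
    "db-vm":   {"apps": ["SQL Server"],  "depends_on": []},
    "mail-vm": {"apps": ["Exchange"],    "depends_on": ["db-vm"]},
    "web-vm":  {"apps": ["Order entry"], "depends_on": ["db-vm", "mail-vm"]},
}

def restore_order(manifest):
    """Topologically sort images so each VM is restored only after the
    servers it depends on -- avoiding a from-scratch re-staging effort."""
    order, done = [], set()

    def visit(image, seen=()):
        if image in done:
            return
        if image in seen:
            raise ValueError(f"circular dependency at {image}")
        for dep in manifest[image]["depends_on"]:
            visit(dep, seen + (image,))
        done.add(image)
        order.append(image)

    for image in manifest:
        visit(image)
    return order

print(restore_order(manifest))  # ['db-vm', 'mail-vm', 'web-vm']
```

Keeping such a manifest backed up alongside the images themselves is what turns a disaster from a full re-staging exercise into an ordered restore.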

Organizations implementing virtualization must consider virtualization and disaster recovery strategies hand in hand, including not just back-up technologies and processes but also people. End-user recovery maintains its importance, since virtualized disaster recovery environments are not immune to events that can damage or destroy primary datacenters.

Conclusion

The case for lower RTOs and RPOs – and the techniques and approaches required to support them – is clear. But to date, business units and IT departments have not seen eye to eye when it comes to how to limit the amount of downtime that follows a disaster.

According to the Harris Interactive survey, 71 percent of IT respondents identified disaster recovery/business continuity (DRBC) as very important or crucial to business success, versus 49 percent of business respondents. And while more IT executives (66 percent) than business executives (54 percent) feel planning for uninterrupted information availability should be a top priority, IT executives claim they are still not receiving the budgets necessary to achieve the rapidly declining recovery timeframes.

As non-stop access to the most up-to-date, accurate data becomes more critical, IT departments will need to make cohesive business cases to validate new approaches and corresponding investments. In some cases, it’s a matter of connecting the dots – for example, showing how lack of back-up support for a particular Web server may translate to downtime for a critical customer-facing order management application.

In today’s corporate setting, where institutional resistance is often significant, IT managers may need to devote significant time and attention to formulating strong, persuasive ROI cases in order to secure the top-level backing and support needed to ensure IT system resiliency and, consequently, business resiliency.

John Lindeman is the vice president of advanced recovery product management for SunGard Availability Services.



"Appeared in DRJ's Summer 2007 Issue"