In an ideal world, IT managers would have unlimited budgets to ensure that employees, customers, and suppliers always had access to business systems and important information. Indeed, many organizations allocate significant portions of IT spending each year to build up operational resilience. However, organizations that have never been victims of a natural disaster, security threat, or human error struggle every year to justify spending on disaster recovery projects. The prevailing mindset is “the issues have never happened to us … other companies have those issues.” But as we see from news reports daily, these issues can happen to anyone.

Many individuals carry insurance policies for life, health, auto, and home, but premiums compete with other expenditures, making the decision to purchase insurance difficult.

The same is true of companies. Disaster recovery spending is insurance against the risks of user downtime, data loss, and business interruption. Although every organization knows it needs disaster recovery, deciding how much to spend is the issue.

Today’s IT budgets are under intense financial scrutiny, and IT managers are being asked to do more and more with less and less. In 2000, IT spending peaked at more than $1 trillion in the U.S. and almost $2 trillion worldwide. However, many of these investments did not deliver promised returns.

A recent survey of IT executives indicates that more than 90 percent of all projects now require a return on investment justification. Disaster recovery solutions are competing with new business applications, security solutions, migrations and upgrades, operations and maintenance, and IT cost reduction projects for a share of the diminishing IT budget. Disaster recovery managers must ensure that spending adequately covers unlikely events and that important new technology, training, and processes are implemented to mitigate, and recover quickly from, realized internal and external threats. The dilemma then is how much insurance is needed and how much to pay for it.

 

For example, at an average company with 98 percent availability, unplanned failures make the computing resources unavailable for 174 hours each year. Using the Gartner average of $42,000 per downtime hour for a mission-critical application, more than $7 million is lost each year due to unplanned downtime in this environment.

For companies that rely 100 percent on technology such as online brokers, trading platforms and e-commerce companies, hourly downtime risks can be $1 million or more, making availability an even greater issue.

Unplanned Downtime (Mission Critical)

Typical Uptime            Hours Down per Year   Cost per Unplanned Downtime Hour   Downtime Risk
Average         98.000%   174.72                $42,000                            $7,338,240
Very Good       99.000%   87.36                 $42,000                            $3,669,120
Outstanding     99.500%   43.68                 $42,000                            $1,834,560
Best in Class   99.900%   8.736                 $42,000                            $366,912

Typical downtime risks for various availability levels. Note: a 1 percent increase in availability (from 98 to 99 percent) translates into over $3 million in value. Comparing the cost of the disaster recovery plan with the risk mitigation value allows IT managers to make informed spending decisions and justify additional investments in disaster recovery solutions.
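The arithmetic behind the table can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the 8,736-hour year (52 weeks of 168 hours) is an assumption inferred from the table's figures rather than something the article states.

```python
# Assumption: a year of 52 weeks x 168 hours = 8,736 hours, which is the
# basis the table's "Hours Down per Year" figures appear to use.
HOURS_PER_YEAR = 52 * 7 * 24


def annual_downtime_risk(availability_pct, cost_per_hour, hours_per_year=HOURS_PER_YEAR):
    """Return (hours down per year, annual downtime risk in dollars)."""
    hours_down = hours_per_year * (100 - availability_pct) / 100
    return hours_down, hours_down * cost_per_hour


levels = [
    ("Average", 98.0),
    ("Very Good", 99.0),
    ("Outstanding", 99.5),
    ("Best in Class", 99.9),
]

for label, pct in levels:
    hours, risk = annual_downtime_risk(pct, 42_000)
    print(f"{label:<14}{pct:>8.3f}%{hours:>10.3f}  ${risk:>12,.0f}")
```

Substituting an organization's own availability history and hourly downtime cost for the Gartner average gives a first estimate of its exposure.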

To determine how much disaster recovery spending is enough, IT managers need to perform a three-step analysis:

1) Assess the downtime costs for crucial business systems;
2) Calculate the potential disaster risks and impacts;
3) Compare alternative plans to determine benefits of each proposed solution and how much spending is enough.

This three-step process helps put the risks, possible projects and benefits in perspective. It also helps executives make sound spending decisions and correctly balance disaster recovery solutions with other IT projects.

Determining The Downtime Costs For Key Business Systems

Downtime risks can be calculated by examining each of the business systems and determining how much value they deliver to the organization. Typically, risk is measured per hour of downtime, i.e. the revenue value or productivity impact that the unavailability of the system or data for that hour will have on the organization.

For transaction-based business systems, potential downtime losses can be calculated from the average number of transactions per hour during the day (or the number of transactions during the busiest hour) and the average value of a transaction. Multiplying these two figures gives a good idea of how much the business system is worth per hour.

For example, an e-commerce system records 1,000 sales transactions per hour at its busiest. On average, each sale is $45. If the system were unavailable due to a disaster, it is estimated that the business would lose $45,000 per hour. If it took five hours to restore the systems, the impact would be $225,000.
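The e-commerce example above amounts to a single multiplication, sketched here in Python. The function and parameter names are hypothetical and chosen only for illustration.

```python
def transaction_downtime_loss(transactions_per_hour, avg_transaction_value, hours_down=1):
    """Estimated revenue lost while a transaction system is unavailable."""
    loss_per_hour = transactions_per_hour * avg_transaction_value
    return loss_per_hour * hours_down


# Figures from the e-commerce example: 1,000 sales per hour at $45 each.
print(transaction_downtime_loss(1_000, 45))                # 45000
print(transaction_downtime_loss(1_000, 45, hours_down=5))  # 225000
```

Using the busiest-hour transaction count, as the example does, gives a worst-case figure; the daily average gives a more typical one.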

Downtime Losses by Application - Typical Loss per Minute of Unplanned Downtime

Financial/Trading - $40,000
Supply Chain - $10,000
ERP - $10,000
CRM - $8,000
E-Commerce - $8,000
E-Business - $8,000
Business Application - $5,000
Database - $5,000
Messaging - $1,000
Infrastructure - $700


Typical per-minute downtime costs for various applications

For internal systems and infrastructure, the ability of users to do their jobs is diminished or eliminated when disaster strikes. To calculate the downtime loss for internal business systems, revenue per employee is often used. For each system, the number of users is multiplied by the revenue per employee per hour to determine the downtime risk.

For example, a company with a messaging system has servers distributed across the enterprise. The largest server hosts 1,000 users, with an estimated $186 in revenue per employee per hour ($350,000 in revenue per employee per year). If the messaging system were unavailable for one hour, the revenue risk to the company would be $186,000. If the messaging system were hit by a disaster and took five hours to recover, the cost to the company would be almost $1 million in lost revenue.
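A minimal Python sketch of the messaging example follows. The article gives $350,000 per employee per year and $186 per employee per hour but does not say how one is derived from the other; the 1,880 working hours per year used here is an assumption that happens to reproduce the $186 figure. All names are hypothetical.

```python
# Assumption: 1,880 working hours per year (not stated in the article);
# it reproduces the article's $186 per employee per hour.
WORKING_HOURS_PER_YEAR = 1_880


def revenue_per_employee_hour(annual_revenue_per_employee, working_hours=WORKING_HOURS_PER_YEAR):
    """Hourly revenue per employee, rounded to whole dollars."""
    return round(annual_revenue_per_employee / working_hours)


def user_downtime_loss(users, hourly_revenue_per_employee, hours_down=1):
    """Revenue at risk while an internal system is unavailable to its users."""
    return users * hourly_revenue_per_employee * hours_down


hourly = revenue_per_employee_hour(350_000)
print(hourly)                                          # 186
print(user_downtime_loss(1_000, hourly))               # 186000
print(user_downtime_loss(1_000, hourly, hours_down=5)) # 930000
```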

A more conservative approach to calculating the downtime impact of internal systems is with user salaries, rather than revenue per employee. However, most disasters will affect revenue, so for disaster recovery projects, revenue per employee should be used for infrastructure downtime calculations.

The longer the system is unavailable, the greater the impact. Not only may the transaction be lost or an important customer issue not resolved, but the recovery delays may cause a permanent loss of the customer or supplier. The system downtime evolves from a single transactional cost to the loss of the customer’s lifetime value or the cost of obtaining a new supplier. As well, longer recoveries may cause irreparable harm to the corporation’s brand image.

These intangible risks are extremely difficult to quantify and are often left out when justifying adequate disaster recovery budgets. However, even though they may not be quantified, the intangible benefits should be an important discussion point and element in any disaster recovery business case. The goal is to put the assessment of the disaster recovery project on par with other IT investment options so that they can be compared.

Risk Assessment

Once the downtime cost per hour is understood, the next step is to determine the disaster risk: What is the likelihood of a given event striking the organization and, if it occurs, how long will it take to recover with today’s disaster recovery plan? A list of potential events should be created and should include typical issues such as system failures, accidental or intentional data destruction, human error, and natural disasters.

Unplanned downtime has many causes, requiring a focus on all aspects of the computing environment, mitigation of risks from natural disasters, as well as processes, procedures, and training to guard against human error. Because no single cause is dominant, IT organizations need to spend time in many areas in preparation for these disasters. Achieving operational resilience and being ready for rapid disaster recovery can be expensive for many IT organizations. However, not being prepared for such issues can be even more costly.

For each potential business risk, a probability of occurrence is assigned, and the time it would take to recover using the current disaster recovery plan is estimated. This creates a table that tallies each potential risk, the probability of the risk occurring, and the potential downtime impact in hours. For each business system, or in some cases the total operation, the downtime cost per hour can then be factored in, leading to an estimated risk impact. An excerpt from a typical risk assessment table would be as follows:

 

Risk                              Probability of Occurring   Business Impact    Systems Affected         Downtime Loss   Annual Risk
                                  in the Next 12 Months      (Recovery Hours)                            per Hour
Accidental Database Corruption    30%                        8                  E-Commerce Application   $45,000         $108,000
Intentional Database Corruption   25%                        8                  E-Commerce Application   $45,000         $90,000
Fire in Data Center               1%                         8                  All                      $1,000,000      $80,000


System by system, risk by risk, the team can estimate the potential impacts, recovery times and downtime risks, highlighting the most important elements that need to be addressed in the disaster recovery plan.
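The annual risk figures in the table are consistent with multiplying probability, recovery hours, and downtime loss per hour. The Python sketch below reproduces the three rows of the excerpt and ranks them; the `Risk` class and its field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float     # chance of occurring in the next 12 months
    recovery_hours: float  # recovery time under the current plan
    loss_per_hour: float   # downtime loss per hour for affected systems

    @property
    def annual_risk(self) -> float:
        # Expected annual loss: probability x recovery hours x loss per hour.
        return self.probability * self.recovery_hours * self.loss_per_hour


risks = [
    Risk("Accidental database corruption", 0.30, 8, 45_000),
    Risk("Intentional database corruption", 0.25, 8, 45_000),
    Risk("Fire in data center", 0.01, 8, 1_000_000),
]

# Rank risks so the largest expected losses are addressed first.
for risk in sorted(risks, key=lambda r: r.annual_risk, reverse=True):
    print(f"{risk.name:<34}${risk.annual_risk:>10,.0f}")
```

Note how a rare but expensive event (the data center fire) lands in the same range as far more likely database incidents once probability is taken into account.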

Compare Alternative Plans, Costs And Benefits

Armed with a thorough understanding of the business impact in the event of a disaster, the next step is to evaluate and select solutions to help mitigate the risks. Each solution has a cost and a range of value in reducing the disaster risk. The benefit is the reduction in the probability of a disaster incident occurring or the reduction in recovery time. Using ROI analysis, the solutions can be compared in terms of financial benefit, helping the team select the right amount of risk reduction and the project that delivers the most cost-effective risk-reducing benefits. A simple solution analysis table for database corruption (accidental and intentional) might look like this:

Database Corruption

DR Plan                           Cost         Current Annual Risk   Risk Reduction (Benefit)   Savings    ROI (Three Year)
Faster Recovery Tools             $50,000      $288,000              25%                        $72,000    332%
Snapshot                          $100,000     $288,000              65%                        $187,200   462%
Local Redundancy With Failover    $250,000     $288,000              90%                        $259,200   211%
Remote Redundancy With Failover   $3,000,000   $288,000              99%                        $285,120   -71%

In this case, the snapshot solution provides the greatest financial return to the organization, mitigating a significant amount of the risk at a reasonable cost. However, it is important to remember that disaster recovery solutions are not selected on ROI measures alone. The organization may be averse to this type of data loss and may have adequate budget to allocate to the risk reduction program. The local redundancy failover solution mitigates almost all of the risk while still delivering a positive return, which may make it the best investment for such an organization (even though its ROI is lower than that of the snapshot and faster recovery tools alternatives).
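The article does not spell out its ROI formula, but the table's figures are consistent with a simple three-year calculation: annual savings equal the current annual risk times the risk reduction, and ROI equals three years of savings minus the plan cost, divided by the plan cost. The Python sketch below assumes that formula (with no discounting or recurring costs) and uses hypothetical names.

```python
def three_year_roi(cost, current_annual_risk, risk_reduction, years=3):
    """Return (annual savings, ROI as a fraction) for one DR plan.

    Assumed formula, inferred from the table rather than stated in the
    article: ROI = (years * annual savings - cost) / cost.
    """
    annual_savings = current_annual_risk * risk_reduction
    roi = (annual_savings * years - cost) / cost
    return annual_savings, roi


CURRENT_ANNUAL_RISK = 288_000  # database corruption risk used in the table

plans = [
    ("Faster Recovery Tools", 50_000, 0.25),
    ("Snapshot", 100_000, 0.65),
    ("Local Redundancy With Failover", 250_000, 0.90),
    ("Remote Redundancy With Failover", 3_000_000, 0.99),
]

for name, cost, reduction in plans:
    savings, roi = three_year_roi(cost, CURRENT_ANNUAL_RISK, reduction)
    print(f"{name:<33}${savings:>9,.0f}{roi:>8.0%}")
```

Printing savings next to ROI makes the trade-off in the discussion above visible: the plan with the best ROI is not the one that removes the most risk.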

Peace of Mind at the Right Price

In today’s information economy, the cost of computing downtime represents a significant issue for enterprises of all shapes and sizes. The impact can be measured in terms of lost revenue, employee productivity and other hard numbers; it can also affect intangible aspects of the business such as customer and partner satisfaction.

The key is to measure the cost of failure in terms of the bottom line: demonstrating how much downtime on critical applications costs the organization in revenue or sales sends a strong message to the executive team about the value of IT investments. Balancing the organization’s tolerance for risk with a hard dollar assessment of the level of mitigation provided by solutions ensures that disaster recovery investments provide the right amount of coverage for the right price.


Tom Pisello is CEO and founder of Alinean, the IT Value Experts, and author of Return on Investment for Information Technology Providers: Using ROI as a Selling and Management Tool. He can be reached at (407) 882-2426.