For example, an average company with 98 percent availability loses its computing resources to unplanned failures for about 174 hours each year. Using the Gartner average of $42,000 per downtime hour for a mission-critical application, more than $7 million is lost each year to unplanned downtime in this environment.
For companies that rely 100 percent on technology, such as online brokers, trading platforms and e-commerce companies, hourly downtime risks can be $1 million or more, making availability an even greater issue.
|Availability Level (Mission Critical)||Typical Uptime||Hours Down per Year||Cost per Unplanned Downtime Hour||Annual Downtime Cost|
|Average||98.000%||174.72||$42,000||$7,338,240|
|Very Good||99.000%||87.36||$42,000||$3,669,120|
|Outstanding||99.500%||43.68||$42,000||$1,834,560|
|Best in Class||99.900%||8.736||$42,000||$366,912|
Typical downtime risks for various availability levels. Note: a 1 percent increase in availability (from 98 to 99 percent) translates into over $3 million in value. Comparing the cost of the disaster recovery plan with its risk mitigation value allows IT managers to make sound spending decisions and justify additional investments in disaster recovery solutions.
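The table's arithmetic can be reproduced with a short sketch. The figures in the table imply a year of 8,736 hours (52 weeks of 168 hours); that constant and the function name are assumptions introduced here for illustration, not part of the original analysis.

```python
# Assumption: the table's hour counts imply a 52-week year (52 * 7 * 24 = 8,736 hours).
HOURS_PER_YEAR = 52 * 7 * 24

def annual_downtime_cost(availability_pct, cost_per_hour):
    """Return (hours down per year, annual downtime cost) for an availability level."""
    hours_down = (100 - availability_pct) / 100 * HOURS_PER_YEAR
    return hours_down, hours_down * cost_per_hour

levels = [("Average", 98.0), ("Very Good", 99.0),
          ("Outstanding", 99.5), ("Best in Class", 99.9)]

for label, pct in levels:
    hours, cost = annual_downtime_cost(pct, 42_000)
    print(f"{label:<14} {pct:.3f}%  {hours:8.3f} h  ${cost:,.0f}")
```

Substituting an organization's own hourly downtime cost for the $42,000 average gives a first estimate of what each availability level is worth.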
To determine how much disaster recovery spending is enough, IT managers need to perform a three-step analysis:
1) Assess the downtime costs for crucial business systems;
2) Calculate the potential disaster risks and impacts;
3) Compare alternative plans to determine benefits of each proposed solution and how much spending is enough.
This three-step process helps put the risks, possible projects and benefits in perspective. It also helps executives make sound spending decisions and correctly balance disaster recovery solutions with other IT projects.
Determining The Downtime Costs For Key Business Systems
Downtime risks can be calculated by examining each of the business systems and determining how much value they deliver to the organization. Typically, risk is measured per hour of downtime, i.e. the revenue value or productivity impact that the unavailability of the system or data for that hour will have on the organization.
For transaction-based business systems, potential downtime losses can be calculated from two figures: the number of transactions per hour (either the daily average or the count during the busiest hour) and the average value of a transaction. Multiplying these two figures gives a good idea of how much the business system is worth per hour.
For example, an e-commerce system records 1,000 sales transactions per hour at its busiest. On average, each sale is $45. If the system were unavailable due to a disaster, it is estimated that the business would lose $45,000 per hour. If it took five hours to restore the systems, the impact would be $225,000.
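As a minimal sketch, the transaction-based calculation above can be written as a single function. The function and parameter names are hypothetical; the inputs are the figures from the e-commerce example.

```python
def transaction_downtime_loss(transactions_per_hour, avg_transaction_value, hours_down=1):
    """Estimated revenue lost while a transaction system is unavailable."""
    return transactions_per_hour * avg_transaction_value * hours_down

per_hour = transaction_downtime_loss(1_000, 45)           # one hour at peak volume
five_hours = transaction_downtime_loss(1_000, 45, 5)      # five-hour recovery
print(f"Loss per hour: ${per_hour:,}")
print(f"Loss for a five-hour outage: ${five_hours:,}")
```

Using the busiest-hour transaction count, as the example does, yields a worst-case figure; the daily average gives a more typical one.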
Downtime Losses by Application - Typical Loss per Minute of Unplanned Downtime
Financial/Trading - $40,000
Supply Chain - $10,000
ERP - $10,000
CRM - $8,000
E-Commerce - $8,000
E-Business - $8,000
Business Application - $5,000
Database - $5,000
Messaging - $1,000
Infrastructure - $700
Typical downtime costs per minute for various applications
For internal systems and infrastructure, users' ability to do their jobs is diminished or prevented when disaster strikes. The downtime impact for internal business systems is often estimated from revenue per user: for each system, the number of users is multiplied by the revenue per employee per hour to determine the downtime risk.
For example, a company with a messaging system has servers distributed across the enterprise. The largest server hosts 1,000 users with an estimated $186 in revenue per employee hour ($350,000 in revenue per employee per year). If the messaging system were unavailable for one hour, the revenue risk to the company would be $186,000. If the messaging system were hit with a disaster and took five hours to recover, the cost to the company would be almost $1 million in lost revenue.
A more conservative approach to calculating the downtime impact of internal systems is with user salaries, rather than revenue per employee. However, most disasters will affect revenue, so for disaster recovery projects, revenue per employee should be used for infrastructure downtime calculations.
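The per-user calculation can be sketched the same way. The names below are hypothetical; the hourly value passed in can be either revenue per employee hour, as in the messaging example, or an hourly salary figure for the more conservative estimate.

```python
def user_downtime_risk(users, value_per_user_hour, hours_down=1):
    """Downtime risk for an internal system: users * hourly value * hours down."""
    return users * value_per_user_hour * hours_down

# Messaging example from the text: 1,000 users at $186 revenue per employee hour.
one_hour = user_downtime_risk(1_000, 186)
five_hours = user_downtime_risk(1_000, 186, 5)
print(f"One hour down:   ${one_hour:,}")
print(f"Five hours down: ${five_hours:,}")
```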
The longer the system is unavailable, the greater the impact. Not only may the transaction be lost or an important customer issue not resolved, but the recovery delays may cause a permanent loss of the customer or supplier. The system downtime evolves from a single transactional cost to the loss of the customer’s lifetime value or the cost of obtaining a new supplier. As well, longer recoveries may cause irreparable harm to the corporation’s brand image.
These intangible risks are extremely difficult to quantify and are often left out when justifying adequate disaster recovery budgets. However, even if they cannot be quantified, the intangible benefits should be an important discussion point and element in any disaster recovery business case. The goal is to assess the disaster recovery project on par with other IT investment options.
Once downtime per hour is understood, the next step is to determine the disaster risk: What is the likelihood of a given event striking the organization, and if it occurs, how long will recovery take under today’s disaster recovery plan? A list of potential events should be created and should include typical issues such as system failures, accidental or intentional data destruction, human error, and natural disasters.
Unplanned downtime has many causes, requiring a focus on all aspects of the computing environment, mitigation of risks from natural disasters, as well as processes, procedures, and training to guard against human error. Because no single cause is dominant, IT organizations need to spend time in many areas in preparation for these disasters. Achieving operational resilience and being ready for rapid disaster recovery can be expensive for many IT organizations. However, not being prepared for such issues can be even more costly.
For each potential business risk, a probability of occurrence is assigned. As well, the time it would take to recover using the current disaster recovery plan is calculated. This creates a table that tallies each potential risk, the probability of the risk occurring and the potential downtime impact in hours. For each business system, or in some cases the total operation, the downtime impact per hour can then be factored in, leading to an estimated annual risk (probability multiplied by recovery hours multiplied by loss per hour). An excerpt from a typical risk assessment table would be as follows:
|Risk||Probability Of Occurring In The Next 12 Months||Business Impact (Recovery Hours)||Systems Affected||Loss Per Hour||Annual Risk|
|Accidental Database Corruption||30%||8||E-Commerce Application||$45,000||$108,000|
|Intentional Database Corruption||25%||8||E-Commerce Application||$45,000||$90,000|
|Fire In Data Center||1%||8||All||$1,000,000||$80,000|
System by system, risk by risk, the team can estimate the potential impacts, recovery times and downtime risks, highlighting the most important elements that need to be addressed in the disaster recovery plan.
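The risk assessment table can be reproduced with a small sketch in which annual risk is probability times recovery hours times loss per hour. The `Risk` class and its field names are hypothetical; the values are the three rows of the excerpt.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    probability_pct: int   # chance of occurring in the next 12 months
    recovery_hours: int    # downtime under the current recovery plan
    loss_per_hour: int     # downtime cost of the affected systems

    def annual_risk(self):
        """Expected annual loss: probability * recovery hours * loss per hour."""
        return self.probability_pct * self.recovery_hours * self.loss_per_hour / 100

risks = [
    Risk("Accidental Database Corruption", 30, 8, 45_000),
    Risk("Intentional Database Corruption", 25, 8, 45_000),
    Risk("Fire In Data Center", 1, 8, 1_000_000),
]

# List the largest exposures first so they can be addressed first in the plan.
for risk in sorted(risks, key=Risk.annual_risk, reverse=True):
    print(f"{risk.name:<32} ${risk.annual_risk():,.0f}")
```

Sorting by annual risk is one simple way to surface the elements that most need attention; a real assessment would also weigh the intangible impacts discussed earlier.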
Compare Alternative Plans, Costs And Benefits
Armed with a thorough understanding of the business impact in the event of a disaster, the next step is to evaluate and select solutions to help mitigate the risks. Each solution has a cost and a range of value in reducing the disaster risk. The benefit is the reduction in the probability of a disaster incident occurring or the reduction in the recovery time. Using ROI analysis, the solutions can be compared in terms of financial benefit, helping the team to select the right amount of risk reduction and the project that delivers the most cost-effective risk-reducing benefits. A simple solution analysis table might look like this, using the accidental and intentional database corruption risks from the risk analysis above:
|DR Plan||Cost||Current Annual Risk||Risk Reduction||Annual Savings (Benefit)||ROI|
|Faster Recovery Tools||$50,000||$288,000||25%||$72,000||332%|
|Snapshot||$100,000||$288,000||65%||$187,200||462%|
|Local Redundancy With Failover||$250,000||$288,000||90%||$259,200||211%|
|Remote Redundancy With Failover||$3,000,000||$288,000||99%||$285,120||-71%|
In this case, the snapshot solution provides the greatest financial return to the organization, mitigating a significant amount of the risk at a cost-effective price. (The ROI figures in the table are consistent with three years of risk-reduction benefit weighed against a one-time cost.) However, it is important to remember that disaster recovery solutions are not selected on ROI measures alone. The organization may be averse to this type of data loss and may have adequate budget to allocate to the risk reduction program. The local redundancy failover solution mitigates almost all of the risk while still delivering a positive return, which may make it the best investment for such an organization, even though its ROI is lower than that of the snapshot and faster recovery tools alternatives.
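The solution comparison can be sketched as follows. The article does not state the ROI formula; the three-year horizon below is an assumption, chosen because it reproduces the ROI column of the table when the annual benefit is accumulated and compared with a one-time cost. The function and variable names are hypothetical.

```python
YEARS = 3  # assumption: the horizon that reproduces the table's ROI column

def roi_percent(cost, current_annual_risk, risk_reduction_pct, years=YEARS):
    """ROI as (total benefit over the horizon - cost) / cost, in percent."""
    annual_benefit = current_annual_risk * risk_reduction_pct / 100
    return (annual_benefit * years - cost) / cost * 100

CURRENT_ANNUAL_RISK = 288_000  # figure used in the solution analysis table

plans = [
    ("Faster Recovery Tools", 50_000, 25),
    ("Snapshot", 100_000, 65),
    ("Local Redundancy With Failover", 250_000, 90),
    ("Remote Redundancy With Failover", 3_000_000, 99),
]

for name, cost, reduction_pct in plans:
    roi = roi_percent(cost, CURRENT_ANNUAL_RISK, reduction_pct)
    print(f"{name:<32} cost ${cost:>9,}  ROI {roi:.0f}%")
```

Changing `YEARS` shows how sensitive the ranking is to the evaluation period: over a single year, for instance, the more expensive options look considerably less attractive.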
Peace of Mind at the Right Price
In today’s information economy, the cost of computing downtime represents a significant issue for enterprises of all shapes and sizes. The impact can be measured in terms of lost revenue, employee productivity and other hard numbers; it can also affect intangible aspects of the business such as customer and partner satisfaction.
The key is to measure the cost of failure in terms of the bottom line: demonstrating how much downtime on critical applications costs the organization in revenue or sales sends a strong message to the executive team about the value of IT investments. Balancing the organization’s tolerance for risk with a hard dollar assessment of the level of mitigation provided by solutions ensures that disaster recovery investments provide the right amount of coverage for the right price.
Tom Pisello is CEO and founder of Alinean, the IT Value Experts, and author of Return on Investment for Information Technology Providers: Using ROI as a Selling and Management Tool. He can be reached at email@example.com or (407) 882-2426.