Impact of Downtime
Although the negative effects of downtime are obvious, the business impact becomes more striking when we try to quantify them. How do you measure, or rather estimate, the cost of downtime? Traditional metrics focus mainly on transaction loss, which can be measured quite accurately for transaction-oriented processes by quantifying the amount of data lost and the scope of rework needed for data recovery. No less important, however, is productivity loss. In today’s business world, where most companies depend on computer systems for their operations, unavailable systems and applications cause sharp declines in productivity.
Also of growing importance are businesses’ customer support operations, which more and more frequently depend on access to networked applications. Therefore, unavailability of a customer support application will most definitely lead to a slowdown in customer service and potentially disgruntled customers -- the impact of which is quite difficult to quantify. It is equally difficult to quantify the impact on business partners and supply chain management.
It is important to emphasize that total system unavailability is not the only danger. Slow response times caused by the failure of critical components can affect a company’s reputation, as customers will quickly look elsewhere for their products and services. For example, with the increasing interdependence between companies in a supply chain, a delay in scheduling may affect not only direct clients, but also their customers and their customers’ customers. Among the most important factors, and the most difficult to quantify, are the sales opportunities lost as a result of downtime.
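The two measurable components discussed above, transaction loss and productivity loss, can be combined into a rough estimate. The following is a minimal sketch, not a standard formula: the class name, the field names and all of the figures are hypothetical, and harder-to-quantify effects such as lost sales opportunities or reputational damage are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEstimate:
    """Hypothetical inputs for a rough downtime cost estimate."""

    hours_down: float
    lost_transactions_per_hour: float
    revenue_per_transaction: float
    affected_employees: int
    loaded_hourly_wage: float
    # 0.0 means staff can work normally, 1.0 means staff are fully idle.
    productivity_loss_fraction: float

    def transaction_loss(self) -> float:
        # Revenue tied to transactions that could not be processed.
        return (self.hours_down
                * self.lost_transactions_per_hour
                * self.revenue_per_transaction)

    def productivity_loss(self) -> float:
        # Wages paid for time in which staff could not work effectively.
        return (self.hours_down
                * self.affected_employees
                * self.loaded_hourly_wage
                * self.productivity_loss_fraction)

    def total(self) -> float:
        return self.transaction_loss() + self.productivity_loss()


# Illustrative numbers only; they are not taken from any survey or table.
estimate = DowntimeEstimate(
    hours_down=2.0,
    lost_transactions_per_hour=500,
    revenue_per_transaction=40.0,
    affected_employees=200,
    loaded_hourly_wage=50.0,
    productivity_loss_fraction=0.5,
)
print(f"Transaction loss:  ${estimate.transaction_loss():,.0f}")
print(f"Productivity loss: ${estimate.productivity_loss():,.0f}")
print(f"Total estimate:    ${estimate.total():,.0f}")
```

Keeping the two components separate makes it easier to see which one dominates for a given application before deciding how much protection it justifies.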
The following table, put together by Contingency Planning Research Inc. of Livingston, NJ, estimates the cost of downtime for different industry sectors. (Figure 1 below)
Causes of Downtime
One of the most common causes of downtime is probably change management, or perhaps more to the point, making modifications without change management.
A lack of proper change management policies, or noncompliance with change management procedures, often causes avoidable downtime. While inherent hardware or software defects are often blamed for network and system failures, in reality systems fail more often because of misconfiguration or improper modifications, as described above. Nevertheless, hardware and software will occasionally fail, and in many cases the timing is unpredictable. Power outages also seem to have become more frequent, as we have all witnessed over the last few months.
As stated earlier, downtime is not the only cause of application unavailability; slow response time may also result in poor and often unacceptable service quality, and can sometimes be perceived as downtime.
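One way to make this point measurable is to count a slow response as a failed one when computing availability. The sketch below assumes a hypothetical set of probe results (response times in seconds, with None for no response) and a hypothetical two-second acceptability threshold; neither comes from a particular monitoring product.

```python
from typing import Optional, Sequence


def effective_availability(
    response_times: Sequence[Optional[float]],
    max_acceptable_seconds: float,
) -> float:
    """Return the fraction of probes answered within the threshold.

    A value of None means the probe got no response at all.
    """
    if not response_times:
        raise ValueError("at least one probe result is required")
    good = sum(
        1 for t in response_times
        if t is not None and t <= max_acceptable_seconds
    )
    return good / len(response_times)


# Hypothetical probe results: one outright failure and two slow responses.
probes = [0.2, 0.3, None, 4.5, 0.25, 0.4, 6.0, 0.3, 0.2, 0.35]

# Counting any response as "up" hides the slow ones.
raw = effective_availability(probes, max_acceptable_seconds=float("inf"))
# Counting only responses within two seconds reflects what users experience.
perceived = effective_availability(probes, max_acceptable_seconds=2.0)

print(f"Answered at all:        {raw:.0%}")
print(f"Answered within 2 sec:  {perceived:.0%}")
```

The gap between the two figures is the portion of "uptime" that users may well perceive as downtime.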
Due to their location in volatile climates or energy shortage areas, some companies will be completely unable to predict downtime. Disasters will simply force them out of business. Others will be able to maintain operations because of the foresight to set up disaster recovery systems that back up data and in some cases entire systems to remote locations.
While proper contingency plans can help to prevent, or at least minimize, the effects of the downtime described above, the only proper way to deal with downtime caused by natural and manmade catastrophes is a sound disaster recovery plan with off-site contingency provisioning.
Planning for Business Continuity
Proper contingency planning for IT starts with identification of mission-critical applications and related computing systems. During this process it is very important to define the business impact of downtime. Make sure to have well-defined and well-tested step-by-step backup and disaster recovery plans. Such contingency plans should have provisioning for data recovery and data access, as well as alternate locations and offices for personnel. When looking for such locations, consider factors such as ensuring security; routing phone and data access lines; and notifying customers, postal services, distributors, suppliers, and (most importantly) employees of the alternate locations. And, as mentioned above, contingency procedures should not just be identified and planned, but also periodically tested.
In a recent survey of its 1318 members, TechRepublic of Louisville, KY, uncovered quite a disturbing picture of business continuity readiness (or rather the lack thereof). The following figure summarizes the responses to the survey. (Figure 2 below)
Most of those surveyed realized the severity of this situation and had different levels of contingency planning in place. The following chart summarizes their responses on the measures they plan to take in the near future. (Figure 3 below)
The activities described in Figures 2 and 3 are critical, but proper business continuity planning requires a comprehensive disaster recovery strategy that covers every aspect of high availability and contingency planning. One way to better ensure that contingency procedures are secure is to outsource to an experienced service provider with high-availability infrastructure, policies and procedures. Qualified vendors offer technical expertise and a physically removed backup center. HomeSource Capital Mortgage Company took advantage of such a vendor. HomeSource, a mortgage banker, is located in the heart of the hurricane belt in Jupiter, Florida. One hundred percent of its business relies on next-generation technologies using both online and offline tools, which makes it vulnerable to network downtime. Even a few hours of outage of HomeSource’s mission-critical applications could be devastating to its customers, and potentially disastrous for its business.
HomeSource decided to house its IT systems in a remote location to avoid downtime from a major storm or other natural disaster. The mortgage banker selected managed hosting and IT outsourcing services provider Cervalis to manage its critical applications and keep its e-business safe. Cervalis’ Internet Data Center (IDC), designed with extreme high availability in mind (see Figure 4 below), is located in Dutchess County, New York - a healthy distance from the frequent rages of Mother Nature. For HomeSource, Cervalis is a safe haven situated away from the hurricane hot spot of the Florida coastline.
Managed hosting providers with N+1 network redundancy and an advanced degree of virtual and physical security offer similar shelter from the hazards so many e-businesses face right now. Power outages, tornadoes, forest fires, floods and hurricanes have jeopardized businesses all over the country this year. But IT services that are managed and protected from the elements by outsourced Internet Data Centers provide reliable connectivity and availability to customers - so their businesses are free to operate at full capacity.
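The idea behind N+1 redundancy is that the load can still be carried after any single unit fails. The check below is a minimal sketch of that rule; the function name and the capacity figures (uplinks measured in Mbps) are hypothetical and do not describe any particular provider's infrastructure.

```python
from typing import Sequence


def survives_single_failure(
    unit_capacities: Sequence[float],
    required_load: float,
) -> bool:
    """Return True if the remaining units can carry the load after any one fails.

    The worst case is losing the largest unit, so it is enough to test that.
    """
    if not unit_capacities:
        return False
    remaining = sum(unit_capacities) - max(unit_capacities)
    return remaining >= required_load


# Hypothetical example: three 100 Mbps uplinks carrying a 200 Mbps peak load.
three_links = survives_single_failure([100, 100, 100], required_load=200)
# Two 100 Mbps uplinks cannot carry a 150 Mbps peak load once one is lost.
two_links = survives_single_failure([100, 100], required_load=150)

print(three_links, two_links)
```

The same test applies to power feeds, cooling units or servers, as long as capacity and load are expressed in the same unit.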
Undeniably, downtime is not always caused by a system malfunction or by a manmade or natural disaster. Breaches in security and deliberate attacks, such as denial-of-service attacks, can essentially shut systems down, as was recently demonstrated in a number of well-publicized cases. Networks can be protected through a combination of high-availability network architecture and an integrated set of security access control and monitoring mechanisms. Recent incidents of Distributed Denial of Service (DDoS) attacks demonstrate the importance of monitoring security and of filtering not only incoming traffic, but also the outbound traffic generated within the network. Defining a solid, up-to-date information protection program, with associated access control policies and business recovery procedures, should be the first priority on the agenda of every networked organization. Specifically, a firm’s information security posture - an assessment of the strength and effectiveness of the organizational infrastructure in support of technical security controls - has to be addressed through the following activities:
- Auditing, network monitoring and incident response
- Communications management
- Configurations for critical systems: firewalls/air-gaps, DNS, policy servers
- Configuration management practices
- External access requirements and dependencies
- Physical security controls
- Risk management practices
- Security awareness and training for all organization levels
- System maintenance
- System operation procedures and documentation
- Application development and controls
- Authentication controls
- Network architecture and access controls
- Network services and operational coordination
- Security technical policies, practices, and documentation
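The outbound filtering mentioned above can be illustrated with a simple rule: traffic leaving the network should carry a source address that belongs to the organization's own address space. The sketch below assumes that rule. The function name is hypothetical, and the prefixes are address ranges reserved for documentation rather than a real network. In practice such a check would be configured on border routers or firewalls, not written as application code.

```python
import ipaddress

# Hypothetical internal address space (ranges reserved for documentation).
INTERNAL_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def allow_outbound(source_ip: str) -> bool:
    """Allow an outbound packet only if its source address is one of ours.

    A packet leaving the network with a foreign source address is likely
    spoofed, for example by a compromised host taking part in an attack.
    """
    address = ipaddress.ip_address(source_ip)
    return any(address in prefix for prefix in INTERNAL_PREFIXES)


print(allow_outbound("192.0.2.15"))   # source inside our address space
print(allow_outbound("203.0.113.9"))  # source outside our address space
```

Logging the packets that fail this check also gives the monitoring function an early sign that an internal system may have been compromised.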
A sound business continuity plan, including high-availability network design with comprehensive security policies aimed at high availability, recoverability and data integrity, establishes the infrastructure needed to conduct any activity in a secure and reliable fashion, regardless of whether the public Internet, extranets or intranets are being used.
Edward Rabinovitch is Vice President of Network Engineering at Cervalis. He is an industry-recognized specialist with more than twenty years of experience in information and networking technology, data processing, Internet/intranet/extranet and business communications.
Rabinovitch is a member of the editorial review boards and contributing editor for the IEEE Communications Magazine, Enterprise Systems Journal and The Computer Measurement Group.