As technology’s strategic importance to the business has grown, so have the costs and risks of disruptions. Today, a wide range of events, from accidents to malicious attacks to out-of-commission technology, can shut down e-commerce sites, billing and payment systems, or supply chains, or simply cut off Internet access for employees. These disruptions can result in lost revenue, lower employee productivity, a damaged corporate reputation, and possibly regulatory fines.
Network outages alone cost businesses in the United States $1.7 billion in 2010, according to an estimate by CDW. Not surprisingly, then, 27 percent of enterprises say significantly upgrading business continuity and disaster recovery (BC/DR) capabilities is a critical priority, while another 41 percent say it is a high priority, according to Forrester Research.
With the stakes so high, chief information officers (CIOs) and other top executives are rethinking their approach to BC/DR. Companies are recognizing they should approach BC/DR in a more strategic and comprehensive way, one that makes their companies stronger even when there is no crisis.
The Concept of Operational Resiliency
A new best practice called operational resiliency is emerging. This concept broadens BC/DR beyond just IT recovery, and even IT resiliency, to overall operational resiliency for a company. Operational resiliency is impossible to achieve without addressing both technology and business processes. It requires a fundamental shift in how IT leaders work with other business executives to develop and implement a company-wide strategy. The goal is to make the company more resilient, whether it encounters a small or a large disruption.
Some Key Drivers
It used to be that a company could just focus on recovering from a disaster. Some companies gambled that the chances of experiencing a disaster were slim and went about business as usual without thinking about BC/DR. That has changed. The corporate world today has grown much more complicated and interconnected, increasing risks and vulnerabilities. The opportunity for disasters has multiplied, from natural events like earthquakes to man-made disruptions such as data security breaches and technology breakdowns.
In addition, the marketplace today is simply less forgiving of failure. There are three key drivers for this change:
- Higher customer expectations. Mobile technology has raised customers’ expectations for how they communicate with companies. People are constantly connected and able to complete transactions, search for information, transfer funds, and conduct all types of business whenever and wherever they choose. As a result, tolerance for any downtime is rapidly declining, regardless of the circumstances.
- Stronger competition. From trading stocks to buying products to choosing health care providers, the cost of switching from one company to another is lower today. Reliability and customer service remain among the few areas of sustainable market advantage, and technology is the backbone of both.
- More industry and government regulations. A growing number of regulations and standards, directly or indirectly, require business continuity preparedness and set an expectation for very high uptime. For example, in the financial services industry, trades must be cleared by the end of the business day or companies face stiff penalties.
[Figure labels: IT + Operations = Business; Any Business Interruption; CIO + Senior Management Team + Board; Meet regulatory requirements; Essential to business success]
Operational Resiliency Within Reach
While companies’ and their stakeholders’ reliance on technology is quickly making operational resiliency a strategic necessity, technology also makes this goal increasingly possible and affordable. Technology capabilities continue to grow exponentially as prices plunge. For example, a basic amount of data storage that once cost thousands of dollars now costs just 50 cents. Tools exist to map, analyze, and monitor business processes from start to finish, including suppliers, to identify disruptions as they occur. In addition, transformational technologies like cloud computing and virtualization give companies the opportunity to become much more flexible and responsive.
Another driver for the operational resiliency concept is that not all business processes and systems need the same level of resiliency. If companies can ensure that their key business processes continue, they may be able to relax the recovery time objective (RTO) and recovery point objective (RPO) of other processes and IT systems, allowing longer recovery times and larger windows of data loss where the business can tolerate them. This helps the company save money overall. By assessing and prioritizing operations as a whole, companies can strike a balance that keeps costs in check.
Building a Strategy for Operational Resiliency
To achieve operational resiliency, companies should first make it a corporate imperative, not just an IT department initiative. This priority should be embedded into the entire technology and business flow, from sourcing to business processes to IT infrastructure and applications. It should serve as a key filter for making business and IT decisions, from choosing suppliers to selecting new servers. To ensure ongoing operational resiliency, companies should create a culture in which IT and business units work together to proactively anticipate, manage, and incorporate ever-changing technologies, business requirements, potential risks, data dispersion, growth opportunities, and best practices.
The Three Phases to Operational Resiliency
Phase I: Determining Functional Requirements
Building an operational resiliency culture begins with establishing the desired endpoint: keeping specific critical operations running despite IT outages or other disasters. This effort should include top-level decision-makers who have a broad and deep understanding of the company, so it is generally led by the CIO or chief technology officer (CTO) along with a team of senior business leaders.
Critical operations vary from company to company. For example, a hedge fund company may prioritize its systems for reconciling trades while putting its payroll system on the back burner. Each night, a restaurant chain needs to place orders for the next day’s food deliveries and record its daily receipts, but it can delay the ordering of new staff uniforms to another day.
To set its functional requirements, a company should begin with a business impact analysis, which determines the key processes and how they are interdependent. The company should then rank those processes by priority. Key considerations when evaluating operations include how much revenue is connected to each process and to what degree a company’s customers will be affected. Keeping ATMs operating and stocked with cash would be a top concern for a bank, since customers expect access to their money whenever they want it.
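The ranking step of a business impact analysis can be sketched in a few lines of Python. This is a minimal illustration, not a standard method: the `Process` fields, the equal weighting of revenue and customer impact, the 1-to-5 rating scale, and the sample bank processes and figures are all assumptions made for the example. The one rule it encodes from the text is interdependence: a process is treated as at least as critical as anything that depends on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    name: str
    revenue_per_hour: float   # illustrative estimate, in dollars
    customer_impact: int      # illustrative rating, 1 (low) to 5 (high)
    depends_on: tuple = ()    # names of processes this one needs to run


def rank_processes(processes):
    """Return process names ordered from most to least critical."""
    max_revenue = max(p.revenue_per_hour for p in processes) or 1.0
    # Own score: equal-weight blend of normalized revenue and customer impact.
    score = {
        p.name: 0.5 * p.revenue_per_hour / max_revenue + 0.5 * p.customer_impact / 5
        for p in processes
    }
    # Propagate criticality: a dependency inherits the highest score
    # of any process that relies on it, directly or indirectly.
    changed = True
    while changed:
        changed = False
        for p in processes:
            for dep in p.depends_on:
                if score[dep] < score[p.name]:
                    score[dep] = score[p.name]
                    changed = True
    return sorted(score, key=score.get, reverse=True)


# Hypothetical processes and numbers for a bank.
bank = [
    Process("atm_withdrawals", 50_000, 5, depends_on=("core_ledger",)),
    Process("core_ledger", 0, 2),
    Process("marketing_newsletter", 100, 1),
]
print(rank_processes(bank))
```

In this sketch the ledger earns no revenue on its own, yet it ranks alongside ATM withdrawals because the ATMs cannot operate without it.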
As a next step, companies should map each process and understand the infrastructure and applications that enable it. Then they should set target service levels, covering application uptime, RTOs, and RPOs, and identify the gap between current capabilities and those targets. Because of their importance, most ATMs are architected with dual systems and dual data centers, so if one location goes down, there is an automatic switchover to the other. Of course, not every process demands that level of redundancy to prevent downtime.
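As a rough illustration of the gap analysis, the sketch below compares a target service level with current capability for one process. The `ServiceLevel` type, the `service_gap` helper, and the sample durations are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ServiceLevel:
    rto: timedelta  # recovery time objective: longest tolerable outage
    rpo: timedelta  # recovery point objective: longest tolerable window of lost data


def service_gap(target, current):
    """Return how far current capability falls short of the target.

    A component is zero when current capability already meets the target.
    """
    zero = timedelta(0)
    return ServiceLevel(
        rto=max(current.rto - target.rto, zero),
        rpo=max(current.rpo - target.rpo, zero),
    )


# Illustrative numbers only.
target = ServiceLevel(rto=timedelta(minutes=15), rpo=timedelta(minutes=5))
current = ServiceLevel(rto=timedelta(hours=4), rpo=timedelta(minutes=5))

gap = service_gap(target, current)
print(gap.rto, gap.rpo)  # 3:45:00 0:00:00
```

Here the process already meets its data-loss target but would take 3 hours 45 minutes longer to recover than the business requires, which is the gap the strategy phase has to close.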
Phase II: Developing the Strategy
With a firm foundation from the functional requirements phase, a company can next begin to architect a solution. The CIO, CTO, and other business executives leading the effort would develop and evaluate several strategic options, each aimed at specific uptime targets and weighed against its costs and risks, and illustrate them with functional diagrams. One example is building a new data center compared to relying on a cloud provider for backup. Then, the team should make a recommendation to the broader leadership team.
The solutions should not focus on technology uptime, per se, but on productivity or production uptime. In other words, this means business uptime. Many companies running complex enterprise resource planning software, for example, cannot afford to shut down production for 12 to 24 hours once a month for upgrades or patches, so they invest in two complete environments and switch between them, with each environment taking its turn to shut down every other month for the software updates.
The solutions would therefore aim to make the technology as resilient as possible, such as building two call centers so that calls can be quickly rerouted if one goes down. They will also likely include lower-tech steps, such as the restaurant phoning in its next-day food order when it cannot order online.
Once it has validated the chosen option with key stakeholders and secured budget approval, the team builds a strategy implementation roadmap.
Phase III: Implementing the Strategy
At this point, the company focuses on building, then executing detailed designs and implementation project plans for operational resiliency. Because of the dynamic nature of business, the team should put in place processes and safeguards that ensure it takes into account and maintains operational resiliency as it makes changes to business processes and IT.
In addition, the company should create an operational recovery plan for IT disasters, including holding recovery exercises to see how well the plan works. However, with the goal of operational resiliency, the traditional approach to DR is no longer adequate.
For example, it is common from an IT perspective to track the recovery time of a disruption beginning with the official declaration, which occurs anywhere from two to 10 hours after the actual interruption takes place. In setting a higher performance bar with operational resiliency, the clock should start ticking when the outage occurs. Starting with the occurrence more accurately reflects the true business continuity gap during a disaster — the difference between business requirements and business capabilities. Occurrence is the measure that customers, employees, and other stakeholders will use.
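The difference between the two clocks is easy to see with a small calculation. The sketch below is illustrative only: the `recovery_times` helper, the timestamps, and the eight-hour RTO are assumptions, with a declaration delay of four hours chosen from within the two-to-10-hour range mentioned above.

```python
from datetime import datetime, timedelta


def recovery_times(occurred, declared, restored):
    """Measure one outage two ways: from declaration and from occurrence."""
    if not occurred <= declared <= restored:
        raise ValueError("expected occurred <= declared <= restored")
    return {
        "from_declaration": restored - declared,  # traditional IT measure
        "from_occurrence": restored - occurred,   # what stakeholders experience
    }


# Hypothetical outage timeline.
occurred = datetime(2024, 3, 1, 2, 0)
declared = occurred + timedelta(hours=4)   # disaster officially declared
restored = declared + timedelta(hours=6)   # service back up

rto = timedelta(hours=8)                   # assumed target for this process
times = recovery_times(occurred, declared, restored)
for label, elapsed in times.items():
    print(label, elapsed, "meets RTO" if elapsed <= rto else "misses RTO")
```

With these numbers the recovery looks like a six-hour success when measured from the declaration, but it is a 10-hour outage, and a missed target, when measured from the moment customers lost service.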
Moving to Operational Resiliency
With technology and business inextricably linked, an IT crisis today means a business crisis. As a result, companies should take action to become stronger and build operational resiliency so they can remain up and running during any type of crisis. This level of operational performance requires a new mindset and bold action by senior management. It starts with making resiliency a cornerstone of corporate strategy and embedding risk management into all technology and business decisions that a company makes every day.
Michael Croy is the national director of business continuity and disaster recovery solutions for Forsythe. He brings more than 30 years of experience in building, developing, and implementing disaster recovery and business continuity programs to U.S. and global organizations. Croy is responsible for the company’s business continuity offerings, including risk analysis, best practice models for continuity of IT infrastructure (storage, server, and network), and disaster recovery planning, strategy, and management.
David Halford is the practice manager of business continuity and disaster recovery solutions for Forsythe. He leads a team of professional business continuity planners to help customers plan and implement enterprise resiliency and business continuity solutions. Halford has more than 25 years of experience providing a strategic perspective to enterprise risk management and business continuity focused on aligning business requirements with effective, efficient solutions.