Disaster Recovery Methodologies
From an information technology and business recovery standpoint, there are a variety of good disaster recovery methodologies and products in the marketplace. The challenge facing organizations in the post-9/11 era is to select the optimum blend of disaster recovery products and technologies. A common problem in the past has been a tendency to treat disaster recovery as a collection of individual products and piece parts. Instead, disaster recovery should be viewed as a whole: an integrated, multi-product solution that addresses both post-9/11 threats and traditional business risks and emergencies.
Why Should We Care About Disaster Recovery?
Recent research shows that a major barrier to disaster preparedness is a lack of senior management support and funding. Senior management tends to view disaster recovery planning as a complex, time- and resource-consuming process with little obvious benefit. However, insurance statistics indicate that billions of dollars are lost through catastrophes and computer outages. For the 1990-99 period alone, the Federal Emergency Management Agency (FEMA) spent more than $25.4 billion on declared disasters and emergencies. FEMA is now part of the Department of Homeland Security.
FEMA’s mission is to lead America to prepare for, prevent, respond to, and recover from disasters, with a vision of “a nation prepared.” At no time in its history has this vision been more important to the country than in the aftermath of Sept. 11, 2001. Business managers must realize that when a disaster strikes, it will have an adverse impact on their businesses and customers. The following are reasons why businesses must plan for disasters:
- Survivability of your business might depend on it – Most companies could not stay in business very long without their mission-critical applications following a computer system failure. Rapid recovery after a disaster or system failure is vital to the livelihood of your business.
- Downtime is costly – A computer system failure, no matter how severe, results in increased expenses, lost revenue, and lost customers. A tested business recovery plan helps ensure faster recovery and, consequently, less downtime.
- Business contracts stipulate delivery deadlines – If you are a supplier, you must deliver the services or products to the other party no matter what your circumstances, or pay the penalties.
- Law may require it – Company officers are legally responsible for protecting company assets, including electronic data, particularly at public companies, banks, utilities, and government agencies.
In addition to planning for disasters, organizations need plans to handle non-catastrophic events such as computer hardware and software malfunctions and inadequate security controls. In each of the following real-world events, hardware malfunctions, software malfunctions, or inadequate security controls put company customers and the public at risk:
- Dispatch computer glitch grounds Delta flights for two hours in Atlanta
- Florida sues AT&T for billing 1 million non-customers
- Canada’s largest bank has processing disruption
- 800,000 cards overcharged at Wal-Mart stores from hardware problem
- Swedish insurance computers disabled by virus, affecting all 9 million Swedes
- AOL worker sold list of 92 million AOL customers to spammer
- 4.6 million DSL subscribers subjected to data leakage in Japan
- Israeli police lose laptop with critical agent’s information
- Air Force Motorola radios jam garage-door openers in Florida
- Network vandal penetrates South Korean defense system
- LexisNexis Inc. disclosed that hackers commandeered a database and gained access to the personal files of 300,000 people
- Citigroup said UPS lost tapes with sensitive information from 3.9 million customers of CitiFinancial, which provides loans
- CardSystems Solutions Inc. security breach could expose 40 million people to fraud
Although these events may not be catastrophic, they are significant and result in considerable costs, resources, and computer downtime.
Business Continuity Management Methodology
I am proposing a business continuity management methodology (a new, post-9/11 disaster recovery methodology) that can be used to sort, summarize, and organize company business requirements methodically. According to a 1995 article in the Journal of Strategic Information Systems, business continuity management is the integration of social and technical systems that together enable effective organizational protection. The following depicts the old approach and new approach to disaster recovery:
Before I define the elements of the proposed business continuity management methodology, I want to introduce you first to (1) disaster recovery statistics and facts, (2) common threats, and (3) effects of a disaster.
Disaster Recovery Statistics and Facts
Research and statistics clearly highlight the importance of disaster recovery and business recovery planning. The following statistics and facts illustrate this point:
- FEMA declared 68 major disasters in 2004 (Source: FEMA, 2005).
- In the August 2003 Eastern North America blackout, 50 million people were left without power and communications. The economic cost was between $7 billion and $10 billion (Source: ICF Consulting, 2004).
- Within minutes of the first plane crashing into the World Trade Center in New York’s financial district on Sept. 11, 2001, more than 200 organizations started declaring disasters and invoking their business continuity and disaster recovery plans (Source: Gartner, 2003).
- Two out of five companies that experience a disaster will go out of business in five years (Source: Gartner, 2004).
- Almost half of the companies that lose their data through disaster never reopen, and 90 percent are out of business within two years (Source: University of Texas Center for Research on Information Systems, 2004).
- 43 percent of companies that experience data disasters never reopen, and 29 percent close within two years (Source: McGladrey and Pullen, 2004).
- 80 percent of businesses without a well-structured recovery plan are forced to shut down within 12 months of a flood or fire (Source: London Chamber of Commerce and Industry, 2003).
- Globally, 60 percent of 850 mid- to large-sized companies experienced from 1-24 hours of unplanned downtime (Source: Veritas, 2003).
Disaster recovery and business recovery planning should be addressed as a top priority in organizations. The consequences of not addressing disaster recovery and business recovery planning are significant to the livelihood of an organization.
Common Threats
Since 9/11, we have lived through a number of catastrophic events (the tsunami, Hurricane Ivan, and the Eastern North America blackout). Since 1953, FEMA has recorded a total of 1,572 major disaster declarations, an average of about 31 major disasters per year. According to Layton & Associates research, these are the most common disaster threats:
- Natural disaster (fire, water, weather)
- Computer component failure
- Virus or security related attack
- Human error (maintenance, operations)
- Sabotage (employee or external)
- Bomb threat
- Denial of service attack on network or systems
- Equipment malfunction
- Telecommunications failure
- Terrorist act
Lack of proper disaster recovery planning can threaten company technology infrastructures and the survival of businesses that rely upon them.
Effects of Disaster
A disaster can have a tremendous impact on a business. The following are business effects associated with disasters:
- Loss of business/customers
- Loss of credibility/goodwill
- Cash flow problems
- Degradation of service to customers
- Inability to pay staff
- Loss of production
- Loss of operational data
- Financial loss and loss of financial control
- Loss of customer account management
Companies can lose their market share or their entire business if disaster recovery is not properly addressed.
Business Continuity Management – Definitions and Model
Business continuity management should look at all critical information processing areas of a company, including but not limited to the following:
- Local and wide area networks and servers
- Telecommunications and data communication links
- PBX, telephone, and voice mail systems
- Workstations and workspaces
- Applications, software, and data
- Operating systems
- Computers and printers
- System interfaces
- Records storage
- Business processes
- Staff responsibilities
Companies should follow a business continuity management methodology that provides an integrated framework for implementing the proper level of disaster recovery protection for their organization. The methodology proposed here provides this integrated framework and consists of the following elements: risk assessment, business impact analysis, disaster recovery, business recovery, business resumption, contingency planning, and crisis management. The following definitions are from the business continuity glossary:
- Risk Assessment – Process of identifying the risks to an organization, assessing the critical functions necessary for an organization to continue business operations, defining the controls in place to reduce organization exposure, and evaluating the cost for such controls. Risk analysis often involves an evaluation of the probabilities of a particular event.
- Business Impact Analysis – The business impact analysis is a process designed to identify critical business functions and workflow, determine the qualitative and quantitative impacts of a disruption, and to prioritize and establish recovery time objectives.
- Disaster Recovery Plan – The management-approved document that defines the resources, actions, tasks, and data required to manage the recovery effort. Usually refers to the technology recovery effort.
- Business Recovery Plan – Process of developing advance arrangements and procedures that enable an organization to respond to an event in such a manner that critical business functions continue with planned levels of interruption or essential change.
- Business Resumption – Planning to ensure the continued availability of essential business processes, programs, and operations. Business resumption planning prepares organizations to recover from contingencies, defined as any event that may interrupt an operation or affect service or program delivery. Business resumption planning includes facility and operations management, as well as information technology systems. The resources that must be considered include information, assets, people, and facilities.
- Contingency Planning – A plan used by an organization or business unit to respond to a specific systems failure or disruption of operations. A contingency plan may use any number of resources including workaround procedures, an alternate work area, a reciprocal agreement, or replacement resources.
- Crisis Management – The overall coordination of an organization’s response to a crisis, in an effective, timely manner, with the goal of avoiding or minimizing damage to the organization’s profitability, reputation, or ability to operate.
Ideally, organizations have the time and resources to complete a crisis management plan before they experience a crisis. Managing a real, current crisis involves identifying its true nature, intervening to minimize damage, and recovering from it. Crisis management often includes a strong focus on public relations to repair any damage to the organization’s public image and to assure stakeholders that recovery is underway. An uncontrolled disaster or a combination of mismanaged disasters could lead to a crisis, and the crisis could be larger than a disaster in terms of loss expectancy. A crisis usually results from an accumulation of unattended or unresolved disasters or issues.
Business continuity management implies ensuring the continuity or uninterrupted provision of computer operations, business processes, and services. Business continuity management is an ongoing process with several different but complementary components. Risk assessment and business impact analysis are the initial steps.
Every organization, no matter how small, should have a business continuity management program to help ensure that it is able to recover both information technology and business operations in a timely manner. The sponsors of the program should be company senior management and the head of information technology.
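To make the business impact analysis step more concrete, the following Python sketch shows one way a BIA worksheet might be used to prioritize critical functions by recovery time objective (RTO). It is a minimal illustration under stated assumptions, not part of the proposed methodology: the `BusinessFunction` record, its fields, and the tie-breaking rule (shortest RTO first, then largest hourly loss) are all hypothetical, and the figures are made up.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    """One row of a hypothetical business impact analysis (BIA) worksheet."""
    name: str
    hourly_loss: float  # assumed estimate of loss per hour of outage, in dollars
    rto_hours: float    # recovery time objective (RTO) agreed with the business owner


def recovery_order(functions):
    """Return functions in recovery priority order.

    Shortest RTO comes first; ties are broken by the larger hourly loss.
    """
    return sorted(functions, key=lambda f: (f.rto_hours, -f.hourly_loss))


def impact_at_rto(function):
    """Estimated loss if the function is restored exactly at its RTO."""
    return function.hourly_loss * function.rto_hours


if __name__ == "__main__":
    # Illustrative, made-up figures; a real BIA gathers these from business owners.
    worksheet = [
        BusinessFunction("payroll", hourly_loss=2_000, rto_hours=72),
        BusinessFunction("order entry", hourly_loss=50_000, rto_hours=4),
        BusinessFunction("customer service", hourly_loss=10_000, rto_hours=4),
    ]
    for f in recovery_order(worksheet):
        print(f"{f.name}: RTO {f.rto_hours} h, impact at RTO ${impact_at_rto(f):,.0f}")
```

Sorting on a tuple keeps the rule easy to read and change; an organization that weighs regulatory or customer impact more heavily than revenue would simply swap in a different key.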
Business Continuity Management Implementation
Senior management awareness is the first, and a very important, step in creating a successful business continuity management program. To obtain the necessary resources and time from the required areas of the organization, senior management must understand the business impacts and risks and support the program.
In today’s global business environment, having the correct computer systems, databases, and information available in a timely manner can be the difference between profits and losses, between keeping pace with competitors and falling behind, and between a viable company and a failing one. According to a June 2003 IDC study, 24x7 access to information allows companies to achieve their objectives. In addition, vendors, suppliers, customers, and employees must have access to information when they need it. To survive, a company must provide the necessary level of information under normal conditions as well as during unpredictable disruptions or catastrophic disasters. It is during such disruptions and disasters that businesses risk losing competitive advantage if they have not taken appropriate measures to prevent loss of information availability.
The cost of developing and implementing a business continuity management program depends on the number of computer systems and business processes that the risk assessment and business impact analysis identify as requiring risk-mitigating safeguards. Costs for a vendor hot site, telecommunications, and resources also need to be factored into the cost equation.
According to the same IDC study, in which telephone interviews were conducted with 41 companies that integrate business continuity into their information technology strategy, business continuity spending varied dramatically. The companies studied were in the financial services, manufacturing, and healthcare industries, and the financial services industry spent far more proportionally than the manufacturing and healthcare industries. These differences in business continuity budgets indicate, first, that each industry places a different level of importance on the role of business continuity and, second, that the financial industry places particular significance on incorporating business continuity into its information technology strategy. Following is the breakdown of costs by industry:
- Financial services $500 million
- Manufacturing $50 million
- Healthcare $30 million
The 41 companies surveyed comprised 14 from financial services, 15 from manufacturing, and 12 from healthcare. The average amount spent per company on business continuity in 2003 was $4.5 million.
As part of the IDC study, company information was also collected on revenue lost per incident by business function. Overall, the average loss was approximately $3 million per incident. The largest losses are incurred by back-office functions, led by finance, followed by manufacturing and human resources. Given the 2003 Veritas finding that 60 percent of 850 mid- to large-sized companies experienced from one to 24 hours of unplanned downtime, the $3 million average cost per incident is significant.
A company needs to balance the cost of unavailability with the cost of recovery. Companies need to make decisions on what systems and business processes should be part of their business continuity management program and need to be recovered in the event of a system outage or disaster. Companies need to have healthy discussions on business continuity management as part of their strategic, tactical, and operational planning process and include appropriate funding in their information technology and business unit budgets.
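Balancing the cost of unavailability against the cost of recovery can be framed with the annualized loss expectancy (ALE) formula commonly used in risk analysis: ALE equals the loss per incident multiplied by the expected number of incidents per year. The Python sketch below is an illustration under assumptions, not the IDC study's or the author's method. The function names, the `loss_reduction` parameter, and every input except the $3 million per-incident average quoted above are hypothetical.

```python
def annualized_loss_expectancy(loss_per_incident: float, incidents_per_year: float) -> float:
    """ALE = single loss expectancy (SLE) x annualized rate of occurrence (ARO)."""
    return loss_per_incident * incidents_per_year


def recovery_is_justified(
    loss_per_incident: float,
    incidents_per_year: float,
    loss_reduction: float,
    annual_recovery_cost: float,
) -> bool:
    """Compare the yearly loss a recovery capability is expected to avoid
    with what that capability costs per year.

    loss_reduction is the assumed fraction (0 to 1) of each incident's loss
    that the recovery capability prevents, e.g. by shortening downtime.
    """
    if not 0.0 <= loss_reduction <= 1.0:
        raise ValueError("loss_reduction must be between 0 and 1")
    avoided = annualized_loss_expectancy(loss_per_incident, incidents_per_year) * loss_reduction
    return avoided > annual_recovery_cost


if __name__ == "__main__":
    # $3 million per incident is the IDC average; 0.5 incidents/year, an
    # 80 percent loss reduction, and a $1 million annual cost are assumptions.
    print(annualized_loss_expectancy(3_000_000, 0.5))
    print(recovery_is_justified(3_000_000, 0.5, 0.8, 1_000_000))
```

With these assumed inputs, the expected annual loss is $1.5 million, and a safeguard that prevents 80 percent of it is worth up to about $1.2 million a year. A real decision would also weigh the non-financial effects listed earlier, such as loss of credibility and goodwill.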
It is critical that companies follow a robust business continuity management methodology when performing disaster recovery planning. To be effective, the plans implemented must be achievable, testable, and cost-effective. The following are the methodology elements and associated deliverables:
Business Continuity Management Methodology
- Conduct a risk assessment and report on information technology and business process threats and vulnerabilities, ranked by high, moderate, and low risk ratings, and identify risk-mitigating safeguards.
- Conduct a business impact analysis and report on risks and their associated business impacts, for use in preparing the disaster and business recovery plans.
- Prepare a disaster recovery plan for the orderly restoration of computer systems and telecommunications services.
- Prepare a business recovery plan for the complete recovery of business processes, including the people, workspace, non-information technology equipment, and facilities.
- Develop business resumption workaround procedures for business processes, to be used until the processes are recovered.
- Prepare a contingency plan documenting how to respond to various external events.
- Prepare a crisis management plan for the overall coordination of an organization’s response to a crisis, to avoid or minimize damage to profitability, reputation, or ability to operate.
All seven steps should be completed to help ensure adequate disaster recovery plans are in place, and all seven need to be updated on an ongoing basis. If they are not sustained, improved, and, when necessary, changed, the investment in the business continuity management program will be wasted.
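The risk assessment deliverable ranks threats by high, moderate, and low risk ratings. One common way to produce such ratings is a likelihood-by-impact matrix. The Python sketch below assumes 1 to 5 scales and arbitrary cut-offs (15 and above is high, 6 to 14 is moderate, below 6 is low), none of which come from the source. The threat scores in the example are invented for illustration.

```python
def risk_rating(likelihood: int, impact: int) -> str:
    """Map 1-5 likelihood and 1-5 impact scores to a high/moderate/low rating.

    The scales and cut-offs are illustrative assumptions, not part of the methodology.
    """
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must be between 1 and 5")
    score = likelihood * impact
    if score >= 15:
        return "high"
    if score >= 6:
        return "moderate"
    return "low"


def rank_threats(threats):
    """Given {threat: (likelihood, impact)}, return (threat, score, rating)
    tuples sorted from highest to lowest score."""
    rows = [
        (name, lik * imp, risk_rating(lik, imp))
        for name, (lik, imp) in threats.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for a few threats from the list of common threats.
    scores = {
        "Human error (maintenance, operations)": (4, 4),
        "Telecommunications failure": (3, 4),
        "Terrorist act": (1, 5),
    }
    for name, score, rating in rank_threats(scores):
        print(f"{rating:>8}  {score:>2}  {name}")
```

A multiplicative score rates a rare but severe threat such as a terrorist act as low. Some organizations therefore add a rule that any impact of 5 is rated at least moderate; that choice is a policy decision, not something the matrix settles.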
Sept. 11, 2001, changed the way many people view the world. It expanded the meaning of “disaster,” causing organizations to rethink their business continuity plans. The business continuity management methodology outlined here provides the road map that all companies (large, medium, and small) can follow to help ensure continued computer system and business operations in the event of a disaster, computer emergency, or crisis in this post-9/11 era.
Printed in Spring 2006
Dr. Edward Moskal is a professor in the computer and information sciences department at Saint Peter’s College in Jersey City, N.J. Prior to becoming a professor in 2001, Dr. Moskal worked 24 years at Fortune 100 companies, developing and directing system implementation projects on mainframe, mid-range, and client-server computing platforms. Systems included manufacturing, marketing, e-commerce, customer relationship management, enterprise resource planning, financial, human resource, retail, and shareholder. Dr. Moskal earned a bachelor’s degree in management and information systems from Saint Peter’s College, a master’s degree in administration from the University of Notre Dame, a master’s degree in management science from Stevens Institute of Technology, and a doctorate in business administration from Kennedy-Western University.
This article was made possible through the 2005 New York University Summer Scholar-In-Residence program in which Dr. Moskal was an active participant. Utilizing New York University resources (databases, journals, periodicals, textbooks, and the Internet), Dr. Moskal conducted research and studied the subject of disaster recovery.