A roadmap for becoming a stronger, more responsive business
Resilience needs are not the same across all industries or even across companies within a given industry. As a result, the process of becoming a resilient business is highly individualized, but the complexity of the task demands a methodical approach. The business resilience transformation lifecycle maps the transformation journey in a step-by-step process designed to assist in appropriately restructuring all objects within an enterprise, mitigating risk, and enhancing the ability to exploit opportunities.
Phase One: Determine Risk Exposure
The transformation lifecycle begins when risks unique to an organization are identified. These may include natural disasters, civil unrest, technical failures, regulatory compliance issues, sudden changes in demand or operational requirements, and any other risks that may interrupt normal business activity. It is also important to include opportunities in the assessment, such as sudden spikes in transaction volumes, new acquisitions or mergers, or highly effective marketing campaigns.
Companies tend to perform these types of assessments infrequently, in response to regulatory requirements or other changes to the business model. However, many organizations are now appreciating the value of moving to a more structured, scheduled approach to analyzing risk profiles in the face of rapid global changes in business. In any risk analysis, the following steps are critical:
- Rank threats based upon past occurrences, the amount of potential revenue loss, damage to the brand, compliance risks, and single points of failure
- Prioritize safeguards
- Conduct a cost-benefit analysis if performing a quantitative risk assessment
- Determine next steps based upon the severity of the threat, the selected safeguards, and the cost and ease of implementing those safeguards
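The ranking and cost-benefit steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method the lifecycle prescribes: it assumes the common quantitative convention of annualized loss expectancy (loss per occurrence multiplied by expected occurrences per year), and every threat name and figure below is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    single_loss: float     # estimated loss per occurrence (hypothetical)
    yearly_rate: float     # expected occurrences per year (hypothetical)
    safeguard_cost: float  # annual cost of the candidate safeguard
    rate_after: float      # expected occurrences per year with the safeguard

    def ale(self) -> float:
        """Annualized loss expectancy without the safeguard."""
        return self.single_loss * self.yearly_rate

    def safeguard_value(self) -> float:
        """Annual loss avoided by the safeguard, minus what it costs."""
        ale_after = self.single_loss * self.rate_after
        return self.ale() - ale_after - self.safeguard_cost

def rank_threats(threats):
    """Order threats from highest to lowest annualized loss expectancy."""
    return sorted(threats, key=lambda t: t.ale(), reverse=True)

# Hypothetical inputs for illustration only.
threats = [
    Threat("data center flood", 2_000_000, 0.02, 15_000, 0.005),
    Threat("payment system outage", 250_000, 1.5, 90_000, 0.25),
    Threat("demand spike overload", 80_000, 2.0, 200_000, 0.5),
]

for t in rank_threats(threats):
    verdict = "worth funding" if t.safeguard_value() > 0 else "costs more than it saves"
    print(f"{t.name}: ALE {t.ale():,.0f}, safeguard {verdict}")
```

A negative safeguard value does not mean the threat can be ignored; it suggests looking for a cheaper safeguard or weighing factors this sketch leaves out, such as brand damage and compliance risk.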
Phase Two: Rank the Risks According to Potential Business Impact
The second step is to rank the identified risks according to how they will likely affect a business. For example, business services, functions, or processes can be identified and prioritized according to how finances would likely be affected if the risks to these areas were realized.
It’s important to go beyond a simple business-impact analysis, which can make every part of the business seem critical and suggest that every object must be resilient. Most businesses have just a few truly key functions or processes, and resources can be targeted more effectively by understanding the exact requirements in those areas. The process of analyzing the business impact of the risks a business faces should include the following steps:
- Identify all critical business functions and processes
- Link business processes to the applications and data that support them
- Establish appropriate availability and recovery strategies by ranking each process in terms of the length of time it can operate without its supporting infrastructure
- Establish appropriate security levels by classifying data by importance to the business
- Identify critical physical recovery resources and vital records, as well as the timeframe within which they must be available for recovery efforts.
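As a rough sketch of the linking and ranking steps above, the Python below maps each process to the applications it depends on and derives a recovery target for each application. It assumes that an application must be recoverable within the shortest tolerable outage of any process it supports; the process names, application names, and hour values are hypothetical.

```python
def app_recovery_targets(processes):
    """Return, per application, the tightest outage limit (in hours)
    among all business processes that depend on it."""
    targets = {}
    for proc in processes.values():
        limit = proc["max_outage_hours"]
        for app in proc["apps"]:
            targets[app] = min(targets.get(app, limit), limit)
    return targets

# Hypothetical process inventory for illustration only.
processes = {
    "order entry": {"max_outage_hours": 1, "apps": ["web store", "orders db"]},
    "invoicing": {"max_outage_hours": 24, "apps": ["billing", "orders db"]},
    "payroll": {"max_outage_hours": 72, "apps": ["hr system"]},
}

targets = app_recovery_targets(processes)

# List applications from most to least time-critical.
for app, hours in sorted(targets.items(), key=lambda item: item[1]):
    print(f"{app}: recover within {hours} h")
```

Here the shared orders database inherits the one-hour limit of order entry even though invoicing alone could tolerate a day, which is the kind of dependency a process-by-process review can miss.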
Enterprises typically consider business impact in light of two measurements: recovery time objectives (RTOs) and recovery point objectives (RPOs). An RTO specifies the amount of downtime a business can tolerate, and an RPO specifies the amount of unrecoverable transactions or data that the company can withstand. In some industries, such as financial services or credit card processing, downtime can cost millions of U.S. dollars per hour.
For most companies, acceptable RTOs and RPOs are quickly approaching zero, but it is still important to analyze the cost of downtime to devise a targeted, cost-effective strategy for minimizing or eliminating the downtime.
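To make the cost-of-downtime analysis concrete, the following sketch compares recovery strategies by adding each strategy's annual cost to the downtime loss it would still allow. It is a simplified model under stated assumptions: every outage lasts exactly the strategy's RTO, data loss (RPO) is ignored, and the strategy names, costs, and outage frequency are hypothetical.

```python
def annual_exposure(cost_per_hour, outages_per_year, rto_hours):
    """Expected yearly downtime loss if each outage lasts the full RTO."""
    return cost_per_hour * outages_per_year * rto_hours

def total_cost(strategy, cost_per_hour, outages_per_year):
    """Annual cost of a strategy plus the downtime loss it still allows."""
    return strategy["annual_cost"] + annual_exposure(
        cost_per_hour, outages_per_year, strategy["rto_hours"]
    )

def best_strategy(strategies, cost_per_hour, outages_per_year):
    """Pick the strategy with the lowest combined annual cost."""
    return min(
        strategies,
        key=lambda s: total_cost(s, cost_per_hour, outages_per_year),
    )

# Hypothetical strategies; an RTO of 0 is an idealization.
strategies = [
    {"name": "tape restore", "rto_hours": 48, "annual_cost": 20_000},
    {"name": "warm standby", "rto_hours": 4, "annual_cost": 150_000},
    {"name": "active-active", "rto_hours": 0, "annual_cost": 600_000},
]

for hourly in (50_000, 1_000_000):
    choice = best_strategy(strategies, hourly, outages_per_year=0.5)
    print(f"downtime at {hourly:,}/h -> {choice['name']}")
```

With these made-up numbers, a process losing 50,000 per hour is best served by the warm standby, while one losing a million per hour justifies the far more expensive near-zero RTO option.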
Phase Three: Evaluate Your Resilience Capabilities
The next step is to perform a gap analysis of needs and capabilities. To help reduce the time and resources needed to complete this assessment and focus on areas that may need a more stringent analysis, it’s helpful to break this phase into two steps. The first step entails performing a high-level review of a company’s ability to meet the basic requirements of resilience including:
- Maintaining continuous business operations
- Achieving regulatory compliance and meeting industry standards more quickly and cost-effectively
- Integrating risk strategies to optimize resources
- Providing data protection, privacy, and security
- Obtaining the knowledge and skills necessary to achieve and maintain resilience
- Maintaining marketplace readiness
Such a review allows businesses to focus on the areas of most concern, and an analysis can help produce an assessment of the maturity level of a company’s object-oriented framework. The following maturity levels can be used to assess an object:
- Basic – These capabilities range from physical and systems security to awareness programs regarding company policies and emergency procedures, like communications, privacy, governance, and compliance programs. They may also include comprehensive continuity planning and are the backbone of an ad hoc approach to mitigating risks as they arise.
- Managed – These capabilities focus on process and policy compliance and the fundamental automation tools necessary to manage a disruption or opportunity when it occurs. In this case, management plays a strong role to ensure that employees understand their responsibilities and follow policies.
- Proactive Detection – These capabilities are centered on establishing thresholds and advanced warning systems that allow the company to take preemptive actions to help prevent disruption. The ability to monitor current performance and determine out-of-bounds conditions and behaviors for specific components is critical. The company still manually mitigates the risks as they are identified.
- Adaptive – These capabilities focus on the organization’s ability to sense and respond to unforeseen circumstances by using contingency plans and resources to maintain operations. Responses to situations must be defined in advance, but the system has the ability to adapt automatically to prevent a loss to the business.
- Autonomic – These capabilities focus on the business model itself and leverage the innovation, optimization, and capacity management characteristics that respond dynamically to changes in the marketplace, which can help a business anticipate and exploit opportunities faster than competitors. The idea is to actively foster business growth through a resilient business and IT infrastructure, rather than merely reacting to threats.
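Because the five levels form an ordered scale, the gap analysis of needs and capabilities can be expressed as a simple comparison. The sketch below is one possible encoding, assuming each object has been assigned a current level and a needed level; the object names and their assignments are hypothetical.

```python
from enum import IntEnum

class Maturity(IntEnum):
    """The five maturity levels, ordered from least to most mature."""
    BASIC = 1
    MANAGED = 2
    PROACTIVE_DETECTION = 3
    ADAPTIVE = 4
    AUTONOMIC = 5

def maturity_gaps(current, needed):
    """Return needed minus current level for each object.
    Positive: capability falls short. Negative: more than is needed."""
    return {obj: int(needed[obj] - current[obj]) for obj in needed}

# Hypothetical assessment results for illustration only.
current = {
    "payment processing": Maturity.MANAGED,
    "email": Maturity.ADAPTIVE,
    "customer db": Maturity.PROACTIVE_DETECTION,
}
needed = {
    "payment processing": Maturity.ADAPTIVE,
    "email": Maturity.MANAGED,
    "customer db": Maturity.PROACTIVE_DETECTION,
}

gaps = maturity_gaps(current, needed)
for obj, gap in gaps.items():
    if gap > 0:
        print(f"{obj}: {gap} level(s) short of what the business needs")
    elif gap < 0:
        print(f"{obj}: {-gap} level(s) above what the business needs")
    else:
        print(f"{obj}: meets the need")
```

Treating the levels as integers is a convenience for comparison only; the effort required to move between adjacent levels is unlikely to be uniform.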
Phase Four: Design a Resilience Strategy
The next step is to incorporate a business’s view of the maturity of its existing objects into a design for a resilient architecture. Attributes can be adjusted for any object in order to improve its capabilities and overall maturity level. However, it can be dangerous to undertake this without a comprehensive plan. If a company overengineers its business resilience architecture, it could spend scarce resources increasing the maturity levels of some objects unnecessarily. Conversely, underengineering the architecture could leave the organization at risk – and perhaps worse, leave it with a false sense of security.
A desired level of maturity for each object must be determined, and an analysis of the gaps between current and desired states must be established. Some changes will not be IT-related and may require business service, function, or process adjustments. In fact, a resilience-oriented architecture could easily fail if the needs of business and IT are addressed separately.
Aligning the two must be part of the process from the beginning. As a first step, a conceptual design of the new architecture must be created that aligns business and IT objectives in the following areas:
- Confirm resilience objectives
- Analyze the interdependencies of objects
- Develop guidelines and principles
- Confirm current configurations on systems, networks, databases, storage and applications
- Document and design the solution for the baseline infrastructure
- Create a preliminary investment analysis, including any optimization of resources that may result from taking a holistic view of resilience instead of a piecemeal one
After the business and IT sides of a business create and agree to the conceptual design, a solution design must be created that can be a guide through the following steps:
- Develop a detailed architecture for business resilience, including evaluation of different in-house technologies versus outsourced services for cost optimization
- Define resilience strategies for systems, networks, applications, and data
- Build the design specifications
- Create functional descriptions for the solutions
- Define test requirements
- Build a roadmap for implementation
- Finalize the investment analysis
Phase Five: Develop Resilience Plans and Procedures
The architecture provides the structure for improving business resilience, but plans and procedures for managing and maintaining it are necessary. Such a plan should include an initial implementation strategy as well as alternatives that allow for changing business conditions. Each plan and/or procedure should be defined with respect to:
- Its benefits and limitations
- The dependencies among business services, functions, or processes
- The characteristics of the alternative strategies, such as recovery times, acceptable annual minutes or hours of outage, or security level
- The high-level cost model for the selected strategy, with recommendations for implementing technologies, processes, tools, and staffing – including critical-path items such as technology delivery times, business process reengineering requirements, and organizational considerations
- A high-level implementation plan that delineates key tasks and milestones for the selected strategy.
Phase Six: Implement the Plan
Once the implementation plan has been agreed upon, the new architecture can be deployed, and the ongoing resilience program can be structured. The implementation plan must include the following elements:
- Workload division
- Hardware alignment and provisioning
- Storage strategy
- Replication strategy
- Recovery and availability strategy
- Network connectivity and capacity measures
- Shared services and infrastructure components for base operational capabilities
- Virtualization alternatives
- Systems management mechanisms
- Command and control mechanisms
- Testing capabilities
- Physical and logical security features
Phase Seven: Validate the Plans, Procedures, and Architecture
The next step is to validate the work that has been completed in the transformation process to confirm that all aspects of the architecture have been implemented properly and are working effectively to mitigate the risks identified earlier in the process. Any validation has three parts:
- Develop a resilience exercise for the architecture
  - Verify the resilience requirements, objectives, scope, and timelines
  - Identify the resource requirements and planning tasks
- Review the processes and procedures and assess the capability of each to perform the resilience exercise
  - Review the technical resilience procedures
  - Review the resilience procedures and record your feedback
- Execute the resilience exercise
  - Provide audit and exercise observations with documented results
  - Execute the resilience exercise, including the technical recovery procedures
  - Provide ongoing semiannual audits of resilience plans with documentation of actions to be taken
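One way to document exercise results is to compare the recovery times measured during the exercise with the objectives set earlier. The sketch below assumes recovery objectives and measured times are recorded in whole hours per service; the service names and values are hypothetical.

```python
def evaluate_exercise(objectives_hours, measured_hours):
    """Return a list of (service, finding) pairs for every service that
    missed its recovery objective or was not exercised at all."""
    findings = []
    for service, objective in objectives_hours.items():
        measured = measured_hours.get(service)
        if measured is None:
            findings.append((service, "not exercised"))
        elif measured > objective:
            findings.append((service, f"missed objective by {measured - objective} h"))
    return findings

# Hypothetical objectives and exercise measurements, in hours.
objectives = {"order entry": 1, "invoicing": 24, "payroll": 72}
measured = {"order entry": 3, "invoicing": 10}

findings = evaluate_exercise(objectives, measured)
for service, finding in findings:
    print(f"{service}: {finding}")
```

An empty findings list would indicate that every service was exercised and recovered within its objective; anything else becomes input to the documented actions to be taken.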
Phase Eight: Ongoing Management of Your Resilience Program
It’s important to remember that a resilience exercise is not a static event but is actually part of a concerted management program designed to help maintain continual monitoring, testing, and improvement of the infrastructure. A defined business resilience program should be managed so that everyone involved understands and adheres to the resilience principles that underlie the architecture. That architecture ultimately must allow for the following:
- Overall management of a total enterprise-wide business resilience program
- Communication of program results to the management team
- A linkage between business executives and IT-related resources
- Thought leadership for future availability, continuity, and security initiatives
- A blueprint for the establishment and execution of a governance process
- Coordination and direction of continuity-related staff among entities, such as the IT organization, consultants, and outsourcing providers or partners
- Ownership and direction of all aspects of disaster recovery exercises
- Management of the financial plan with assigned responsibility for the costs of the program
- Coordination of third-party relationships that satisfy the needs of the continuity program
- Annual review sessions to ensure alignment between business and IT objectives
- Qualification of new projects and engagement of solution design and delivery
- Communication of strategies, directions, and requirements to all relevant employees
- Definition of a single point of contact to manage resolution of resilience issues
- A change management process to incorporate business and infrastructure changes into the resilience strategy and plan
As companies look to optimize their business and IT architectures in order to save costs in the current economic climate, it becomes even more important that we work to minimize risks introduced during the optimization process. A clear, concise process using a framework such as the one described above allows companies to view their risk posture before and after optimization. If this is done properly, a company can both improve its business resilience and save costs. The key is using the right framework and knowing how to use it. By performing the steps in each phase, a business can have a comprehensive, but carefully targeted, resilience program that is designed to address its own unique needs and goals.
Richard Cocchiara is an IBM distinguished engineer and the chief technology officer for Business Continuity and Resiliency Services in IBM Global Services, specializing in helping customers drive higher business resiliency in order to realize increased business availability.