DRJ Blogs

DRJ | The premiere resource for business continuity and disaster recovery

Resiliency in a Hybrid Platform, Multi-Cloud World


Many companies have already incorporated cloud capabilities into their enterprise architectures or are considering doing so. Starting from an existing on-premise design that uses traditional processing methods and cross-platform technologies, many firms are expanding their footprints to include local private and remote public cloud services. This makes it challenging to implement an end-to-end resiliency strategy that ensures continuous protection while retaining the high levels of availability the business expects.

The key to addressing this challenge is determining how the Hybrid Platform, Multi-Cloud design will be delivered and what impact it will have on existing resiliency programs. Here, Hybrid Platform, Multi-Cloud means a hybrid IT design that leverages numerous platforms and can be combined with multiple cloud infrastructures for additional capacity and scalability. Areas that need to be considered and closely coordinated include technology implications, application-level recovery, multi-site network connectivity options, and the use of a single orchestration engine to continuously monitor, manage, and deliver the execution and maintenance of the program.

The evolving landscape for a Hybrid Platform, Multi-Cloud design most often starts with a view of the existing technology and infrastructure used to deliver IT and business services. For most enterprise clients this begins with a largely closed ecosystem, in which processing is contained within a single site or campus and relies on more traditional technologies, such as mainframe and/or midrange computing platforms that communicate locally with Intel-based servers running in a physical or virtual state.

These designs offer a fairly straightforward approach to resiliency because there is a one-to-one relationship: production is protected by a single recovery site. The critical task is coordinating the cross-platform technologies so that applications are restored in a timely manner to a consistent point in time with minimal data loss.
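To make the idea of a consistent point in time concrete, the following is a minimal sketch and not a description of any particular product. It assumes, as a simplification, that each platform can be rolled back to any moment up to its newest usable copy; the platform names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def consistent_recovery_point(latest_copy_by_platform):
    """Return the most recent time that every platform can be restored to."""
    # A cross-platform restore can only be as current as the platform
    # whose newest usable copy is the oldest.
    return min(latest_copy_by_platform.values())

def data_loss_window(latest_copy_by_platform, failure_time):
    """Return how much work is lost when restoring to the consistent point."""
    return failure_time - consistent_recovery_point(latest_copy_by_platform)

# Hypothetical newest usable copy per platform (illustrative values only).
latest = {
    "mainframe": datetime(2019, 9, 1, 11, 58),
    "midrange": datetime(2019, 9, 1, 11, 45),
    "x86_virtual": datetime(2019, 9, 1, 11, 55),
}
failure = datetime(2019, 9, 1, 12, 0)

point = consistent_recovery_point(latest)
loss = data_loss_window(latest, failure)
print("Consistent recovery point:", point)
print("Data loss window:", loss)
```

In this example the midrange platform has the oldest copy, so it sets the recovery point for every platform, even those with more recent data.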

Determining a strategy starts with a detailed investigation of the underlying infrastructure and technology, so that the baseline environment can be assessed for what can be used to support a migration to a future Hybrid Platform, Multi-Cloud design. This includes identifying which workloads (most often tied to architecture) can readily be migrated to cloud, which require some level of enhancement prior to migration, and which can’t be migrated due to critical dependencies. Once these specifics have been identified and documented, the task of understanding the workloads and their corresponding dependencies begins.

It is at this point that a complete application dependency and infrastructure mapping must be performed to determine which workloads are candidates for a move to cloud and where their associated data resides. Specifically, application latency and data consistency must be evaluated with respect to how a move might affect accessibility, performance, and reliability. These are critical attributes from a resiliency perspective, because they determine whether business objectives for time to recover, data accuracy, and availability can be met.
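One way to picture the dependency mapping is as a graph walk. The sketch below is illustrative only: the workload names, the `depends_on` map, and the set of components pinned on-premise are all hypothetical. It treats any workload with a direct or indirect dependency on a pinned component as needing further latency and consistency review, and lists the rest as first-pass candidates.

```python
def transitive_dependencies(workload, depends_on):
    """Collect everything a workload depends on, directly or indirectly."""
    seen = set()
    stack = [workload]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, ()):
            if dep not in seen:  # the seen set also guards against cycles
                seen.add(dep)
                stack.append(dep)
    return seen

def migration_candidates(depends_on, pinned_on_premise):
    """Return workloads with no dependency path to a pinned component."""
    candidates = []
    for workload in sorted(depends_on):
        if workload in pinned_on_premise:
            continue
        if transitive_dependencies(workload, depends_on) & pinned_on_premise:
            continue
        candidates.append(workload)
    return candidates

# Hypothetical dependency map: workload -> components it calls.
depends_on = {
    "web_portal": {"order_api"},
    "order_api": {"mainframe_ledger"},
    "reporting": {"data_mart"},
    "data_mart": set(),
    "mainframe_ledger": set(),
}
pinned = {"mainframe_ledger"}  # assumed to be non-migratable

candidates = migration_candidates(depends_on, pinned)
print(candidates)
```

Here `web_portal` is excluded even though it never calls the mainframe directly, because its dependency on `order_api` leads there. A real assessment would weigh measured latency and data placement rather than apply a simple yes/no rule.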

Once the application dependency analysis and supporting IT infrastructure baselines have been established, the right strategy can be selected: modernizing or migrating existing applications, leveraging existing applications to integrate cloud capabilities, or building all-new cloud applications to support required workloads. Each of these approaches presents different challenges for where and how workloads are protected and eventually recovered.

With a modernization or migration approach, existing applications could be updated to conform to evolving standards, or lifted and shifted to cloud infrastructures to enhance flexibility and reduce costs. These applications could remain local (Private Cloud) or be ported to an external site for processing (Public Cloud). In either case the result is a separate infrastructure that must be included in the resiliency program, which needs to address where the recovery would be placed, how it would be connected, and how it would be orchestrated.

Leveraging existing infrastructures is another approach. It enables a move toward cloud while retaining the required interaction with existing footprints and positioning for future cloud-based capability. New workloads could also follow this design and continue to leverage on-premise resources to improve return on investment while optimizing the production environment with cloud-based toolsets. Combining local IT with remote cloud processing would require a redesign of the resiliency strategy to ensure consistency and business protection across the enterprise.

For newly created business functions and Software as a Service packages, multi-cloud services, where applications are born on the cloud and subsequently recovered in the cloud, can be used to increase overall agility and offer the potential for greater innovation. These off-premise applications must be able to run remotely from core services and enterprise IT, since neither latency nor data consistency can be allowed to suffer from the distance that off-premise processing introduces. This translates easily to resiliency, because the applications already use a split processing model in production that can be replicated during a disaster event.

Traditional on-premise processing makes it fairly simple to connect the primary production site to the DR site, replicate the required systems and data, and connect end users during recovery events. Direct connections (dedicated replication bandwidth) and IP-connected circuits (for example, MPLS for user access) are most often used to provide adequate connectivity for disaster recovery processing.

Now, with numerous sites accommodating a combination of on-premise and off-premise workloads, a more flexible and efficient network design is required. Software-Defined Networking (SDN) is one technology being used to link multiple sites for data transfer and to enable user connectivity. It can drastically simplify network management and operation as more connections are added to the core processing facility. Benefits include automated failover to reduce downtime and multiple backup paths that provide redundancy and increase overall business resiliency.
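The automated failover across multiple backup paths can be sketched in a few lines. This is a simplified model for illustration and not the API of any SDN controller: the `Path` class, the priority scheme, and the path names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    priority: int   # lower number = more preferred
    healthy: bool

def select_path(paths):
    """Return the preferred healthy path, or None if every path is down."""
    healthy = [p for p in paths if p.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda p: p.priority)

# Hypothetical paths between the production site and a recovery site.
paths = [
    Path("dedicated_replication_link", priority=1, healthy=True),
    Path("mpls_circuit", priority=2, healthy=True),
    Path("internet_vpn", priority=3, healthy=True),
]

before = select_path(paths)
paths[0].healthy = False        # simulate loss of the primary link
after = select_path(paths)

print("Before failure:", before.name)
print("After failure:", after.name)
```

Traffic moves to the next-preferred healthy path without manual intervention, which is the behavior that reduces downtime. A production controller would also consider bandwidth, latency, and policy.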

The net result of a Hybrid Platform, Multi-Cloud world is a highly complex IT environment in which multiple tools and interfaces perform similar but separate tasks to automate, orchestrate, operate, and manage the new environment. The following graphic shows one view of these complexities, with production spread across the various on-premise and off-premise environments using both core and cloud services:

[Graphic: HYBRID CLOUD for Blog v1.1]

The final component required to continuously monitor, manage, and report on the resiliency program is more sophisticated automation and orchestration. Disaster recovery execution has traditionally relied on a combined manual and automated run book to drive recovery. While this may have been sufficient in a dual-site model, the complexities introduced by a Hybrid Platform, Multi-Cloud design with numerous end points require a more comprehensive approach, one that leverages orchestration not only to execute a recovery but, of equal importance, to provide a single dashboard for determining the health of the program across all sites at all times.
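As a rough sketch of what a single health view could roll up, the example below compares each site's current state against its recovery targets. The metrics chosen here (replication lag against a recovery point objective, last tested recovery time against a recovery time objective), the field names, and the site names are assumptions made for illustration, not features of a specific orchestration tool.

```python
from dataclasses import dataclass

@dataclass
class SiteStatus:
    site: str
    rpo_target_min: int             # maximum tolerable data loss, in minutes
    replication_lag_min: int        # current replication lag, in minutes
    rto_target_min: int             # maximum tolerable time to recover
    last_tested_recovery_min: int   # recovery time measured in the last test

def site_health(status):
    """Return a list of issues for one site; an empty list means healthy."""
    issues = []
    if status.replication_lag_min > status.rpo_target_min:
        issues.append("RPO at risk")
    if status.last_tested_recovery_min > status.rto_target_min:
        issues.append("RTO at risk")
    return issues

def program_dashboard(statuses):
    """Map every site to its issues so all sites appear in one view."""
    return {s.site: site_health(s) for s in statuses}

# Hypothetical readings from three environments.
statuses = [
    SiteStatus("on_premise_core", 15, 5, 240, 180),
    SiteStatus("private_cloud", 15, 30, 120, 90),
    SiteStatus("public_cloud", 60, 10, 60, 75),
]

dashboard = program_dashboard(statuses)
for site, issues in dashboard.items():
    print(site, "->", issues or "healthy")
```

Collecting the same checks for every site in one structure is what allows the health of the program to be read at a glance rather than assembled from separate tools.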

As companies continue to evolve their enterprise IT environments to include both on-premise Private and off-premise Public Clouds, the complexity and coordination required to provide acceptable levels of business resiliency will continue to challenge existing capabilities. This will ultimately result in a new approach to resiliency, one that leverages evolving technologies and the continued development of new methodologies to keep pace with the eventual move toward a more complete end-to-end design for business protection.

Joe Starzyk is a Senior Business Development Executive with IBM Business Resiliency Services and a Member of the IBM Academy of Technology with over 38 years of experience in the Business Continuity and Disaster Recovery industry.
