Logical Datacenter Replication

Written by Ashar Aziz & Brian Korn, 22 November 2007

Disaster planning has taken on increased significance for IT professionals over the last year. Businesses have been revisiting previous plans and testing existing contingency solutions against newly perceived threats. At the top of the list is the maintenance of a remote fail-over facility that keeps the business up and running in the event the primary datacenter fails.

The benefits of a remote datacenter are obvious and have been covered in great detail in other articles. What has not received as much attention is the manual process involved and the difficulty of keeping multiple physical environments closely synchronized. Replicating data is relatively straightforward, with a number of data replication solutions available on the market. Replicating the infrastructure around the datacenter, however, still relies on manual processes performed by skilled IT staff.

Production databases, for example, are often kept within storage arrays that are then replicated to a remote location for disaster recovery purposes. The supporting infrastructure is generally not captured within the replicated storage array and must be replicated through manual processes. This means that any time there is a change to the supporting infrastructure in one location, such as a server addition or a driver change, it must be manually replicated in every other location. This supporting infrastructure includes:

• Multiple layers of devices on multiple IP subnets
• Database, application, and Web servers with specific interdependencies
• Any individual server’s software image, applications, and patch levels, e.g., database version
• Configuration of appliances such as firewalls and load-balancers

Manual processes limit the flexibility to react in the primary datacenter and eventually result in the secondary site being an inaccurate replica of the primary location. This happens to even the best of plans. After 9/11, I spoke with storage vendors regarding datacenter recovery. They said that virtually all implementations of remote storage replication were successful, yet more than 25 percent of sites utilizing data replication failed to come up. The failure was not in array replication, but in mismatched operating system, application software, and patch levels between the sites.
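
This kind of drift is easiest to catch when the supporting infrastructure is itself captured as data and compared across sites. Below is a minimal Python sketch of such a check; the inventory format, field names, and version strings are invented for illustration and do not come from any particular product.

    def describe_site(servers):
        """Reduce a site's server inventory to comparable facts."""
        return {
            s["name"]: (s["os_version"], s["app_version"], tuple(sorted(s["patches"])))
            for s in servers
        }

    def find_drift(primary, standby):
        """Return servers whose OS, application, or patch levels differ."""
        p, s = describe_site(primary), describe_site(standby)
        return {
            name: {"primary": p.get(name), "standby": s.get(name)}
            for name in p.keys() | s.keys()
            if p.get(name) != s.get(name)
        }

    primary_site = [{"name": "db1", "os_version": "solaris-8",
                     "app_version": "oracle-9i", "patches": ["p101", "p102"]}]
    standby_site = [{"name": "db1", "os_version": "solaris-8",
                     "app_version": "oracle-8i", "patches": ["p101"]}]

    print(find_drift(primary_site, standby_site))
    # db1 differs: the stand-by is one application version and one patch behind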

Replicating data in real time while replicating the surrounding infrastructure through manual processes creates the distinct possibility of data and infrastructure getting out of synchronization. The result is an increased time to recovery in the event of a failure. An opportunity exists to leverage automated storage replication technologies not only to replicate the underlying data for individual servers, but also to replicate the entire state of the datacenter between multiple locations.

What’s required to make this happen? The capability to deploy entire server farms within a datacenter using software. This logical datacenter then provides a platform that can be deployed and reconfigured without manual intervention. If software can be used to create an infrastructure, then software can be used to re-create that infrastructure. This capability to replicate an entire infrastructure is obviously useful from a DR perspective.

Within a replicated logical datacenter, the addition of servers, firewalls, load-balancers, and storage to a server farm in the primary datacenter would result in the instantaneous replication of these deployed resources within a secondary location. Configuration changes such as IP addressing, network topology, and device configuration would also be immediately replicated.

The result of implementing a logical datacenter is that the time to recovery is shortened and errors introduced through the manual replication of the surrounding infrastructure are eliminated.

Automate The Physical

The ability to automatically deploy an environment will require that the hardware be physically present and connected with a manageable interconnect that allows automated configuration into any required logical structure. In other words, it will allow IT staff to install the hardware, pre-wire the entire datacenter once, and then logically rewire the environment through control of the switching infrastructure installed between the devices. Once the datacenter has been physically installed, the entire environment can be managed remotely through software, thereby creating a lights-out environment.
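
To make the "wire once, rewire logically" idea concrete, here is a minimal Python sketch in which a toy switch-fabric object moves a pre-wired server between subnets by reprogramming the VLAN on its port. The SwitchFabric class and its method are invented stand-ins, not any vendor’s API.

    class SwitchFabric:
        """Toy stand-in for a switch-management API; a real implementation
        would drive the Ethernet switches via SNMP or a CLI session."""
        def __init__(self):
            self.port_vlan = {}   # switch port -> VLAN id

        def assign_vlan(self, port, vlan):
            self.port_vlan[port] = vlan
            print(f"{port} now on VLAN {vlan}")

    fabric = SwitchFabric()
    fabric.assign_vlan("switch1/port12", vlan=10)   # server joins the web-tier subnet
    # Later, without touching a cable, the same port moves to another tier:
    fabric.assign_vlan("switch1/port12", vlan=20)   # now on the app-tier subnet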

Understanding the capabilities of this automated environment, and how it can be leveraged to replicate a datacenter, requires some additional detail. Categorizing the components into three distinct areas (resources, fabric, and control) helps to describe the environment.

Resource Layer: Contains the provisionable equipment that can be used to design and build server farms—servers, firewalls, load-balancers, and storage, for example.

Fabric Layer: Contains the switched infrastructure that connects resources into a flexible, controllable fabric. This fabric layer includes Ethernet switches, SAN switches, terminal servers, and power control.

Control Layer: Contains the servers and management software to automate and control the configuration of the datacenter. It has the ability to configure and deploy fabric and resource devices to create logical server farms.

Resources are wired into a fabric that allows them to be physically cabled once, but logically rewired as needed to build a server farm topology. This capability is enabled through Ethernet VLAN configuration and SAN zoning. A graphical user interface shields the user from the infrastructure configuration details.

Datacenter managers use such a drag-and-drop interface to add a server to an existing server farm or to build an entire server farm from scratch. Adding a server, for example, automatically places the device into the correct subnet, associates the server with the correct software image, configures details such as machine name and IP address, and ultimately powers the device on to make it available to the server farm. Firewalls, load-balancers, and storage are added, built, or removed in the same way.
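
Behind such an interface, adding a server reduces to an ordered sequence of automated steps. The following Python sketch mimics that sequence with toy objects; every class, image name, and address here is invented for illustration.

    class Server:
        def __init__(self, port):
            self.port = port                    # pre-wired switch port
            self.image = self.hostname = self.ip = None
            self.powered = False

    class Fabric:
        def assign_vlan(self, port, vlan):
            print(f"{port} -> VLAN {vlan}")     # stands in for real switch commands

    def add_server(farm, pool, fabric, image, vlan, hostname, ip):
        server = pool.pop()                     # draw a stateless device from the pool
        fabric.assign_vlan(server.port, vlan)   # place it on the correct subnet
        server.image = image                    # associate the correct software image
        server.hostname, server.ip = hostname, ip  # configure name and address
        server.powered = True                   # power on: now available to the farm
        farm.append(server)
        return server

    pool = [Server("switch1/port13"), Server("switch1/port14")]
    farm = []
    add_server(farm, pool, Fabric(), image="web-image-v3",
               vlan=10, hostname="web2", ip="10.0.10.12")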

Automatically Replicating A Logical Datacenter

Replicating a logical datacenter begins by placing all of the logical datacenter’s state into a storage array that supports remote replication. This storage array must contain both the data used by any replicated server and the metadata that defines device configuration and server farm structures within the logical datacenter.

The metadata used to build server farms within a logical datacenter includes all of the server farm details such as the number of servers, firewalls, and load-balancers in a logical server farm; it also captures details such as the server farm network topology, IP addressing, storage allocations, and software images populated onto storage.
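
Written out as plain data, such metadata might look like the following Python sketch; the schema, farm name, images, and addresses are invented for illustration, not taken from any product.

    # Hypothetical server farm metadata: the kind of state that would be
    # stored alongside server data in the replicated array.
    farm_metadata = {
        "name": "order-entry",
        "subnets": {"web": "10.0.10.0/24", "app": "10.0.20.0/24", "db": "10.0.30.0/24"},
        "devices": [
            {"type": "load-balancer", "subnet": "web"},
            {"type": "firewall", "between": ["web", "app"]},
            {"type": "server", "tier": "web", "image": "web-image-v3",
             "hostname": "web1", "ip": "10.0.10.11"},
            {"type": "server", "tier": "db", "image": "oracle-image-v2",
             "hostname": "db1", "ip": "10.0.30.11",
             "storage": {"lun": 7, "size_gb": 200}},
        ],
    }
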
In addition to the metadata, any storage allocated to servers must also be captured in the replication process. Servers with internal storage present an additional hurdle to datacenter replication, as internal storage requires an alternative data replication process. Servers deployed in a replicated logical datacenter must therefore be deployed diskless and utilize a centralized storage array for all storage requirements. Diskless servers can be implemented through Fibre Channel boot, network boot, or even SCSI boot with the addition of a gateway device.

The storage array, containing all data from the servers and the metadata from the logical datacenter, can now be replicated to another storage array at a remote location. The data replication between arrays serves to replicate all server farms within the primary datacenter to the storage array in the stand-by logical datacenter. Any change made within the primary datacenter, from the addition of a new device to the writing of a log file, is automatically replicated to the stand-by logical datacenter.

A stateless pool of devices will be kept available at the stand-by location. These devices will be used to deploy server farms against the replicated data, and must be a superset of the hardware installed in any primary datacenter for which the site serves as the stand-by location.
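
The superset requirement can be checked mechanically: for every device type the primary datacenters contain, the stand-by pool must hold at least as many units. A small Python sketch with invented inventory counts:

    from collections import Counter

    def pool_is_sufficient(primary_hardware, standby_pool):
        """True if the stand-by pool holds at least as many devices of
        each type as the primary datacenter(s) could require."""
        needed, available = Counter(primary_hardware), Counter(standby_pool)
        return all(available[kind] >= count for kind, count in needed.items())

    primary = ["server"] * 8 + ["firewall"] * 2 + ["load-balancer"] * 2
    standby = ["server"] * 10 + ["firewall"] * 2 + ["load-balancer"] * 3
    print(pool_is_sufficient(primary, standby))   # True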

Failure Recovery

The stand-by datacenter now includes both a stateless pool of resources and a storage array that contains all of the replicated state from the primary datacenter. In the event of the failure of the primary logical datacenter, the replicated datacenter can be brought up by binding the storage array to the available pool of resources.

To bring up a replica of the primary datacenter’s server farms within the fail-over datacenter, a number of automated processes are started:

1. The storage array is bound to the stateless stand-by logical datacenter. This attaches the stand-by storage array to the stand-by resources. This is accomplished by automatically reconfiguring the SAN infrastructure within the stand-by datacenter.

2. Server farms deployed in the primary datacenter are allocated within the fail-over logical datacenter using the metadata within the replicated storage array. This includes building server farm topologies, adding devices, and configuring those devices to replicate the primary logical datacenter.
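
Taken together, fail-over amounts to replaying the replicated metadata against the stand-by pool. The condensed Python sketch below walks through the two steps with toy objects; all names are illustrative stand-ins for real SAN and provisioning operations.

    class ReplicatedArray:
        """Toy replicated array: holds farm metadata captured from the primary."""
        def __init__(self, metadata):
            self.metadata = metadata

    class StandbySAN:
        def __init__(self):
            self.zones = []

        def add_zone(self, members):
            self.zones.append(members)          # stands in for automated SAN rezoning

    def fail_over(array, san, pool):
        # Step 1: bind the replicated array to the stand-by resources.
        san.add_zone(["replicated-array", "standby-pool"])
        # Step 2: rebuild each farm from the metadata held in the array.
        farms = []
        for farm_def in array.metadata:
            devices = [pool.pop() for _ in farm_def["devices"]]
            farms.append({"name": farm_def["name"], "devices": devices})
        return farms

    array = ReplicatedArray([{"name": "order-entry", "devices": ["web1", "db1"]}])
    print(fail_over(array, StandbySAN(), ["spare1", "spare2", "spare3"]))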

Leverage Proven Storage Array Replication To Replicate The Entire Datacenter

The deployment of a logical datacenter provides for software control of a physical environment. It decreases errors introduced by manual processes and provides the ability to react rapidly to changing demands on a datacenter without waiting for the physical movement of people and equipment.

Once software is used to control an entire infrastructure environment, the logical structures built within the environment are captured as data used by the control software. This data can be stored within a storage array that provides for data replication to a remote array. When replicated to another location, this data can be used to bring up a replicated environment from the primary location.

This replication can also be used to replicate many datacenters to a single stand-by location. In the case of many-to-one, it is possible to maintain a facility that can support any number of primary datacenters simply by replicating multiple storage arrays to this one location. The replicated storage arrays can then be used to bring up all or part of any number of failed environments.
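
A many-to-one arrangement can be pictured as a simple mapping from primary sites to the arrays replicated at the shared stand-by facility; the Python sketch below uses invented site and array names.

    # One stand-by facility holding replicas for several primary datacenters,
    # each replicated into its own array at the shared site.
    replicas = {
        "nyc-primary": "standby-array-1",
        "chi-primary": "standby-array-2",
        "sfo-primary": "standby-array-3",
    }

    def recover(failed_sites):
        """Bring up only the environments whose primaries have failed."""
        return [replicas[site] for site in failed_sites]

    print(recover(["nyc-primary"]))   # ['standby-array-1']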

The replicated logical datacenter brings automation to entire-datacenter replication and increases the likelihood that what is replicated from one facility to another will actually be usable when needed. The removal of manual processes from the replication of a datacenter results in a more reliable fail-over replica and, ultimately, a shortened time to recovery in the event of a failure.


Ashar Aziz is the CTO and co-founder of Terraspring. Prior to that he spent 12 years at Sun Microsystems, most recently as a distinguished engineer.

Brian Korn is a senior product manager at Terraspring with a focus on storage and SAN management. He brings more than 10 years of related experience, most recently as a product manager with SGI.
