In application-based replication, the replication process is performed by a separate task running on the server, which relies on the operating system for access to the facilities it requires. The task is generally either tightly coupled with the application data being replicated or a stand-alone utility that replicates the raw data used by the application databases.
Nearly all database applications provide some form of copy function that can be used to replicate data, and most provide more than one: functions that copy the entire database to a separate dataset, and functions that copy only a selected set of records to another dataset. In either case, the copy may reside at the same location or at a remote one.
One advantage is that the application can keep running while the data is being migrated. The downside is that there is usually a significant impact on overall performance.
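To illustrate the two kinds of copy function, the sketch below uses SQLite through Python's standard `sqlite3` module purely as a stand-in for a database application; the `orders` table, its columns, and both function names are hypothetical. `Connection.backup` copies the whole database while the source connection stays open and usable, and the second function copies only the records that match a filter.

```python
import sqlite3


def copy_entire_database(src: sqlite3.Connection, dest_path: str) -> None:
    """Copy the whole database behind `src` into a separate file."""
    dest = sqlite3.connect(dest_path)
    try:
        # SQLite's online backup API: the source stays open during the copy.
        src.backup(dest)
    finally:
        dest.close()


def copy_selected_records(src: sqlite3.Connection, dest_path: str, region: str) -> int:
    """Copy only the rows of the hypothetical `orders` table for one region."""
    rows = src.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", (region,)
    ).fetchall()
    dest = sqlite3.connect(dest_path)
    try:
        with dest:  # commits the transaction if no exception is raised
            dest.execute(
                "CREATE TABLE IF NOT EXISTS orders "
                "(id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
            )
            dest.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    finally:
        dest.close()
    return len(rows)
```

Both copies read from the live database while it is in use, which is where the performance cost described above comes from.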
According to a recent SRC report, 21% of backups are performed on closed databases because IT departments do not trust open databases or have data-integrity concerns about backing up a database while it is open.
By backing up a separate instance of the database that has been copied from the live image and then closed, the IT department can complete a full backup without taking the production database offline, potentially without affecting the application's performance, and without the integrity concerns of an open database.
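The approach in the previous paragraph can be sketched with SQLite, via Python's `sqlite3` module, standing in for the production database: take a point-in-time copy of the live database, check it, close it, and then back up the closed copy with an ordinary file-level copy (gzip-compressed here). The function name and file paths are hypothetical; a production database would use its own snapshot or copy facility.

```python
import gzip
import shutil
import sqlite3


def backup_via_closed_copy(live: sqlite3.Connection, staging_path: str, archive_path: str) -> str:
    """Back up a closed copy of `live` without taking `live` offline."""
    copy = sqlite3.connect(staging_path)
    try:
        # 1. Point-in-time copy of the live image; `live` stays open.
        live.backup(copy)
        # 2. Verify the copy before trusting it as a backup source.
        status = copy.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        # 3. Close the copy: from here on it is a closed database.
        copy.close()
    if status != "ok":
        raise RuntimeError(f"integrity check failed: {status}")
    # 4. Full backup of the closed file using a plain file-level copy.
    with open(staging_path, "rb") as f_in, gzip.open(archive_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return status
```

Only step 1 touches the live database; the slower backup in step 4 runs against the closed copy.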
Backup utilities can also be used to migrate data from one location to another. These utilities are typically very efficient, but they require the application to be closed.
This may be due to the way the utility was designed or to the data-integrity concerns and lack of trust in open databases noted above.
Backing up a closed database avoids the integrity issues, but it takes the application offline during the backup process, and the time window allowed for backups keeps shrinking.
Specialized applications can be written to mitigate the performance impact or to address some other problem with the standard application-level replication processes.
The problem with these customized applications is that they become a major support challenge when migrating from one environment to another.
Supporting and maintaining them through every software or hardware change made on the target system may, over the long run, cost more effort than they are worth.
Operating System (OS)-based Replication Utilities
Each operating system has some method of backing up data and then restoring it.
These utilities may be relatively simple, but when a company is looking for a simple way to replicate data, the utility included with the OS may be powerful enough and will not cost extra.
The problems here are the development and support time needed to use such a utility (which effectively puts it in the “specialized” application category), the ongoing support time to keep making changes, and the panic time when a new release of the OS comes out and the utility's behavior changes.
Several filesystems support mirroring to at least some degree. Others do not do this natively, but a few filesystem add-on products can provide this service on almost every known server platform.
Some replicating filesystems may also support remote replication, but this is usually outside their normal scope of use.
They can certainly be effective for data migration or for creating a separate instance of a database for other uses, such as data mining or application testing.
The major advantage of filesystem replication is that data can be mirrored onto dissimilar devices and device layouts. For example, the original filesystem may have resided on a single disk or partition, while the copy is spread across several devices to improve performance once the new image is in use.
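To show why the copy's layout can differ from the original's, the sketch below maps each block of a hypothetical single-device filesystem onto simple round-robin striping across several devices (RAID-0 style) and copies the blocks there. The stripe size, device count, and function names are assumptions for illustration; real filesystems and volume managers use their own allocation schemes.

```python
from typing import List, Tuple


def stripe_location(block: int, num_devices: int, stripe_blocks: int) -> Tuple[int, int]:
    """Map a logical block of the original filesystem to
    (device index, block offset on that device) under round-robin striping."""
    stripe_index, within = divmod(block, stripe_blocks)
    device = stripe_index % num_devices
    row = stripe_index // num_devices  # full stripes already placed on that device
    return device, row * stripe_blocks + within


def replicate_to_stripes(source: List[bytes], num_devices: int, stripe_blocks: int) -> List[List[bytes]]:
    """Copy every block of a single-device image onto a striped layout."""
    targets: List[List[bytes]] = [[] for _ in range(num_devices)]
    for block, data in enumerate(source):
        device, offset = stripe_location(block, num_devices, stripe_blocks)
        dev = targets[device]
        while len(dev) <= offset:  # grow the simulated device as needed
            dev.append(b"")
        dev[offset] = data
    return targets


def read_striped(targets: List[List[bytes]], block: int, stripe_blocks: int) -> bytes:
    """Read a logical block back from the striped copy."""
    device, offset = stripe_location(block, len(targets), stripe_blocks)
    return targets[device][offset]
```

Because the filesystem resolves logical blocks itself, the application sees the same data even though it now sits on three devices instead of one.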
The major disadvantages are the additional bandwidth required between the server and the storage and the additional system resources needed to provide the replication services, both of which translate into an impact on application performance.
Driver-level replication handles the data in almost the same way a hardware solution would: it replicates the data at the block level.
Here the data typically must have the same layout on both devices, but the replication process should involve less system overhead.
Depending on the operating system, there may be several ways to use this driver.
Generally, the driver sits above the actual device driver and appears as a single device.
Attributes given to that device tell it where the mirror or copy is and can be used to manage the underlying devices.
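A minimal sketch of such a layer, assuming in-memory byte arrays in place of real disks: `MirrorDevice` presents a primary and its mirrors as a single device, its `primary` and `mirrors` attributes record where the copies are, and `attach`/`detach` manage the members. All class and method names are hypothetical, and a real driver would also handle I/O errors, write ordering, and incremental resynchronization.

```python
from typing import List


class BlockDevice:
    """In-memory stand-in for a disk: a fixed number of fixed-size blocks."""

    def __init__(self, name: str, num_blocks: int, block_size: int = 512) -> None:
        self.name = name
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._data = bytearray(num_blocks * block_size)

    def _offset(self, lba: int) -> int:
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"block {lba} out of range on {self.name}")
        return lba * self.block_size

    def read(self, lba: int) -> bytes:
        start = self._offset(lba)
        return bytes(self._data[start:start + self.block_size])

    def write(self, lba: int, buf: bytes) -> None:
        if len(buf) != self.block_size:
            raise ValueError("writes must be exactly one block")
        start = self._offset(lba)
        self._data[start:start + self.block_size] = buf


class MirrorDevice:
    """Sits above the real devices and presents them as one device."""

    def __init__(self, primary: BlockDevice, mirrors: List[BlockDevice]) -> None:
        self.primary = primary
        self.mirrors: List[BlockDevice] = []
        for m in mirrors:
            self.attach(m)

    def attach(self, dev: BlockDevice) -> None:
        # Block-level mirroring needs an identical layout on every member.
        if (dev.num_blocks, dev.block_size) != (self.primary.num_blocks, self.primary.block_size):
            raise ValueError(f"{dev.name} does not match the primary's layout")
        for lba in range(self.primary.num_blocks):  # full resync on attach
            dev.write(lba, self.primary.read(lba))
        self.mirrors.append(dev)

    def detach(self, dev: BlockDevice) -> None:
        self.mirrors.remove(dev)  # the detached copy is frozen at this point

    def write(self, lba: int, buf: bytes) -> None:
        # One write from above becomes one write per member device.
        self.primary.write(lba, buf)
        for m in self.mirrors:
            m.write(lba, buf)

    def read(self, lba: int) -> bytes:
        return self.primary.read(lba)
```

Reads go only to the primary in this sketch; the layout check in `attach` reflects the requirement that the data have the same layout on both devices.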
Server-Based Replication - Pros
- There are many solutions in this area - some with a direct cost of nothing and some with a direct cost of tens of thousands of dollars.
- Flexibility in the type of storage platform is one of the major attractions of this type of solution. It allows data to be migrated from one storage platform to another with minimal problems.
- Sites with smaller configurations (only a few systems or applications requiring replication) may find this a viable option. As the number of systems or applications increases, maintaining this kind of solution can become a significant task, and other methods may need to be considered at that point.
Server-Based Replication - Cons
- Not all server-based replication methods and solutions are available for all operating systems. If the operating system platform may change in the future, make sure the solution being used will also run on the new platform. If designing a solution in-house, consider the impact when operating systems and applications change and the replication solution needs support or modification for the new environment. The support cost of modifying established procedures may actually prevent some IT departments from moving to another server platform, even when the move makes sense in every other respect.
- Replication takes up resources; nothing comes for free. A server-based replication solution consumes server CPU time, which may affect the overall performance of the system, as well as memory. In addition, sending the data to more than one device requires additional bandwidth (this seems obvious, but is not considered nearly often enough). To attain the required application throughput, additional device connections may be required, and those connections may call for a larger server even if no additional CPU power is needed. The additional connections and larger server translate into additional dollars and should be counted as part of the overall cost of the solution.
- When only a few systems or replications are needed, it is easy to manage this task on a per-instance basis. As the complexity increases, so does the effort required to maintain this type of solution. Automation and scripting can reduce this effort, but they need to be planned for when implementing this kind of solution. When multiple operating systems and releases are involved, this can be quite a challenge.
- It is important to evaluate the various solutions properly and decide whether one of them will satisfy the end user's requirements.
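To put numbers on the bandwidth point, the sketch below assumes a hypothetical workload of 50 MB/s of reads and 30 MB/s of writes and an assumed usable capacity of 100 MB/s per server-to-storage connection; none of these figures come from the article. Reads are served from one copy, but with server-based replication the server must send every write to each copy itself.

```python
import math


def server_storage_traffic(read_mb_s: float, write_mb_s: float, copies: int) -> float:
    """MB/s the server moves to and from storage when it does the mirroring.

    Reads come from one copy; every write is sent once per copy.
    """
    return read_mb_s + write_mb_s * copies


def links_needed(traffic_mb_s: float, link_mb_s: float) -> int:
    """Number of server-to-storage connections needed for a given load."""
    return math.ceil(traffic_mb_s / link_mb_s)


# Hypothetical workload: 50 MB/s reads, 30 MB/s writes, 100 MB/s per link.
for copies in (1, 2):
    traffic = server_storage_traffic(50, 30, copies)
    print(f"copies={copies}: {traffic:.0f} MB/s, {links_needed(traffic, 100)} link(s)")
```

With one mirror (two copies), traffic rises from 80 MB/s to 110 MB/s, which pushes this hypothetical server from one connection to two before any extra CPU load is counted.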
Two hardware-based replication methods are available today: controller-based and appliance-based. As their names imply, they provide the replication functionality on the array controller or on an appliance, respectively.
Controller-based solutions use additional firmware on the controller to provide the replication services.
An appliance is an external piece of hardware that provides the replication services - in certain respects, a switch could be considered an appliance without much smarts.
Hardware-based solutions have at least one thing in common: they are independent of the server and its operating system. Because the replication services do not run on the server, they add no system overhead there, which means application performance is not adversely affected.
In addition, the replication occurs at a lower level than in server-based solutions, so the additional connections and bandwidth it requires are moved down into the storage layer.
Robert A. Collar is Senior Product Manager for SAN Director Products at LSI Logic Storage Systems, Inc., Milpitas, CA. He has been involved in high-availability solutions off and on since 1988, working at Tolerant Systems, Pyramid, and as an IBM/HP/SGI reseller, and in Unix-based solutions since 1979. In January 2001 he spoke on replication and business continuance at the SNW/Tokyo show. He can be reached at firstname.lastname@example.org.