Data Replication is the Key to Business Continuance (Part 1)
by Robert A. Collar
Overview
This article is about replication of information, not just disaster
recovery, and how it can help improve business continuance for users
of Information Technology.
While many consider replication primarily a solution for disaster recovery,
the reality is that using a robust replication scheme reduces and even
eliminates outages in data availability - planned and unplanned. Reducing
or eliminating planned outages means mission-critical applications go
on generating revenue and improving a company's bottom line despite
interruptions.
By reducing its exposure to unplanned outages, a company
may be able to continue generating revenue even when the worst cataclysm
hits one of its data centers.
Why replicate?
It seems obvious but bears repeating: Replication is really not an option
but a necessity. Some reasons: online backups, data migration projects,
application testing, data mining, storage consolidation, storage upgrades,
and disaster recovery - all require data replication.
Replicating allows several tasks to be performed on the second image
without affecting the original. For example, the second image may be
used as a source when performing a backup. Creating a second copy of
the data (potentially on a second system) creates an offline backup
without impacting the performance of the primary data set.
A large number of IT professionals distrust online backup. No one wants
to slow down performance during normal operation of the database just
to create a backup.
An application's performance requirements might also have outgrown
the capability of the storage platform. Moving to higher-performance or
higher-capacity storage platforms calls for replication as well.
Replication can ease the task of migrating data from low-end islands
of storage on several servers to a highly available, high-performance
centralized storage platform, for example.
In a third case, a live database often is needed to test
a new software version or verify a database change. Again, the case
is made for replication.
Finally, many businesses concerned with potential business interruption
replicate their data in a remote data center - across town, across state,
across the globe. These users require easy transport of data from the
primary facility to the other. In the process, they might also replicate
data from the remote site to the primary for a cross-mirroring solution.
Back to the basics
To decide which replication scheme should be used, a user needs to decide
which method best fits the needs of the business -- server-, controller-,
or appliance-based. And, users must decide what to replicate, data or
storage.
Data vs. storage replication
Replicating file- or record-oriented information can be considered data
replication because that is exactly what is being replicated - the data.
Replicating file- or record-oriented data typically requires a server-based
solution, as hardware-based solutions cannot understand the context
of the data being written.
Data replication might be a reasonable choice if there are several different
areas or types of data or information being used on a disk or Logical
Unit Number (LUN), and only a subset of that information must or should
be replicated.
The key is deciding precisely what is to be replicated. For example,
when replicating a Unix boot disk, do not replicate the swap area on
the disk. When replicating from an NT system, do not replicate
the pagefile.sys file.
In both of these cases, these areas or files are used for the virtual
memory of the system. They are not useful on any other system and are
constantly being accessed and modified.
In another example, users might decide to replicate a database's
index, roll, and possibly the data files but not the temporary work
area.
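
As a rough illustration of selective data replication, the sketch below filters file paths before they are handed to a replication job. The function name, the exclusion sets, and the sample paths are all hypothetical; a real server-based product would supply its own selection mechanism. A raw swap partition would be excluded at the device level, which this file-level sketch does not cover.

```python
# Hypothetical exclusion rules, for illustration only.
EXCLUDED_NAMES = {"pagefile.sys"}   # NT virtual-memory file
EXCLUDED_DIRS = {"tmp", "temp"}     # assumed names for temporary work areas


def should_replicate(path: str) -> bool:
    """Return True if the file at `path` belongs in the replica."""
    # Normalize Windows and Unix separators and compare case-insensitively.
    parts = [p.lower() for p in path.replace("\\", "/").split("/") if p]
    if not parts:
        return False
    if parts[-1] in EXCLUDED_NAMES:
        return False
    # Skip anything that lives under a temporary work directory.
    return not any(p in EXCLUDED_DIRS for p in parts[:-1])


if __name__ == "__main__":
    for candidate in ("C:\\pagefile.sys",
                      "/db/index/orders.idx",
                      "/db/temp/sort001.wrk"):
        print(candidate, should_replicate(candidate))
```

The point of the sketch is only that the selection happens per file, which is why this style of replication needs software that understands the file system.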
Another option is to replicate the entire disk or LUN, making no distinction
of what is actually being replicated. This can be considered storage
replication and involves an entire disk or LUN, as there is no distinction
of the type or format of the information being replicated.
Storage replication can be performed by either a software- or hardware-based
solution. Storage replication is useful when an entire disk or LUN is
being used for a single purpose or all of the data on that disk or LUN
requires replication.
With the large datasets being used today, it is certainly not unusual
for an entire disk or LUN to be used for a single purpose. In fact,
there may be many disks or LUNs used for a single purpose or database
and typically they would all have to be replicated in order to be useful.
Mirrors and copies
A copy of data is a stand-alone image of data at a single point in time.
Mirrors are copies of data that may be updated as changes are made to
the primary instance of the data.
Copies may be used for back-ups or data forms, whereas mirrors would typically
be used for disaster recovery.
Full mirrors
A full mirror contains a complete copy of the data that uses the same
amount of space on the secondary image as the primary.
In other words, a full mirror of a 100GB dataset would take another
100GB - for a total of 200GB. If RAID is being used for the storage,
make the calculations on the available storage and not the raw storage.
Take the same example of a 100GB dataset. If that dataset needs
100GB of available storage using RAID-1 for both the primary
and secondary of a mirror pair, 400GB of raw storage is actually needed
to get the 200GB of usable storage (2 images of 100GB each, consisting
of 200GB raw storage each).
If using RAID-5 or RAID-3, naturally the multiplier for usable to raw
storage will change. To keep things simple, it is best to estimate the
amount of usable storage required, multiply that number out
for the RAID implementation, and then double the result for a single
mirror.
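
The arithmetic above can be sketched as a small calculator. This is a minimal illustration, not a sizing tool: the function names and the `disks_in_group` parameter are assumptions, RAID-1 is taken to be a two-way mirror, and RAID-3 and RAID-5 are assumed to give up one disk's worth of capacity per group to parity.

```python
def raid_multiplier(level: str, disks_in_group: int = 2) -> float:
    """Ratio of raw to usable storage for one RAID group."""
    if level == "RAID-1":
        return 2.0  # every block is stored twice
    if level in ("RAID-3", "RAID-5"):
        if disks_in_group < 3:
            raise ValueError("parity RAID needs at least 3 disks")
        # One disk's worth of capacity per group holds parity.
        return disks_in_group / (disks_in_group - 1)
    raise ValueError(f"unsupported RAID level: {level}")


def raw_storage_gb(usable_gb: float, level: str,
                   disks_in_group: int = 2, mirrors: int = 1) -> float:
    """Raw storage needed for the primary image plus `mirrors` full mirrors."""
    images = 1 + mirrors
    return usable_gb * raid_multiplier(level, disks_in_group) * images


if __name__ == "__main__":
    # The article's example: 100GB usable, RAID-1 on both sides, one mirror.
    print(raw_storage_gb(100, "RAID-1"))
    # The same dataset on 5-disk RAID-5 groups.
    print(raw_storage_gb(100, "RAID-5", disks_in_group=5))
```

With RAID-1 on both sides the function reproduces the 400GB figure from the example; with five-disk RAID-5 groups the same mirrored dataset needs 250GB raw.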
Full mirrors will generally have five states: establishing, established,
suspended, terminated and re-establishing. When the mirrors are establishing,
the original data is being read from the primary image and written to
the secondary, or mirror, image.
When the mirrors are established, the images are identical and writes
are sent to both images simultaneously (at least logically).
Suspended mirrors allow for the two images to be split for a time, and
writes made to one image are tracked for use during the re-establish
process. Depending on the mirroring implementation, there may be restrictions
as to what can be suspended and what cannot, such as only the secondary
or either the primary or the secondary.
A terminated mirror means just that: the mirror pair is broken and each
image can be separately addressed and written to; there is no longer
a relationship between the two images.
Re-establishing the mirrors is a state where the two images are resynchronized
by sending the outstanding changes to the previously suspended image.
The method of tracking the changes and how they may be written is
implementation-specific. A full mirror is especially useful in disaster recovery and
business continuance models, storage migration projects, and software
upgrade testing. This is especially true when it is easy to select which
image to fall back to during the resynchronization process.
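
The five states described above can be modeled as a small state machine. The sketch below is only one plausible reading: the set of allowed transitions is an assumption (the article names the states but does not list the transitions), and `FullMirror`, `record_write`, and `changed_blocks` are hypothetical names.

```python
from enum import Enum


class MirrorState(Enum):
    ESTABLISHING = "establishing"
    ESTABLISHED = "established"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"
    RE_ESTABLISHING = "re-establishing"


# Assumed transition table; real implementations may allow more or fewer moves.
ALLOWED = {
    MirrorState.ESTABLISHING: {MirrorState.ESTABLISHED, MirrorState.TERMINATED},
    MirrorState.ESTABLISHED: {MirrorState.SUSPENDED, MirrorState.TERMINATED},
    MirrorState.SUSPENDED: {MirrorState.RE_ESTABLISHING, MirrorState.TERMINATED},
    MirrorState.RE_ESTABLISHING: {MirrorState.ESTABLISHED, MirrorState.TERMINATED},
    MirrorState.TERMINATED: set(),
}


class FullMirror:
    def __init__(self) -> None:
        self.state = MirrorState.ESTABLISHING
        self.changed_blocks: set[int] = set()  # writes tracked while suspended

    def transition(self, new_state: MirrorState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        if new_state is MirrorState.ESTABLISHED:
            # Images are identical again, so nothing is outstanding.
            self.changed_blocks.clear()

    def record_write(self, block: int) -> None:
        # Only a suspended mirror needs to remember what changed.
        if self.state is MirrorState.SUSPENDED:
            self.changed_blocks.add(block)
```

Tracking only the changed blocks while suspended is what lets the re-establish step send just the outstanding changes instead of recopying the whole image.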
Flash mirrors
Some mirrors or copies of data are available instantly, in a flash,
where others may not be usable until the establish or copy process has
been completed. Until the images are synchronized, writes may actually
involve two writes and a read: one read to grab the old data, one write
to send the old data to the flash mirror, and one write of the new data
to the live data image.
In addition, a background task will fill in the blanks on the mirror
image. The advantage of this is that work can be done on the flashed
image without waiting for the image to be completely copied over.
When images are hundreds of GB in size, there may be a significant time
saving.
On the other hand, there may be a significant amount of additional overhead
in supporting this model because of the additional read/write traffic.
Flash mirrors are useful when write performance is not as critical as
making the mirrors available quickly, such as using a point-in-time
copy that is regularly reset to reflect the current state every day
or so.
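
The write path described above (one read, then two writes, until a block has been copied) can be simulated in a few lines. This is a toy in-memory model under stated assumptions: blocks are list entries, `FlashMirror` and its method names are hypothetical, and `io_ops` counts only the operations on the host write path.

```python
class FlashMirror:
    """Toy model of a flash mirror that is usable before copying finishes."""

    def __init__(self, live: list) -> None:
        self.live = list(live)                 # live data image
        self.mirror = [None] * len(live)       # point-in-time image
        self.copied = [False] * len(live)      # which blocks are on the mirror
        self.io_ops = 0                        # I/Os caused by host writes

    def write(self, block: int, data) -> None:
        if not self.copied[block]:
            old = self.live[block]             # read the old data
            self.io_ops += 1
            self.mirror[block] = old           # write old data to the mirror
            self.io_ops += 1
            self.copied[block] = True
        self.live[block] = data                # write new data to the live image
        self.io_ops += 1

    def read_mirror(self, block: int):
        # Blocks not yet copied are still unchanged on the live image.
        return self.mirror[block] if self.copied[block] else self.live[block]

    def background_copy_step(self) -> bool:
        """Copy one outstanding block; return False when nothing is left."""
        for block, done in enumerate(self.copied):
            if not done:
                self.mirror[block] = self.live[block]
                self.copied[block] = True
                return True
        return False
```

The first write to a block costs three I/Os and later writes to the same block cost one, which is the overhead trade-off the section describes.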
Sparse copies
A sparse copy works on the theory that, while it exists, it only needs
to keep track of the data that has been overwritten.
These are typically point-in-time images that have a limited life expectancy
(such as while the data is being backed up). Instead of using an image
identical in size to the primary, as a full mirror does, a separate area is
used that only needs to be roughly the size of the amount of data being
written while the copy is in existence.
For example, a 100GB database may only write 1GB of data during the
time a backup takes place. In this example, the sparse copy needs
to be only 1GB in size instead of the 100GB a full mirror would
require.
A sparse copy is created by assigning a device (or space on a device
or file system) to be used as the sparse storage repository. Any writes
after this point in time are then written to the primary image after
the old data is saved in the repository (similar in some respects to
the flash mirror).
The difference here is that data written is not saved on a block-for-block
basis as in the case of the full mirror. Instead, the data writes are
sent to the sparse repository starting at its beginning and working
their way toward the end. Some method is used to
track which data blocks have been written to the repository, so that
any subsequent read knows whether to read from the primary or where
to look in the sparse repository (depending on the image being read
and whether the data has been modified or not).
In some implementations, there may be multiple images kept in the sparse
repository, and it depends on the implementation as to how multiple
writes to the same area are handled.
There are two problems with a sparse copy: if the primary image develops
a problem, such as a device going off-line, all of the images become
unavailable; and if the amount of data being written exceeds the capacity
of the sparse data repository, the sparse copy is no longer valid.
On the other hand, if the data is only temporary and has a limited life
expectancy, there is a significant amount of storage that can be saved
compared to a full mirror.
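
The repository mechanics described in this section can be sketched as follows. It is a simplified single-image model under stated assumptions: blocks are list entries, the tracking method is a plain dictionary, only the first overwrite of a block is saved, and the names `SparseCopy`, `index`, and `read_snapshot` are hypothetical.

```python
class SparseCopy:
    """Toy point-in-time copy that stores only overwritten blocks."""

    def __init__(self, primary: list, capacity_blocks: int) -> None:
        self.primary = primary            # shared with the live application
        self.capacity = capacity_blocks   # size of the sparse repository
        self.repository: list = []        # old data, filled from the start
        self.index: dict[int, int] = {}   # primary block -> repository slot
        self.valid = True

    def write(self, block: int, data) -> None:
        if self.valid and block not in self.index:
            if len(self.repository) >= self.capacity:
                # Repository overflow: the sparse copy is no longer valid.
                self.valid = False
            else:
                # Save the old data in the next free repository slot.
                self.index[block] = len(self.repository)
                self.repository.append(self.primary[block])
        self.primary[block] = data

    def read_snapshot(self, block: int):
        if not self.valid:
            raise RuntimeError("sparse copy invalidated by repository overflow")
        slot = self.index.get(block)
        # Unmodified blocks are still read from the primary image.
        return self.primary[block] if slot is None else self.repository[slot]
```

Because unmodified blocks are served from the primary, the sketch also shows why a failure of the primary makes the sparse copy unavailable.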
Summary
There are a few critical decisions to be made at the outset to
ensure that data replication succeeds and the full benefits are realized.
Each decision grows out of prior decisions. And, like the bricks in
a wall, each decision leads to a structure that meets the business needs
of the organization in the most time-, cost- and resource-saving way.
Decision then leads to implementation.*
Robert A. Collar is Senior
Product Manager for SAN Director Products at LSI Logic Storage Systems,
Inc., Milpitas, CA. He has been involved in high availability solutions
off and on since 1988, working at Tolerant System, Pyramid, and as an
IBM/HP/SGI reseller. He has been involved in Unix-based solutions since
1979. He recently addressed the SNW/Tokyo show in January 2001 on replication
and business continuance. He can be reached at rcollar@lsil.com.
*Part 2 of this article will be featured in the next issue of the DRJ.
© Copyright 2000 Systems Support Inc. All rights reserved. Reproduction in whole
or in part in any form or medium without the express written permission
of System Support Inc. is prohibited.