An Internet Based Disaster Recovery System
By PIN YANG, TAO LI,
KUI ZHAO & JUN KAI LIAO
Data is a vital asset to businesses of all sizes, so it is crucial that
it be protected at all times. However, most businesses are at risk of
losing or corrupting their data due to hardware and software failures or
virus intrusions. Many rely on traditional approaches such as data backup
and cluster technologies to protect data and keep it available for
continuous business operation. However, large-scale disasters such as
floods, fires or earthquakes may nullify all of these efforts, because
the backup copies are kept at the same site as the original data. An
ideal solution would be an offsite disaster recovery system. However,
this solution requires dedicated communication lines to transfer data
between the local data center and the remote backup center, a costly
option that may not be viable for all enterprises.
In this article, an Internet based real-time data backup system is proposed.
With this option, the public Internet is used instead of dedicated communication
lines to build an offsite disaster recovery system. It is low cost and
highly reliable, enabling most enterprises to use it.
This solution features the following:
- Internet-based data backup
- Encrypted data transmission between local and remote site
- Real-time data backup
- Automatic services switching
Architecture
The system consists of two physical parts: a local data center (LDC)
and a remote backup center (RBC). Figure 1 illustrates the architecture.
The LDC comprises a group of servers that provide services for the
business and a local disaster recovery gateway (LDRG), through which all
of the servers are connected to the Internet. The LDRG inspects the
status of the servers and controls Internet users’ access to the
services provided by the servers in the LDC.
Figure 1. Architecture of the internet-based disaster recovery system
Similar to the LDC, the RBC comprises a group of backup servers and a
remote disaster recovery gateway (RDRG), but the number of backup servers
can be smaller than the number of local servers. That is, one server in
the RBC can act as the backup server for several servers in the LDC.
The system is made up of three subsystems functionally: data backup
and recovery system (DBRS), IP tunneling system (IPTS) and services
switching system (SSS). Figure 2 illustrates these subsystems and their
relationships.
Figure 2. System modules and their relationships
DBRS consists of two modules: file monitoring module (FMM) and data
backup module (DBM). It backs up data to the remote backup servers in
real time and recovers data from the remote backup servers.
IPTS sets up a secure tunnel to transfer data between LDC and RBC. The
channel provided by IPTS is strongly encrypted and ensures the
confidentiality of the data in transit.
SSS consists of a server monitoring module (SMM) and a service switching
module (SSM). It monitors the status of the servers dynamically and
switches service to the remote servers automatically when a local
server fails.
Under normal conditions, users access the local servers through LDRG
while FMM monitors the state of the file systems. Once a change to a
file system is detected, a notification is sent to DBM, which then uses
a differential-backup method to back up the changed data to the remote
backup server through a secure IP tunnel provided by IPTS.
When a server in LDC fails, SMM detects the failure and notifies SSM,
which relays users’ requests to the remote backup server so that it
takes over the service from the failed server. When SMM detects that
the failed server has come back, it notifies DBM to restore the lost
data on the local server from the backup server. When the recovery
process finishes, SSM switches service from the backup server back to
the local server, which resumes serving users. Figure 3 illustrates
this process.
Figure 3. Flowchart of service switching and data recovery
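The switching and recovery cycle described above can be read as a small
state machine. The following Python sketch is only an illustration of that
reading, not the authors' implementation: the state names and the
next_state function are hypothetical, and the inputs stand in for the
notifications that SMM and DBM would deliver.

```python
from enum import Enum


class State(Enum):
    LOCAL_SERVING = "local server serves users"
    REMOTE_SERVING = "backup server serves users"
    RECOVERING = "local data being restored from backup"


def next_state(state, local_alive, recovery_done=False):
    """Return the next state of the switching cycle (hypothetical model)."""
    if state is State.LOCAL_SERVING:
        # SMM reports a failure: SSM relays requests to the backup server.
        return State.LOCAL_SERVING if local_alive else State.REMOTE_SERVING
    if state is State.REMOTE_SERVING:
        # SMM sees the local server again: DBM starts restoring lost data.
        return State.RECOVERING if local_alive else State.REMOTE_SERVING
    # RECOVERING: switch back only after the restore has finished.
    if not local_alive:
        return State.REMOTE_SERVING
    return State.LOCAL_SERVING if recovery_done else State.RECOVERING
```

Keeping the backup server in service during the RECOVERING state reflects
the order given in the text: data is restored first, and only then is the
service switched back.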
Key Techniques
Real-time data backup
When a change to the data is detected, the changed data is transferred
from the local server to the backup server so that data loss is
minimized. DBRS is implemented by two modules: FMM and DBM. FMM is
responsible for monitoring the state of files. Once a change occurs, it
sends a notification to DBM to back up the files concerned.
FMM
FMM, implemented in Linux, monitors the status of the file system. In
our implementation, it is called by DBM to collect real-time information
about changes to the file systems. Figure 4 illustrates the process
flowchart.
After the administrator configures the directories and files to be backed
up and starts DBM, FMM is called with the monitored directories and files
as parameters. When a change is detected, FMM notifies DBM. The changes
reported include file creation, deletion, modification and renaming, as
well as changes to file attributes: file size, type, user, group, time
of access, time of modification, inode and allocated blocks. DBM can
then back up the changed files or directories to RBC.
When the file system monitoring is temporarily not needed by any application,
FMM can be stopped and, when needed, resumed.
Figure 4. Flowchart of FMM
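The text does not say which kernel facility FMM uses, so the following
Python sketch swaps in a simple polling approach: it takes two snapshots of
a directory tree and compares them. The function names snapshot and
diff_snapshots are hypothetical. Access time is left out on purpose,
because reading a file would otherwise register as a change, and a rename
shows up here as one deletion plus one creation.

```python
import os


def snapshot(root):
    """Map each path under root to a tuple of monitored attributes."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # removed between listing and stat
            # size, type/permissions, user, group, mtime, inode, blocks
            state[path] = (st.st_size, st.st_mode, st.st_uid, st.st_gid,
                           st.st_mtime_ns, st.st_ino,
                           getattr(st, "st_blocks", None))
    return state


def diff_snapshots(old, new):
    """Return sorted lists of created, deleted and changed paths."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return created, deleted, changed
```

A caller standing in for DBM would take a snapshot at a fixed interval,
pass the previous and current snapshots to diff_snapshots, and back up
whatever paths are reported. An event-driven mechanism would report changes
sooner than polling does.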
DBM
DBM is responsible for data backup and recovery. For a more efficient
backup, a differential backup strategy is applied: files are compared
and only the changed data is transferred instead of the whole file. This
minimizes the amount of data to be transferred and makes better use of
network and system resources.
The differential backup algorithm is described below.
Suppose that there is a file named fa on host A and a similar file named
fb on host B. To bring fb up to date with fa, the following steps are
taken:
1) Host B divides file fb into a series of data blocks of size S; the
last block may be smaller than S;
2) Host B calculates a weak 32-bit rolling checksum and a strong 128-bit
MD4 checksum for each block;
3) Host B sends these checksums to host A;
4) As in steps (1) and (2), host A computes the weak 32-bit rolling
checksum and the strong 128-bit MD4 checksum for blocks of file fa and
compares them with the checksums received for file fb. If both checksums
match, the data in the block is the same; otherwise the data is
different;
5) Based on the comparison, host A builds a description of file fa and
sends it to host B. For a block whose data is the same, only a reference
to the matching block of fb is included; otherwise the actual data is
included.
With this algorithm, file fb is synchronized with file fa, and only the
parts of fa that differ from fb are transferred.
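The steps above can be sketched in Python as follows. This is a minimal
illustration under stated assumptions rather than the authors' code: the
function names are hypothetical, the block size is kept tiny for
readability, MD5 stands in for MD4 (which Python's hashlib does not always
provide), and the weak checksum is recomputed at every offset instead of
being rolled. On a mismatch the sketch advances by one byte, so that
unchanged blocks are still found after an insertion.

```python
import hashlib

BLOCK_SIZE = 4  # S; tiny so the example is easy to follow
M = 1 << 16


def weak(block):
    """32-bit weak checksum of a block (computed directly, not rolled)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a + (b << 16)


def strong(block):
    # Assumption: MD5 stands in for the MD4 checksum named in the text.
    return hashlib.md5(block).digest()


def signatures(fb):
    """Steps 1-3 on host B: checksums for every block of fb."""
    sigs = {}
    for index, start in enumerate(range(0, len(fb), BLOCK_SIZE)):
        block = fb[start:start + BLOCK_SIZE]
        sigs.setdefault(weak(block), []).append((strong(block), index))
    return sigs


def make_delta(fa, sigs):
    """Steps 4-5 on host A: block references (ints) and literal bytes."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(fa):
        block = fa[pos:pos + BLOCK_SIZE]
        match = None
        for digest, index in sigs.get(weak(block), []):
            if digest == strong(block):  # confirm the weak match
                match = index
                break
        if match is None:
            literal.append(fa[pos])  # no match: emit one byte, slide on
            pos += 1
        else:
            if literal:
                delta.append(bytes(literal))
                literal = bytearray()
            delta.append(match)
            pos += len(block)
    if literal:
        delta.append(bytes(literal))
    return delta


def apply_delta(fb, delta):
    """Host B rebuilds fa from its own blocks and the literal data."""
    out = bytearray()
    for item in delta:
        if isinstance(item, int):
            out += fb[item * BLOCK_SIZE:(item + 1) * BLOCK_SIZE]
        else:
            out += item
    return bytes(out)
```

Calling apply_delta(fb, make_delta(fa, signatures(fb))) should return the
contents of fa, with only the literal byte runs and the block indices
having crossed the network.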
To speed up file checking, the differential backup algorithm uses a
rolling checksum. It can produce a checksum at any position in a file
quickly: once the checksum of one window of bytes is known, the checksum
of the window shifted by one byte is derived from it at little cost.
The checksum is called a rolling checksum because it moves on byte by
byte in this way until the end of the file is reached.
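The rolling checksum algorithm is described below, where $X_i$ denotes
the value of byte $i$:
1) Calculate the checksum $s(k,l)$ of the bytes from $k$ to $l$. For
simplicity and efficiency, $M = 2^{16}$.
$a(k,l) = \left( \sum_{i=k}^{l} X_i \right) \bmod M$
$b(k,l) = \left( \sum_{i=k}^{l} (l-i+1)\, X_i \right) \bmod M$
$s(k,l) = a(k,l) + 2^{16}\, b(k,l)$
2) Use the result of (1) to calculate the new checksum from byte $k+1$
to $l+1$:
$a(k+1,l+1) = \left( a(k,l) - X_k + X_{l+1} \right) \bmod M$
$b(k+1,l+1) = \left( b(k,l) - (l-k+1)\, X_k + a(k+1,l+1) \right) \bmod M$
$s(k+1,l+1) = a(k+1,l+1) + 2^{16}\, b(k+1,l+1)$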
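The recurrence can be checked with a short Python sketch. It assumes the
usual definitions behind the update rules: a(k,l) is the sum of the bytes
in the window and b(k,l) is the sum of the bytes weighted by their distance
from the end of the window, both taken modulo M. The function names are
hypothetical.

```python
M = 1 << 16  # modulus M = 2^16


def checksum(data, k, l):
    """Compute a(k,l) and b(k,l) directly over bytes data[k..l] inclusive."""
    a = sum(data[i] for i in range(k, l + 1)) % M
    b = sum((l - i + 1) * data[i] for i in range(k, l + 1)) % M
    return a, b


def roll(a, b, data, k, l):
    """Slide the window from [k, l] to [k+1, l+1] in constant time."""
    a_next = (a - data[k] + data[l + 1]) % M
    b_next = (b - (l - k + 1) * data[k] + a_next) % M
    return a_next, b_next


def combine(a, b):
    """Pack the two 16-bit halves into the 32-bit checksum s."""
    return a + (b << 16)
```

The direct computation touches every byte of the window, while roll needs
only the byte that leaves and the byte that enters, which is what makes
checking every offset of a large file affordable.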
Encrypted data transport in IP tunneling
IP tunneling based on IPsec is employed to protect the confidentiality
of data transferred between LDC and RBC.
IP tunneling provides an end-to-end encrypted channel, so an IP
tunneling module is configured in both LDRG and RDRG. When LDC
communicates with RBC, LDRG and RDRG negotiate a secure communication
channel. The data is encrypted in the LDRG before it is transferred,
then decrypted in the RDRG on receipt and passed to the backup server.
All data crossing the Internet is ciphertext, so even if it is
intercepted by attackers, it cannot be understood.
The process of data encryption, decryption and transfer is shown in
Figure 5.
Figure 5. Process of data encryption, decryption and transferring
Service Switching

Service switching combines status monitoring with the switching itself.
When a failure of the local server is detected, service is automatically
switched to the remote backup server. When the local server recovers
from the failure, service is switched back to it. The switching is
transparent to users.
SMM
SMM is implemented as a daemon. It regularly queries the status of each
server. A received response indicates that the server is active;
if no response arrives, the server is considered to have failed and
service switching is triggered. Figure 6 illustrates the process.
Figure 6. Flowchart of server status monitoring
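The article does not specify how SMM queries a server, so the Python
sketch below assumes a TCP connection attempt as the probe. The names
is_alive and Monitor are hypothetical, and the rule of waiting for several
consecutive missed probes before declaring a failure is an added
assumption meant to avoid switching on a single lost packet.

```python
import socket


def is_alive(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class Monitor:
    """Declare failure only after several consecutive missed probes."""

    def __init__(self, probe, max_misses=3):
        self.probe = probe            # callable returning True or False
        self.max_misses = max_misses  # assumed threshold, not from the text
        self.misses = 0

    def check(self):
        """Run one probe and return 'up' or 'down'."""
        if self.probe():
            self.misses = 0
            return "up"
        self.misses += 1
        return "down" if self.misses >= self.max_misses else "up"
```

A daemon loop would build a Monitor around a probe such as
lambda: is_alive(server_ip, 80), call check() at a fixed interval, and
notify SSM when the result changes.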
SSM
SSM is implemented with iptables in Linux. iptables, a packet-filtering
firewall tool, supports stateful packet filtering, network address
translation (NAT), flow control and load balancing. Destination NAT
(DNAT) is used to implement service switching. When LDRG detects a
local server failure, a DNAT rule is added to the NAT rule chain to
redirect to RBC the requests originally sent to LDC, so that the remote
backup server provides service in place of the failed server. When the
local server has recovered from the failure and its data has been
restored, the DNAT rule added earlier is deleted and service is switched
back to the local server.
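As an illustration of the DNAT-based switch, the Python sketch below only
builds the iptables command lines and does not run them. The helper names
and the example addresses (taken from the documentation address ranges)
are hypothetical, the PREROUTING chain and the per-port match are
assumptions about how the rule might be written, and a real gateway would
probably need further rules, for example for return traffic.

```python
def dnat_rule(action, local_ip, backup_ip, port, proto="tcp"):
    """Build an iptables argument list; action is '-A' (add) or '-D' (delete)."""
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    return ["iptables", "-t", "nat", action, "PREROUTING",
            "-d", local_ip, "-p", proto, "--dport", str(port),
            "-j", "DNAT", "--to-destination", f"{backup_ip}:{port}"]


def switch_to_backup(local_ip, backup_ip, port):
    """Rule that redirects requests for the local server to the backup."""
    return dnat_rule("-A", local_ip, backup_ip, port)


def switch_back(local_ip, backup_ip, port):
    """Delete the same rule so that requests reach the local server again."""
    return dnat_rule("-D", local_ip, backup_ip, port)


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(" ".join(switch_to_backup("192.0.2.10", "198.51.100.10", 80)))
    print(" ".join(switch_back("192.0.2.10", "198.51.100.10", 80)))
```

Building the delete command from the same arguments as the add command
keeps the two in step, since iptables removes a rule with -D only when the
specification matches the rule that was added.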
Conclusion
An Internet based real-time backup system uses the Internet to transfer
data between LDC and RBC without dedicated lines, so the distance between
LDC and RBC is not limited and the cost is lower than that of solutions
based on dedicated lines. IP tunneling ensures the confidentiality of
data transmitted via the Internet. The automatic data backup and recovery
technologies minimize data loss, while the service switching technology
keeps the business operating through disasters such as floods, fires and
even earthquakes. It is an inexpensive and secure disaster recovery
solution for small or medium-sized businesses.
Pin Yang is associate professor at the School of Computer Science
and Engineering of SiChuan University. E-mail: yangpin@cs.scu.edu.cn.
Tao Li received B.S. and M.S. degrees in computer software from the
University of Electronic Science and Technology of China (UESTC) in 1986
and 1991 respectively, and a Ph.D. degree in circuits and systems from
UESTC in 1995. From 1993 to 1994 he was a visiting scholar at the
University of California at Berkeley. He is currently a professor in the
computer department at SiChuan University. E-mail: litao@scu.edu.cn.
Kui Zhao, assistant researcher of computer department at SiChuan University
(China), has a background in research, computer networks, information
security, Internet, database, business continuity, and disaster-recovery
product development. E-mail: zhkui2k@sohu.com.
Jun Kai Liao
©Copyright 2004 Systems Support Inc. All rights reserved. Reproduction in whole
or in part in any form or medium without the express written permission
of System Support Inc. is prohibited.