Physically, the system consists of two parts: a local data center (LDC) and a remote backup center (RBC). Figure 1 illustrates the architecture.
LDC comprises a group of servers that provide services for the business and a local disaster recovery gateway (LDRG), through which all of the servers are connected to the Internet. LDRG inspects the status of the servers and controls Internet users' access to the services provided by the servers in LDC.
Figure 1. Architecture of the internet-based disaster recovery system
Similar to LDC, RBC comprises a group of backup servers and a remote disaster recovery gateway (RDRG), but the number of backup servers can be smaller than the number of local servers. That is, one server in RBC can act as the backup server for several servers in LDC.
Functionally, the system is made up of three subsystems: a data backup and recovery system (DBRS), an IP tunneling system (IPTS) and a service switching system (SSS). Figure 2 illustrates these subsystems and their relationships.
Figure 2. System modules and their relationships
DBRS consists of two modules: a file monitoring module (FMM) and a data backup module (DBM). It is dedicated to backing up data to the remote backup servers in real time and to recovering data from them.
IPTS sets up a secure tunnel to transfer data between LDC and RBC. The channel provided by IPTS is strongly encrypted and ensures the confidentiality of the data in transit.
SSS consists of a server monitoring module (SMM) and a service switching module (SSM). It monitors the status of the servers dynamically and switches service to the remote servers automatically when a local server fails.
Under normal conditions, users access the local servers through LDRG while FMM monitors the state of the file systems. Once a change in a file system is detected, a notification is sent to DBM, which then uses a differential-backup method to back up the changed data to the remote backup server through a secure IP tunnel provided by IPTS.
When a server in LDC fails, SMM detects the failure and notifies SSM, which relays users' requests to the remote backup server so that it takes over the service from the failed server. When SMM detects that the failed server has recovered, it notifies DBM to restore the lost data on the local server from the backup server. When the recovery process finishes, SSM switches the service from the backup server back to the local server, which continues to serve users. Figure 3 illustrates this process.
Real-time data backup
Whenever a change to the data is detected, the changed data is transferred from the local server to the backup server so that data loss is minimized. DBRS is implemented by two modules: FMM and DBM. FMM is responsible for monitoring the state of files. Once a change occurs, it notifies DBM, which backs up the files concerned.
FMM, implemented on Linux, monitors the status of the file system. In our implementation, it is called by DBM to collect real-time information about file system changes. Figure 4 illustrates the process flowchart.
After the administrator configures the directories and files to be backed up and starts DBM, FMM is called with the monitored directories and files as parameters. When any changes are detected, FMM notifies DBM of them. These changes include file creation, deletion, modification and renaming, as well as changes to file attributes, which include file size, type, user, group, time of access, time of modification, header node and blocks assigned. DBM can then back up the changed files or directories to RBC.
When the file system monitoring is temporarily not needed by any application, FMM can be stopped and, when needed, resumed.
Figure 4. Flowchart of FMM
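To make the kind of change detection described for FMM concrete, the following is a minimal sketch only. It swaps in a simple polling technique (compare two snapshots of file attributes) rather than whatever notification mechanism the authors' Linux implementation uses, which the text does not specify. The function names `snapshot` and `diff` and the chosen attribute tuple are assumptions made for illustration.

```python
import os


def snapshot(root: str) -> dict:
    """Map every path under root to a tuple of attributes (hypothetical helper)."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # the entry vanished between listing and stat
            # size, type/permissions, user, group, modification time
            state[path] = (st.st_size, st.st_mode, st.st_uid, st.st_gid, st.st_mtime_ns)
    return state


def diff(old: dict, new: dict) -> list:
    """Return (event, path) pairs describing what changed between two snapshots."""
    events = []
    for path in sorted(new.keys() - old.keys()):
        events.append(("created", path))
    for path in sorted(old.keys() - new.keys()):
        events.append(("deleted", path))
    for path in sorted(old.keys() & new.keys()):
        if old[path] != new[path]:
            events.append(("modified", path))
    return events
```

In this sketch a rename appears as a deletion followed by a creation. A backup module could call `snapshot` periodically and pass each event from `diff` to its backup routine; polling trades detection latency for simplicity.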
DBM is responsible for data backup and recovery. To make backup more efficient, a differential backup strategy is applied: files are compared, and only the changed data is transferred instead of the whole file. This minimizes the amount of data to be transferred and makes better use of system resources.
The differential backup algorithm is described below.
Suppose that there is a file named fa on host A and a similar file named fb on host B. To synchronize them, the following steps are taken:
1) Host B divides file fb into a series of data blocks of size S; the last block can be smaller than S;
2) Host B calculates a weak 32-bit rolling checksum and a strong 128-bit MD4 checksum for each block;
3) Host B sends these checksums to host A;
4) As in steps (1) and (2), host A computes the weak 32-bit rolling checksum and the strong 128-bit MD4 checksum for the blocks of file fa and compares them with the checksums received for file fb. If both checksums match, the data in the block is the same; otherwise the data is different;
5) Based on this comparison, host A builds a description of file fa and sends it to host B. For each block whose data is the same, a reference to the matching block of fb is included; otherwise the actual data is included.
With this algorithm, file fa and file fb are synchronized, and only the parts of fb that differ from fa are transferred and modified.
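The five steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: MD5 stands in for the MD4 checksum named in the text (both are 128-bit, but MD4 is missing from many current `hashlib` builds), host A tries a match at every byte offset of fa, the weak checksum is recomputed at each offset rather than rolled, and all function names are hypothetical.

```python
import hashlib

M = 1 << 16  # modulus of the weak checksum


def weak(block: bytes) -> int:
    """32-bit weak checksum of one block (recomputed, not rolled, for clarity)."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a + M * b


def strong(block: bytes) -> bytes:
    # ASSUMPTION: MD5 used as a stand-in for the MD4 checksum in the text.
    return hashlib.md5(block).digest()


def signatures(fb: bytes, size: int) -> dict:
    """Steps 1-3 (host B): checksums for each block of fb, keyed by weak checksum."""
    sigs = {}
    for index, start in enumerate(range(0, len(fb), size)):
        block = fb[start:start + size]
        sigs.setdefault(weak(block), []).append((strong(block), index))
    return sigs


def delta(fa: bytes, sigs: dict, size: int) -> list:
    """Steps 4-5 (host A): describe fa as block references plus literal data."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(fa):
        window = fa[pos:pos + size]
        match = None
        for digest, index in sigs.get(weak(window), []):
            if digest == strong(window):  # confirm a weak match with the strong checksum
                match = index
                break
        if match is None:
            literal.append(fa[pos])  # no block of fb starts here: send the byte itself
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("ref", match))
            pos += len(window)
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(fb: bytes, ops: list, size: int) -> bytes:
    """Host B: rebuild fa from its own blocks of fb and the received literal data."""
    out = bytearray()
    for kind, value in ops:
        out += fb[value * size:(value + 1) * size] if kind == "ref" else value
    return bytes(out)
```

Host B would call `signatures`, host A would call `delta` on the result, and host B would finish with `patch`. Only the `("data", ...)` entries carry file content across the network, which is where the saving comes from.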
To speed up file checking, the differential backup algorithm uses a rolling checksum. It can quickly produce a checksum starting at any position in a file. It is called a rolling checksum because, once the checksum of one window of bytes has been computed, the checksum of the window shifted by one byte can be derived from it cheaply, and this is repeated until the end of the file is reached. The rolling checksum algorithm is described below:
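1) Calculate the checksum s(k,l) over the bytes X_k, ..., X_l:
$$a(k,l) = \Big(\sum_{i=k}^{l} X_i\Big) \bmod M$$
$$b(k,l) = \Big(\sum_{i=k}^{l} (l-i+1)\,X_i\Big) \bmod M$$
$$s(k,l) = a(k,l) + 2^{16}\,b(k,l)$$
where, for simplicity and efficiency, $M = 2^{16}$.
2) Use the result of (1) to calculate the new checksum from byte k+1 to byte l+1:
$$a(k+1,l+1) = \big(a(k,l) - X_k + X_{l+1}\big) \bmod M$$
$$b(k+1,l+1) = \big(b(k,l) - (l-k+1)\,X_k + a(k+1,l+1)\big) \bmod M$$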
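As a check on the recurrence, the short sketch below computes the checksum directly for a window and then slides the window one byte at a time. The direct definitions of a and b used here are the standard ones that the two update rules imply; the function names `weak_checksum` and `roll` are hypothetical.

```python
M = 1 << 16  # M = 2^16


def weak_checksum(data: bytes, k: int, l: int):
    """Compute a(k,l), b(k,l) and s(k,l) directly over bytes k..l inclusive."""
    a = sum(data[i] for i in range(k, l + 1)) % M
    b = sum((l - i + 1) * data[i] for i in range(k, l + 1)) % M
    return a, b, a + M * b


def roll(data: bytes, k: int, l: int, a: int, b: int):
    """Slide the window from [k, l] to [k+1, l+1] in constant time."""
    a_next = (a - data[k] + data[l + 1]) % M
    b_next = (b - (l - k + 1) * data[k] + a_next) % M
    return a_next, b_next, a_next + M * b_next
```

Each call to `roll` costs a fixed number of arithmetic operations regardless of the window length, whereas `weak_checksum` has to touch every byte in the window.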
3.2 Encrypted data transport in IP tunneling
IP tunneling based on IPsec is employed to protect the confidentiality of data transferred between LDC and RBC.
IP tunneling provides an end-to-end encrypted channel, and therefore an IP tunneling module is configured in both LDRG and RDRG. When LDC communicates with RBC, LDRG and RDRG negotiate a secure communication channel. The data is encrypted in LDRG before transfer, then decrypted in RDRG on receipt and forwarded to the backup server. All data crossing the Internet is ciphertext, so it cannot be understood even if it is intercepted.
The process of data encryption, decryption and transfer is shown in Figure 5.
Figure 5. Process of data encryption, decryption and transferring
Service switching technology comprises status monitoring and service switching. When a failure of a local server is detected, service is automatically switched to the remote backup server. When the local server recovers from the failure, service is switched back to it. The switching is transparent to users.
SMM is implemented as a daemon. It queries the status of each server at regular intervals. A received response indicates that the server is active; if no response is received, the server is considered to have failed and service switching is triggered. Figure 6 illustrates the process.
Figure 6. Flowchart of server status monitoring
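One round of the monitoring loop can be sketched as follows. This is a minimal illustration that assumes the status query is a TCP connection attempt to the service port; the text does not say which probe SMM actually uses, and the names `is_alive`, `monitor_step`, `on_fail` and `on_recover` are hypothetical.

```python
import socket


def is_alive(host: str, port: int, timeout: float = 1.0) -> bool:
    """ASSUMPTION: a server counts as active if it accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable or timed out


def monitor_step(host, port, was_alive, on_fail, on_recover, probe=is_alive):
    """Probe once and fire a callback only when the server's state changes."""
    alive = probe(host, port)
    if was_alive and not alive:
        on_fail(host)      # e.g. ask the switching module to redirect requests
    elif not was_alive and alive:
        on_recover(host)   # e.g. start data recovery, then switch back
    return alive
```

A daemon would call `monitor_step` in a loop with a sleep between rounds, carrying the returned state into the next call. A production monitor would probably require several consecutive missed responses before declaring a failure, to avoid switching on a single lost packet.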
SSM is implemented with iptables on Linux. iptables, a packet-filtering firewall tool for Linux, supports stateful packet filtering, network address translation (NAT), flow control and load balancing. Destination NAT (DNAT) is used to implement service switching. When LDRG detects a local server failure, a DNAT rule is added to the NAT rule chain to redirect to RBC the requests originally sent to LDC, so that the remote backup server provides the service in place of the failed server. When the local server has recovered from the failure and its data has been restored, the DNAT rule added earlier is deleted and the service is switched back to the local server.
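The add-then-delete pattern for the DNAT rule might look like the sketch below, which only builds the iptables command lines and does not run them. The choice of the PREROUTING chain, the TCP protocol, the example addresses (taken from documentation address ranges) and the function names are assumptions for illustration; the text does not give the exact rule the authors install.

```python
def dnat_rule(action: str, local_ip: str, backup_ip: str, port: int, proto: str = "tcp") -> list:
    """Build an iptables argument list; action is '-A' (append) or '-D' (delete)."""
    return [
        "iptables", "-t", "nat", action, "PREROUTING",
        "-d", local_ip, "-p", proto, "--dport", str(port),
        "-j", "DNAT", "--to-destination", f"{backup_ip}:{port}",
    ]


def switch_to_backup(local_ip: str, backup_ip: str, port: int) -> list:
    """Rule to add when the local server fails."""
    return dnat_rule("-A", local_ip, backup_ip, port)


def switch_back(local_ip: str, backup_ip: str, port: int) -> list:
    """Same rule with -D, removing the redirect once recovery has finished."""
    return dnat_rule("-D", local_ip, backup_ip, port)


# Hypothetical addresses: local server 192.0.2.10, backup server 198.51.100.20
print(" ".join(switch_to_backup("192.0.2.10", "198.51.100.20", 80)))
```

The returned list could be passed to `subprocess.run` on the gateway with root privileges. Because deletion uses the same match and target as the original rule, keeping both in one builder function avoids the two drifting apart.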
An Internet-based real-time backup system transfers data between LDC and RBC over the Internet without dedicated lines, so the distance between LDC and RBC is not limited and the cost is lower than that of solutions based on dedicated lines. IP tunneling ensures the confidentiality of data transmitted over the Internet. The automatic data backup and recovery technologies minimize data loss, while the service switching technology keeps the business operating through disasters such as floods, fires and even earthquakes. It is an inexpensive and secure disaster recovery solution for small and medium-sized businesses.
Pin Yang is an associate professor at the School of Computer Science and Engineering of SiChuan University. E-mail: firstname.lastname@example.org.
Tao Li received B.S. and M.S. degrees in computer software from the University of Electronic Science and Technology of China (UESTC) in 1986 and 1991 respectively, and a Ph.D. degree in circuits and systems from UESTC in 1995. From 1993 to 1994 he was a visiting scholar at the University of California at Berkeley. He is currently a professor in the computer department at SiChuan University. E-mail: email@example.com.
Kui Zhao, assistant researcher of computer department at SiChuan University (China), has a background in research, computer networks, information security, Internet, database, business continuity, and disaster-recovery product development. E-mail: firstname.lastname@example.org.
Jun Kai Liao