A Model to Mirror Large Files on Internet

Written by  Kui Zhao, Tao Li, Ping Yang, Xiao-Jie Liu & Lihui Wang Thursday, 22 November 2007 02:52

As information technology has developed, data has become critically important to enterprises. Without a dedicated disaster recovery plan, however, that data can easily be lost or damaged. A disaster recovery plan sets up a system that is identical or nearly identical to the original and protects the data through geographical distribution and redundancy. Data backup technology is the foundation of the whole disaster recovery system and the key to its success.

The geographical reach of common backup technologies such as tape, RAID, SAN and NAS is strictly limited. SAN covers the greatest distance, from 10 km to 100 km, but it is too expensive for average users. The others are suitable only for local storage. As the Internet has developed, its bandwidth and speed have improved sharply, and backup technologies built on cheap, convenient Internet resources have emerged and removed the distance limit. Because bandwidth is still restricted, traditional Internet-based backup technologies usually work in differential backup mode, transmitting only the parts of the data that differ.

Finding the differences between local and remote files requires a great deal of computation, which places a heavy resource burden on both the local and remote machines. It also consumes considerable time and bandwidth.

Suppose we want to synchronize a large file of, say, a few hundred MB. If the local and remote copies differ in only a few places, processing the whole file to find those few differences is clearly inefficient. This article presents a model for mirroring large files over the Internet. Like traditional methods, the model transfers only the differences to save bandwidth, but it does not need to compute the differences between the local and remote files, which improves response speed and efficiency.

The Model Structure

The basic idea of the model is to monitor the write operations of the operating system, encapsulate the file name, offset and data of each write into a record, and send the record to the remote mirroring server. The remote mirroring server then performs the same write operation to keep the file synchronized, so much of the computation on the local and remote machines is avoided. Fig. 1 illustrates the model structure.

 

Figure 1


The model consists of a local part and a remote part. The local part includes the file monitor module (FMON), the local mirroring module (LMM), the local differential backup module (LDBM), the local communication module (LCM) and two data structures, list L and queue Q. The remote part includes the remote communication module (RCM), the remote mirroring module (RMM) and the remote differential backup module (RDBM).

Before explaining the model in detail, we define some terms as follows:

List L registers the files to be synchronized. Each element of list L has two parts:

1) File name: character string, the name of the file to be monitored.
2) File path: character string, the path of the file to be monitored.

Queue Q, a FIFO queue, records the write operations performed on files. Each element of queue Q has four parts:

1) File name: character string, the name of the file on which the write operation was performed.
2) File path: character string, the path of the file on which the write operation was performed.
3) Offset: integer, the position of the write operation in the file.
4) Data: binary string, the content to be written into the file.
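The four-part element of queue Q can be sketched as a small record type with a byte encoding suitable for sending over a network. This is only an illustration: the name `WriteRecord`, the functions `encode` and `decode`, and the header layout (two 16-bit lengths, a 64-bit offset, a 32-bit data length) are assumptions made for this sketch, not a format given in the article.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout: name length, path length, offset, data length.
HEADER = "!HHQI"


@dataclass(frozen=True)
class WriteRecord:
    """One element of queue Q: where a write happened and what was written."""
    file_name: str
    file_path: str
    offset: int
    data: bytes


def encode(rec: WriteRecord) -> bytes:
    """Pack a record as a fixed-size header followed by name, path and data."""
    name = rec.file_name.encode("utf-8")
    path = rec.file_path.encode("utf-8")
    header = struct.pack(HEADER, len(name), len(path), rec.offset, len(rec.data))
    return header + name + path + rec.data


def decode(buf: bytes) -> WriteRecord:
    """Inverse of encode(): rebuild the record from its byte form."""
    n_name, n_path, offset, n_data = struct.unpack_from(HEADER, buf, 0)
    pos = struct.calcsize(HEADER)
    name = buf[pos:pos + n_name].decode("utf-8")
    pos += n_name
    path = buf[pos:pos + n_path].decode("utf-8")
    pos += n_path
    data = buf[pos:pos + n_data]
    return WriteRecord(name, path, offset, data)
```

Because the record carries only the written bytes and their position, its size depends on the size of the write, not on the size of the file being mirrored.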

To improve efficiency, queue Q is split into two levels: level one (Q1) and level two (Q2). Because of the large gap in I/O speed between disk and RAM, Q1 resides in RAM and Q2 resides on disk. The tail of Q1 joins the head of Q2, so Q1 and Q2 logically form a single queue Q.
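One possible way to realize the two-level queue is sketched below. It assumes that Q1 is bounded by a record count rather than by bytes, and that Q2 stores each spilled record as its own pickled file. The class name `TwoLevelQueue` and these storage choices are illustrative assumptions, not details from the article.

```python
import os
import pickle
import tempfile
from collections import deque


class TwoLevelQueue:
    """FIFO queue whose head (Q1) lives in RAM and whose overflow (Q2) lives on disk."""

    def __init__(self, q1_capacity: int, spill_dir: str):
        if q1_capacity < 1:
            raise ValueError("q1_capacity must be at least 1")
        self.q1 = deque()               # Q1: in-memory head of the queue
        self.q1_capacity = q1_capacity  # counted in records here, not bytes
        self.spill_dir = spill_dir      # Q2: one file per spilled record
        self.q2_head = 0                # sequence number of the oldest record in Q2
        self.q2_tail = 0                # sequence number for the next spilled record

    def _q2_path(self, seq: int) -> str:
        return os.path.join(self.spill_dir, f"{seq:012d}.rec")

    def put(self, record) -> None:
        # To keep FIFO order, once Q2 holds anything new records must go behind it.
        if self.q2_head == self.q2_tail and len(self.q1) < self.q1_capacity:
            self.q1.append(record)
        else:
            with open(self._q2_path(self.q2_tail), "wb") as f:
                pickle.dump(record, f)
            self.q2_tail += 1

    def get(self):
        if not self.q1:
            raise IndexError("queue is empty")
        record = self.q1.popleft()
        # Move the head of Q2 to the tail of Q1 so Q1 stays as full as possible.
        if self.q2_head < self.q2_tail:
            path = self._q2_path(self.q2_head)
            with open(path, "rb") as f:
                self.q1.append(pickle.load(f))
            os.remove(path)
            self.q2_head += 1
        return record

    def __len__(self) -> int:
        return len(self.q1) + (self.q2_tail - self.q2_head)


def demo():
    """Put five records into a queue whose Q1 holds two, then drain it."""
    with tempfile.TemporaryDirectory() as d:
        q = TwoLevelQueue(q1_capacity=2, spill_dir=d)
        for i in range(5):
            q.put(i)
        spilled = len(os.listdir(d))   # records currently held in Q2
        return spilled, [q.get() for _ in range(5)]
```

In this sketch, readers only ever touch RAM, and disk is used only when writes arrive faster than they can be sent.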

The size of queue Q is freely configurable. The larger Q is, the more file changes it can record, and the better the system copes with adverse conditions such as limited Internet bandwidth, fluctuating speed and network outages. In practice, the size of Q can be chosen according to the amount of RAM, the disk capacity, the files to be synchronized, the network speed, the maximum tolerable outage time, and so on. The generally recommended sizes are 64 MB for Q1 and 512 MB for Q2.

The model can run in either synchronous or asynchronous mode. Asynchronous mode is further divided into asynchronous operating mode and differential backup mode.

In synchronous mode, when FMON captures a write operation on file F and F is in list L, it encapsulates the file name, path, offset and data into a record and sends it immediately to the remote mirroring server. The RMM on the remote mirroring server synchronizes the file by performing the same write operation on file F according to the record received from the local mirroring server. The remote mirroring server then sends a response message to the local mirroring server. The local mirroring server can take its next action only after it has received the response (see Fig. 2).


Figure 2
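The synchronous exchange described above (send a record, then wait for the response before continuing) can be sketched with a connected socket pair standing in for the Internet link. The length-prefixed framing, the `b"ACK"` response and all function names are assumptions made for this sketch. For brevity the record carries only the offset and data, and the remote side writes to one fixed file.

```python
import os
import socket
import struct
import tempfile
import threading


def send_msg(sock, payload: bytes) -> None:
    """Send one message with a 4-byte length prefix."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock) -> bytes:
    (n,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, n)


def remote_side(sock, target_path: str, n_records: int) -> None:
    """Stand-in for RMM: apply each received write, then acknowledge it."""
    with open(target_path, "r+b") as f:
        for _ in range(n_records):
            msg = recv_msg(sock)
            (offset,) = struct.unpack_from("!Q", msg, 0)
            f.seek(offset)
            f.write(msg[8:])
            f.flush()
            send_msg(sock, b"ACK")


def mirror_write_sync(sock, offset: int, data: bytes) -> None:
    """Local side: send one record and block until the remote responds."""
    send_msg(sock, struct.pack("!Q", offset) + data)
    if recv_msg(sock) != b"ACK":
        raise RuntimeError("unexpected response from remote side")


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "mirror.bin")
        with open(target, "wb") as f:
            f.write(b"0123456789")
        local, remote = socket.socketpair()
        t = threading.Thread(target=remote_side, args=(remote, target, 2))
        t.start()
        mirror_write_sync(local, 2, b"AB")    # returns only after the ACK
        mirror_write_sync(local, 8, b"YZ!")   # extends the file by one byte
        t.join()
        local.close()
        remote.close()
        with open(target, "rb") as f:
            return f.read()
```

The blocking wait in `mirror_write_sync` is what ties the local write rate to network latency, which is consistent with the synchronous mode's need for a stable connection.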


In asynchronous mode, when FMON captures a write operation on a file F that is in list L, it encapsulates the file name, path, offset and data into a record and inserts the record into queue Q. LMM continuously fetches records from queue Q and sends each record R to the remote mirroring server. The RMM on the remote mirroring server performs the write operation described by record R to keep the file synchronized (see Fig. 2, Fig. 3 and Fig. 4).


Figure 3


Figure 4
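The asynchronous flow above can be sketched as a producer and a consumer sharing a FIFO queue. Here an in-memory `bytearray` stands in for the remote file and a direct function call stands in for the network transfer. The names `lmm_worker` and `apply_write` and the `None` sentinel used to stop the worker are assumptions made for this sketch.

```python
import queue
import threading


def apply_write(mirror: bytearray, offset: int, data: bytes) -> None:
    """Stand-in for RMM: replay one write on an in-memory copy of the file."""
    end = offset + len(data)
    if len(mirror) < end:
        mirror.extend(b"\x00" * (end - len(mirror)))  # grow the file if needed
    mirror[offset:end] = data


def lmm_worker(q: queue.Queue, mirror: bytearray) -> None:
    """Stand-in for LMM: drain queue Q in FIFO order until a None sentinel arrives."""
    while True:
        rec = q.get()
        if rec is None:
            break
        offset, data = rec
        # A real LMM would send the record over the network at this point.
        apply_write(mirror, offset, data)


def demo() -> bytes:
    q = queue.Queue()
    mirror = bytearray(b"hello world")
    worker = threading.Thread(target=lmm_worker, args=(q, mirror))
    worker.start()
    # FMON side: enqueue each captured write and return at once,
    # without waiting for the remote side.
    q.put((0, b"J"))
    q.put((6, b"there!"))
    q.put(None)
    worker.join()
    return bytes(mirror)
```

Because the records are replayed in the order they were captured, overlapping writes end up in the same final state on both sides.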


In differential backup mode, the model synchronizes the whole file in a single pass using the traditional Internet-based differential backup method. This mode is particularly efficient when many records have accumulated in queue Q.

The three modes suit different conditions. Tab. 1 shows each run mode's requirements for network connection, real-time performance and write frequency.

Tab. 1 Requirements of each run mode for network connection, real-time performance and write frequency

Run mode | Network connection | Real-time performance | Write frequency
Synchronous mode | Low jitter, interruption not allowed | High | Limited to a certain range
Asynchronous operating mode | Relatively low jitter, transient interruption allowed | Relatively high | No limit within a certain range
Differential backup mode | Interruption allowed | Low | Almost no limit


Model Functions

File monitor module (FMON)
FMON monitors each operation W on file F. If W is a write operation and F is in list L, FMON takes the action that corresponds to the current mode of the model (Fig. 2 illustrates the flowchart).

Local mirroring module (LMM)
LMM takes record R from the head of queue Q and sends it to the remote mirroring server to keep the file synchronized (Fig. 3 illustrates the flowchart).

Local differential backup module (LDBM)
LDBM is controlled by FMON and synchronizes the whole file using the Internet-based differential backup method.

Local communication module (LCM)
LCM sends data produced by the local modules to RCM, and receives data from RCM and hands it to the local modules.

Remote communication module (RCM)
RCM sends data produced by the remote modules to LCM, and receives data from LCM and hands it to the remote modules.

Remote mirroring module (RMM)
RMM on the remote mirroring server receives the instructions sent by the local mirroring server and performs the write operation on the corresponding file to keep it synchronized (Fig. 4 illustrates the flowchart).

Remote differential backup module (RDBM)
RDBM listens on the remote mirroring server. When an instruction arrives, RDBM performs the differential backup operation.
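The write that RMM performs for each received record can be sketched as a positional write into the mirror copy of the file. The function name `apply_record`, the `mirror_root` directory and the choice to create a missing file are assumptions made for this sketch. It also assumes the record's file path is relative to the mirror root.

```python
import os
import tempfile


def apply_record(mirror_root: str, file_path: str, file_name: str,
                 offset: int, data: bytes) -> None:
    """Replay one write record on the mirror copy of the file."""
    # Assumes file_path is relative, so the target stays under mirror_root.
    target_dir = os.path.join(mirror_root, file_path)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, file_name)
    if not os.path.exists(target):
        open(target, "wb").close()       # create an empty file on first write
    # "r+b" opens for update without truncating the existing contents.
    with open(target, "r+b") as f:
        f.seek(offset)
        f.write(data)


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as root:
        apply_record(root, "data", "f.bin", 0, b"abcdef")
        apply_record(root, "data", "f.bin", 2, b"XY")
        apply_record(root, "data", "f.bin", 2, b"XY")   # same record replayed
        with open(os.path.join(root, "data", "f.bin"), "rb") as f:
            return f.read()
```

Replaying the same record twice leaves the file unchanged, so in this sketch a record that is resent after a network interruption does no harm, provided the records are still applied in their original order.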

Conclusion
Data backup technology based on the cheap and convenient Internet has very promising practical prospects. Experiments have shown that the model presented in this article for updating large files over the Internet adapts to various network conditions and offers excellent, controllable and configurable fault tolerance. Furthermore, the model delivers very good real-time performance when the network is in good condition.


Kui Zhao, an assistant researcher in the computer department at SiChuan University (China), has a background in research, computer networks, information security, Internet, database, business continuity, and disaster-recovery product development. E-mail: zhkui2k@sohu.com.

Tao Li received B.S. and M.S. degrees in computer software from the University of Electronic Science and Technology of China (UESTC) in 1986 and 1991 respectively, and a Ph.D. degree in circuits and systems from UESTC in 1995. From 1993 to 1994 he was a visiting scholar at the University of California at Berkeley. He is currently a professor in the computer department at SiChuan University. E-mail: litao@scu.edu.cn.

Ping Yang is an associate professor at the School of Computer Science and Engineering of SiChuan University. E-mail: yangpin@cs.scu.edu.cn.

XiaoJie Liu received B.S. and M.S. degrees in computer software from UESTC in 1986 and 1991 respectively. She is now an assistant professor at SiChuan University.

LiHui Wang received a B.S. degree in computer science and technology from SiChuan University in 2003. E-mail: gailya@sohu.com
