The Model Structure
The basic idea of the model is as follows. It monitors the write operations of the operating system, encapsulates the file name, offset, and data of each write into a record, and sends the record to the remote mirroring server. The remote mirroring server then performs the same write operation, which keeps the file synchronized. Because only the written data is transferred and the files never have to be scanned or compared, much computation is avoided on both the local and remote machines. Fig. 1 illustrates the model structure.

The model consists of a local part and a remote part. The local part includes the file monitor module (FMON), the local mirroring module (LMM), the local differential backup module (LDBM), the local communication module (LCM), and two data structures, list L and queue Q. The remote part includes the remote communication module (RCM), the remote mirroring module (RMM), and the remote differential backup module (RDBM).
Before explaining the model in detail, we define some terms as follows:
List L registers the files that need to be synchronized. Each element of list L includes two parts:
1) File name: char string, the name of the file to be monitored.
2) File path: char string, the path of the file to be monitored.
Queue Q, a FIFO queue, records the write operations performed on files. Each element in queue Q includes four parts:
1) File name: char string, the name of the file on which the write operation was performed.
2) File path: char string, the path of the file on which the write operation was performed.
3) Offset: integer, the position in the file at which the write begins.
4) Data: binary string, the content to be written into the file.
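The two data structures can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the paper does not define a byte-level format for records, so the wire layout used by the hypothetical `encode` and `decode` helpers is an assumption, as are all class and function names.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoredFile:
    """Element of list L: a file whose writes must be mirrored."""
    name: str
    path: str


@dataclass(frozen=True)
class WriteRecord:
    """Element of queue Q: one captured write operation."""
    name: str
    path: str
    offset: int
    data: bytes


# Hypothetical wire layout (big-endian; not specified in the paper):
# u16 name length | u16 path length | u64 offset | u32 data length | name | path | data
_HEADER = struct.Struct(">HHQI")


def encode(rec: WriteRecord) -> bytes:
    """Serialize a record into the hypothetical layout above."""
    name = rec.name.encode("utf-8")
    path = rec.path.encode("utf-8")
    header = _HEADER.pack(len(name), len(path), rec.offset, len(rec.data))
    return header + name + path + rec.data


def decode(buf: bytes) -> WriteRecord:
    """Split a serialized record back into its four fields."""
    name_len, path_len, offset, data_len = _HEADER.unpack_from(buf, 0)
    pos = _HEADER.size
    name = buf[pos:pos + name_len].decode("utf-8")
    pos += name_len
    path = buf[pos:pos + path_len].decode("utf-8")
    pos += path_len
    data = bytes(buf[pos:pos + data_len])
    if len(data) != data_len:
        raise ValueError("truncated record")
    return WriteRecord(name, path, offset, data)


def is_monitored(list_l: set, name: str, path: str) -> bool:
    """FMON's membership test against list L."""
    return MonitoredFile(name, path) in list_l
```

A fixed header with explicit lengths lets the receiver split a record into its four fields without scanning for delimiters, which matters because the data field is arbitrary binary.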
To improve efficiency, queue Q is organized into two levels, level one (Q1) and level two (Q2). Because of the large gap in I/O speed between disk and RAM, Q1 resides in RAM for speed and Q2 resides on disk for capacity. The tail of Q1 joins the head of Q2, so Q1 and Q2 are logically combined into a single queue Q.
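One way to realize the two-level queue is sketched below, under several assumptions not made in the paper: Python's `sqlite3` module stands in for the on-disk level Q2, records are serialized with `pickle`, and capacities are counted in records rather than bytes. New records enter Q1 only while Q2 is empty; otherwise they go to the tail of Q2, so FIFO order is preserved across the two levels.

```python
import pickle
import sqlite3
from collections import deque


class TwoLevelQueue:
    """Queue Q: level Q1 in RAM followed by level Q2 on disk.

    The head of Q is the head of Q1, and the tail of Q1 joins the head of Q2.
    Capacities are counted in records (the paper sizes them in bytes).
    """

    def __init__(self, q1_capacity: int, q2_capacity: int, db_path: str = ":memory:"):
        if q1_capacity < 1:
            raise ValueError("Q1 must hold at least one record")
        self.q1 = deque()
        self.q1_capacity = q1_capacity
        self.q2_capacity = q2_capacity
        # A real deployment would pass a file path here so that Q2 lives on disk.
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS q2 "
                        "(seq INTEGER PRIMARY KEY AUTOINCREMENT, rec BLOB)")
        self.q2_len = self.db.execute("SELECT COUNT(*) FROM q2").fetchone()[0]

    def __len__(self) -> int:
        return len(self.q1) + self.q2_len

    def put(self, record) -> bool:
        """Append a record at the tail of Q; return False if Q is full."""
        # Once Q2 holds anything, new records must go behind it to keep FIFO order.
        if self.q2_len == 0 and len(self.q1) < self.q1_capacity:
            self.q1.append(record)
            return True
        if self.q2_len < self.q2_capacity:
            self.db.execute("INSERT INTO q2 (rec) VALUES (?)", (pickle.dumps(record),))
            self.db.commit()
            self.q2_len += 1
            return True
        return False

    def get(self):
        """Remove and return the record at the head of Q, or None if Q is empty."""
        if not self.q1:
            self._refill()
        return self.q1.popleft() if self.q1 else None

    def _refill(self) -> None:
        # Move a batch from the head of Q2 into the empty Q1, oldest first.
        rows = self.db.execute("SELECT seq, rec FROM q2 ORDER BY seq LIMIT ?",
                               (self.q1_capacity,)).fetchall()
        for _, blob in rows:
            self.q1.append(pickle.loads(blob))  # only our own records are unpickled
        if rows:
            self.db.execute("DELETE FROM q2 WHERE seq <= ?", (rows[-1][0],))
            self.db.commit()
            self.q2_len -= len(rows)
```

Refilling Q1 in batches from the head of Q2 keeps disk I/O off the common path, in which the network keeps up and Q2 stays empty.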
The size of queue Q is configurable. The larger Q is, the more file changes it can record, and the better the system can tolerate adverse conditions such as limited Internet bandwidth, fluctuating network speed, and network outages. In practice, the size of Q can be chosen according to the amount of RAM, the disk capacity, the files that need to be synchronized, the network speed, the maximum tolerable outage time, and so on. We generally recommend 64 MB for Q1 and 512 MB for Q2.
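The following rule of thumb, an assumption added here rather than a formula from the paper, relates the size of Q to the outage it can absorb. If records arrive at an average rate w and leave at an average rate b (b = 0 during an outage), an initially empty queue of capacity S overflows after about S / (w - b) seconds. The function name `time_to_fill` is hypothetical.

```python
MiB = 1024 * 1024


def time_to_fill(q_bytes: float, write_rate: float, send_rate: float) -> float:
    """Seconds until an initially empty Q overflows; rates in bytes per second.

    The queue grows at (write_rate - send_rate); during a network outage
    send_rate is 0. Per-record header overhead is ignored.
    """
    growth = write_rate - send_rate
    if growth <= 0:
        return float("inf")  # the link keeps up, so Q never fills
    return q_bytes / growth
```

For example, treating M as MiB, the recommended 64 MB for Q1 plus 512 MB for Q2 at a sustained write rate of 1 MB/s would absorb roughly 576 s (about 9.6 minutes) of complete network outage.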
The model can run in both synchronous and asynchronous modes, and the latter is further divided into the asynchronous operating mode and the differential backup mode.
If the model runs in synchronous mode, when FMON captures a write operation on a file F that is in list L, it encapsulates the file name, path, offset, and data into a record and sends the record immediately to the remote mirroring server. The RMM on the remote mirroring server synchronizes file F by performing the same write operation according to the received record, and then sends a response message back to the local mirroring server. The local mirroring server can proceed to its next action only after receiving this response (see Fig. 2).

Figure 2. Flowchart of the file monitor module (FMON)
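The request-response exchange of synchronous mode amounts to a stop-and-wait protocol. The sketch below assumes a stream socket between LCM and RCM, a 4-byte length prefix for framing, and a one-byte acknowledgment; none of these details are specified in the paper, the helper names are hypothetical, and the record is assumed to be already serialized to bytes.

```python
import socket
import struct

ACK = b"\x01"  # hypothetical one-byte "write applied" response


def send_frame(sock: socket.socket, payload: bytes) -> None:
    # Length-prefixed framing: 4-byte big-endian length, then the payload.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until n arrive.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def send_synchronously(sock: socket.socket, record: bytes, timeout: float = 5.0) -> None:
    """Send one encoded record and block until the remote side acknowledges it."""
    sock.settimeout(timeout)
    send_frame(sock, record)
    if recv_exact(sock, 1) != ACK:
        raise RuntimeError("remote mirroring server rejected the write")
```

Because only one write is outstanding at a time, each write costs at least one network round trip, which is consistent with Tab. 1 limiting the write frequency in synchronous mode.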
If the model runs in asynchronous mode, when FMON captures a write operation on a file F that is in list L, it encapsulates the file name, path, offset, and data into a record and inserts the record into queue Q. LMM continuously fetches records from queue Q and sends each record R to the remote mirroring server, where RMM performs the corresponding write operation to synchronize the file (see Fig. 2, Fig. 3, and Fig. 4).

Figure 3. Flowchart of the local mirroring module (LMM)

Figure 4. Flowchart of the remote mirroring module (RMM)
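The producer-consumer relationship between FMON and LMM in asynchronous mode can be sketched with Python's standard `queue` and `threading` modules. Assumptions: a bounded `queue.Queue` stands in for the two-level queue Q, list L is a set of (name, path) pairs, and `send` is a hypothetical callable standing in for LCM that returns True once the remote side has accepted the record.

```python
import queue
import threading
import time


def fmon_on_write(q: queue.Queue, list_l: set, name: str, path: str,
                  offset: int, data: bytes) -> bool:
    """FMON in asynchronous mode: queue the write if the file is in list L.

    Raises queue.Full if Q has reached its maximum size.
    """
    if (name, path) not in list_l:
        return False
    q.put_nowait((name, path, offset, data))
    return True


def lmm_worker(q: queue.Queue, send, stop: threading.Event,
               retry_delay: float = 0.5) -> None:
    """LMM: repeatedly take the record at the head of Q and send it.

    Runs until `stop` is set and Q is empty. A record whose send fails is
    retried (indefinitely) before any later record is sent, so order is preserved.
    """
    while not (stop.is_set() and q.empty()):
        try:
            record = q.get(timeout=0.1)
        except queue.Empty:
            continue
        while not send(record):
            time.sleep(retry_delay)  # network unavailable: wait, then retry
        q.task_done()
```

Retrying the same record instead of skipping it keeps the remote copy consistent, since later writes may overlap the region touched by an earlier one.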
In differential backup mode, the model synchronizes the whole file in a single pass using a traditional Internet-based differential backup method. This mode is particularly efficient when many records have accumulated in queue Q, because one differential pass can replace the replay of all of those queued writes.
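The paper does not specify which traditional differential backup method is used. As an illustration only, the sketch below swaps in a simplified fixed-block checksum comparison: RDBM hashes each block of its old copy, LDBM sends back only the blocks whose hashes differ, and RDBM patches them in. This is simpler than rsync's rolling-checksum algorithm, which also detects data that has shifted position; the block size and function names are hypothetical.

```python
import hashlib

BLOCK = 4096  # hypothetical block size in bytes


def block_digests(data: bytes, block: int = BLOCK) -> list:
    """Remote side (RDBM): one SHA-256 digest per fixed-size block of the old file."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def make_delta(new: bytes, old_digests: list, block: int = BLOCK):
    """Local side (LDBM): keep only the blocks whose digest differs from the remote copy."""
    changed = []
    for idx, start in enumerate(range(0, len(new), block)):
        chunk = new[start:start + block]
        if idx >= len(old_digests) or hashlib.sha256(chunk).digest() != old_digests[idx]:
            changed.append((start, chunk))
    return len(new), changed


def apply_delta(old: bytes, delta, block: int = BLOCK) -> bytes:
    """Remote side (RDBM): resize to the new length, then patch the changed blocks."""
    new_len, changed = delta
    buf = bytearray(old[:new_len])
    buf.extend(b"\x00" * (new_len - len(buf)))  # placeholder bytes, overwritten below
    for start, chunk in changed:
        buf[start:start + len(chunk)] = chunk
    return bytes(buf)
```

Only the digests and the changed blocks cross the network, so the cost depends on how much of the file differs, not on how many individual writes produced the difference.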
The three modes suit different conditions. Tab. 1 summarizes the requirements of each run mode with respect to network connection, real-time performance, and write frequency.
Tab. 1 Requirements of each run mode on network connection, real-time performance, and write frequency
Run mode | Network connection | Real-time performance | Write frequency
Synchronous mode | Low jitter; interruptions not allowed | High | Limited to a certain range
Asynchronous operating mode | Relatively low jitter; transient interruptions allowed | Relatively high | Unlimited within a certain range
Differential backup mode | Interruptions allowed | Low | Almost unlimited
Model Functions
File monitor module (FMON)
FMON monitors each operation W on a file F. If W is a write operation and F is in list L, FMON performs the action corresponding to the model's current mode (Fig. 2 illustrates the flowchart).
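FMON's per-operation decision can be sketched as a small dispatcher. The three callables are hypothetical stand-ins for the synchronous path through LCM, insertion into queue Q, and a request to LDBM. The fallback from asynchronous to differential backup mode when Q is full is an assumption suggested by the remark that differential backup is efficient when many records have accumulated; the paper does not state this policy.

```python
from enum import Enum


class Mode(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"
    DIFFERENTIAL = "differential"


class FileMonitor:
    """Hypothetical FMON dispatcher; the callables stand in for LCM, Q, and LDBM."""

    def __init__(self, list_l, mode, send_and_wait, enqueue, schedule_diff):
        self.list_l = list_l                # set of (name, path) pairs (list L)
        self.mode = mode
        self.send_and_wait = send_and_wait  # synchronous send; blocks for the response
        self.enqueue = enqueue              # inserts into Q; returns False when Q is full
        self.schedule_diff = schedule_diff  # asks LDBM to resynchronize a whole file

    def on_operation(self, kind, name, path, offset=0, data=b""):
        # Only writes to files registered in list L are of interest.
        if kind != "write" or (name, path) not in self.list_l:
            return "ignored"
        record = (name, path, offset, data)
        if self.mode is Mode.SYNCHRONOUS:
            self.send_and_wait(record)
            return "sent"
        if self.mode is Mode.ASYNCHRONOUS:
            if self.enqueue(record):
                return "queued"
            # Assumption: Q overflowed, so switch to differential backup mode.
            self.mode = Mode.DIFFERENTIAL
        self.schedule_diff(name, path)
        return "diff"
```

A fuller version would also discard the records still queued for a file once LDBM has resynchronized it, because replaying those older writes afterward could overwrite newer data on the mirror.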
Local mirroring module (LMM)
LMM takes record R from the head of queue Q and sends it to the remote mirroring server to synchronize the file (Fig. 3 illustrates the flowchart).
Local differential backup module (LDBM)
LDBM, under the control of FMON, synchronizes a whole file using the Internet-based differential backup method.
Local communication module (LCM)
LCM sends data produced by the local modules to RCM, and receives data from RCM and hands it over to the local modules.
Remote communication module (RCM)
RCM sends data produced by the remote modules to LCM, and receives data from LCM and hands it over to the remote modules.
Remote mirroring module (RMM)
RMM on the remote mirroring server receives the write instructions (records) sent by the local mirroring server and performs each write operation on the corresponding file to keep it synchronized (Fig. 4 illustrates the flowchart).
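The core of RMM, replaying one write on the mirrored file, can be sketched as below. Assumptions: the mirror stores each file under a local root directory using the same relative path, the path field names the directory that contains the file, and `apply_write` is a hypothetical name. The containment check rejects records whose path would escape the mirror root.

```python
import os


def apply_write(root: str, name: str, path: str, offset: int, data: bytes) -> None:
    """RMM: replay one captured write on the mirrored copy of the file."""
    root_abs = os.path.abspath(root)
    target = os.path.abspath(os.path.join(root_abs, path.lstrip("/\\"), name))
    if os.path.commonpath([root_abs, target]) != root_abs:
        raise ValueError("record points outside the mirror root")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Open for update without truncating; create the file first if it is missing.
    mode = "r+b" if os.path.exists(target) else "w+b"
    with open(target, mode) as f:
        f.seek(offset)  # seeking past EOF is allowed; the gap is filled with zero bytes
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the write to disk before it is acknowledged
```

Calling `os.fsync` before responding matters in synchronous mode: the response to the local mirroring server should mean the write is durable on the mirror, not merely buffered.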
Remote differential backup module (RDBM)
RDBM listens on the remote mirroring server. When an instruction arrives from LDBM, RDBM performs the differential backup operation.
Conclusion
Because the Internet is cheap and convenient, Internet-based data backup technology has promising practical prospects. Experiments show that the model presented in this paper for updating large files over the Internet adapts to various network conditions and provides excellent fault tolerance that is both controllable and configurable. Furthermore, the model achieves strong real-time performance when the network is in good condition.
Kui Zhao, an assistant researcher in the Computer Department at SiChuan University (China), has a background in research and development covering computer networks, information security, the Internet, databases, business continuity, and disaster-recovery products. E-mail: zhkui2k@sohu.com.
Tao Li received B.S. and M.S. degrees in computer software from the University of Electronic Science and Technology of China (UESTC) in 1986 and 1991, respectively, and a Ph.D. degree in circuits and systems from UESTC in 1995. From 1993 to 1994 he was a visiting scholar at the University of California at Berkeley. He is currently a professor in the Computer Department at SiChuan University. E-mail: litao@scu.edu.cn.
Ping Yang is an associate professor at the School of Computer Science and Engineering of SiChuan University. E-mail: yangpin@cs.scu.edu.cn.
XiaoJie Liu received B.S. and M.S. degrees in computer software from UESTC in 1986 and 1991 respectively. She is now an assistant professor at SiChuan University.
LiHui Wang received a B.S. degree in computer science and technology from SiChuan University in 2003. E-mail: gailya@sohu.com