DISASTER RECOVERY JOURNAL
DATA RECOVERY
A Model to Mirror Large Files on Internet
By KUI ZHAO, TAO LI, PING YANG, XIAO-JIE LIU & LIHUI WANG
As information technology develops, data has become critically important
to enterprises. Without a dedicated disaster recovery plan, however, that
data is very likely to be lost or damaged. A disaster recovery plan sets
up a system that is identical or nearly identical to the original system
and protects the data through geographical distribution and redundancy
of the data system. Data backup technology is the basis of the whole
disaster recovery system and the key to its success.
The geographical reach of common backup technologies such as tapes,
RAID, SAN and NAS is strictly limited. SAN covers the greatest distance,
ranging from 10km to 100km, but it is too expensive for average users.
The others are suitable only for local storage. Internet bandwidth and
speed have improved sharply, and backup technology based on cheap and
convenient Internet resources has emerged and broken the distance limit.
Traditional Internet-based backup technologies usually work in
differential backup mode: because of bandwidth restrictions, only the
parts of the data that differ are transmitted over the network. Finding
the differences between local and remote files requires a great deal of
computation, which places a heavy resource burden on both the local and
remote machines, and it consumes considerable time as well as bandwidth.
Suppose we want to synchronize a large file of, say, a few hundred MB.
If the local and remote copies differ in only a few places, processing
the whole file to find those few differences is clearly inefficient.
This article presents a model for mirroring large files over the
Internet. Like traditional methods, the model transfers only the
differences between files to save bandwidth, but it does not need to
compute those differences, which improves response speed and efficiency.
The Model Structure
The basic idea of the model is to monitor the write operations of the
operating system, encapsulate the filename, offset and data of each write
into a record, and send the record to the remote mirroring server. The
same write operation can then be performed on the remote mirroring server
to keep the file synchronized, so much computation is avoided on both the
local and remote machines. Fig. 1 illustrates the model structure.

Figure 1
The model consists of a local part and a remote part. The local part
includes the file monitor module (FMON), the local mirroring module (LMM),
the local differential backup module (LDBM), the local communication
module (LCM) and two data structures, list L and queue Q. The remote part
includes the remote communication module (RCM), the remote mirroring
module (RMM) and the remote differential backup module (RDBM).
Before explaining the model in detail, we define some terms as follows:
List L registers the files that need to be synchronized. Each element of
list L has two parts:
1) File name: char string, the name of the file to be monitored.
2) File path: char string, the path of the file to be monitored.
Queue Q, a FIFO queue, records the write operations performed on files.
Each element of queue Q has four parts:
1) File name: char string, the name of the file the write operation
was performed on.
2) File path: char string, the path of the file the write operation
was performed on.
3) Offset: integer, the position of the write operation.
4) Data: binary string, the content to be written into the file.
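The element of queue Q and its replay on the remote side can be sketched
in Python as follows. This is only an illustration under assumptions: the
names WriteRecord and apply_record, and the idea of replaying under a
mirror root directory, are hypothetical and do not come from the model.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRecord:
    """One element of queue Q: a single captured write operation."""
    file_name: str   # name of the file the write was performed on
    file_path: str   # path (directory) of that file
    offset: int      # byte position at which the write starts
    data: bytes      # content written at that position


def apply_record(record: WriteRecord, root: str) -> None:
    """Replay one write under `root` (a hypothetical mirror directory)."""
    target_dir = os.path.join(root, record.file_path.lstrip("/\\"))
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, record.file_name)
    # "r+b" updates in place without truncating; "w+b" creates a new file.
    mode = "r+b" if os.path.exists(target) else "w+b"
    with open(target, mode) as f:
        f.seek(record.offset)
        f.write(record.data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as mirror:
        apply_record(WriteRecord("db.dat", "data", 0, b"hello world"), mirror)
        apply_record(WriteRecord("db.dat", "data", 6, b"WORLD"), mirror)
        with open(os.path.join(mirror, "data", "db.dat"), "rb") as f:
            print(f.read())  # b'hello WORLD'
```

Only the five bytes of the second write travel to the mirror; neither
side has to read or compare the rest of the file.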
To improve efficiency, queue Q is designed with two levels: level one
(Q1) and level two (Q2). Because of the tremendous disparity in I/O speed
between disk and RAM, Q1 resides in RAM and Q2 resides on disk. The tail
of Q1 joins the head of Q2, so Q1 and Q2 are logically combined into a
single queue Q.
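A minimal sketch of such a two-level queue is shown below. It rests on
several assumptions that are not part of the model: the capacity of Q1 is
counted in records rather than bytes, Q2 is stored as one pickle file per
record in a spill directory, and the class name TwoLevelQueue is
hypothetical.

```python
import os
import pickle
import tempfile
from collections import deque


class TwoLevelQueue:
    """FIFO queue whose head (Q1) is in RAM and whose overflow (Q2) is on disk."""

    def __init__(self, q1_capacity: int, spill_dir: str):
        self.q1 = deque()               # level one: records held in RAM
        self.q1_capacity = q1_capacity  # max records in RAM (assumed unit)
        self.spill_dir = spill_dir      # level two: one file per record
        self.q2_head = 0                # sequence number of oldest record on disk
        self.q2_tail = 0                # sequence number for the next spilled record

    def _path(self, seq: int) -> str:
        return os.path.join(self.spill_dir, f"{seq:012d}.rec")

    def put(self, record) -> None:
        # Use RAM only if nothing older is waiting on disk, so order is kept.
        if self.q2_head == self.q2_tail and len(self.q1) < self.q1_capacity:
            self.q1.append(record)
        else:
            with open(self._path(self.q2_tail), "wb") as f:
                pickle.dump(record, f)
            self.q2_tail += 1

    def get(self):
        record = self.q1.popleft()  # raises IndexError when the queue is empty
        # Move the head of Q2 to the tail of Q1 so Q1 stays as full as possible.
        if self.q2_head < self.q2_tail:
            path = self._path(self.q2_head)
            with open(path, "rb") as f:
                self.q1.append(pickle.load(f))
            os.remove(path)
            self.q2_head += 1
        return record

    def __len__(self) -> int:
        return len(self.q1) + (self.q2_tail - self.q2_head)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spill:
        q = TwoLevelQueue(q1_capacity=2, spill_dir=spill)
        for i in range(5):
            q.put(i)                          # 0 and 1 stay in RAM, 2..4 go to disk
        print([q.get() for _ in range(5)])    # [0, 1, 2, 3, 4]
```

Records leave in the order they arrived regardless of which level held
them, which is what lets the two levels behave as one logical queue.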
The size of queue Q is freely configurable. The larger Q is, the more
file changes it can record, and the better the system can cope with
adverse factors such as limited Internet bandwidth, variation in network
speed and network outages. In practice, the size of Q can be determined
from the size of RAM, the capacity of the disk, the files that need to be
synchronized, the speed of the network, the maximum outage time to be
tolerated, and so on. It is generally recommended that Q1 be 64 MB and
Q2 be 512 MB.
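As a rough worked example of this sizing, the following sketch estimates
how long a network outage the recommended queue sizes could absorb. The
write rate of 100 KB/s is purely an assumed figure, 1 MB is taken as 2^20
bytes, per-record overhead is ignored, and the function name is
hypothetical.

```python
MB = 1024 * 1024


def tolerated_outage_seconds(q1_bytes: int, q2_bytes: int,
                             write_rate_bytes_per_s: float) -> float:
    """Seconds of writes that Q can buffer while nothing is being sent."""
    return (q1_bytes + q2_bytes) / write_rate_bytes_per_s


# Recommended sizes from the text, with an assumed write rate of 100 KB/s.
seconds = tolerated_outage_seconds(64 * MB, 512 * MB, 100 * 1024)
print(f"{seconds / 60:.1f} minutes")  # 98.3 minutes
```

Doubling the sustained write rate halves the outage the queue can ride
out, which is why write frequency appears alongside network quality when
choosing a run mode.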
The model can run in either synchronous or asynchronous mode. The latter
is further divided into asynchronous operating mode and differential
backup mode.
If the model runs in synchronous mode, when FMON captures a write
operation on file F and F is in list L, it encapsulates the filename,
path, offset and data into a record, which is sent immediately to the
remote mirroring server. The RMM on the remote mirroring server
synchronizes the file by performing the same write operation on file F
according to the record received from the local mirroring server. The
remote mirroring server then sends the local mirroring server a response
message. The local mirroring server can take its next action only after
the response message has been received (see Fig. 2).

Figure 2
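The synchronous exchange described above might look like the following
in-process sketch. It is not the authors' implementation: the remote
server is replaced by an object that keeps files in memory, the network
is replaced by a direct function call, and the names RemoteMirror,
on_write_sync and the "ACK" response are all assumptions.

```python
from typing import Callable, Dict, Set, Tuple

Record = Tuple[str, str, int, bytes]  # (file name, file path, offset, data)


class RemoteMirror:
    """Stand-in for RMM: keeps mirrored files in memory for illustration."""

    def __init__(self) -> None:
        self.files: Dict[Tuple[str, str], bytearray] = {}

    def handle(self, record: Record) -> str:
        name, path, offset, data = record
        buf = self.files.setdefault((path, name), bytearray())
        if len(buf) < offset:                 # pad a gap before the write
            buf.extend(b"\x00" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        return "ACK"                          # response message to the local side


def on_write_sync(record: Record, list_l: Set[Tuple[str, str]],
                  send: Callable[[Record], str]) -> bool:
    """FMON's action in synchronous mode: send, then wait for the response."""
    name, path, _, _ = record
    if (path, name) not in list_l:
        return False                          # file is not in list L; ignore
    response = send(record)                   # does not return until the remote replies
    if response != "ACK":
        raise RuntimeError(f"remote mirror rejected write: {response!r}")
    return True


if __name__ == "__main__":
    remote = RemoteMirror()
    monitored = {("data", "db.dat")}
    on_write_sync(("db.dat", "data", 0, b"hello world"), monitored, remote.handle)
    on_write_sync(("db.dat", "data", 6, b"WORLD"), monitored, remote.handle)
    print(bytes(remote.files[("data", "db.dat")]))  # b'hello WORLD'
```

Because the caller waits for each response, the local write rate is bound
by the network round trip, which is why this mode needs a stable
connection.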
If the model runs in asynchronous mode, when FMON captures a write
operation on file F and F is in list L, it encapsulates the filename,
path, offset and data into a record, which is inserted into queue Q.
LMM continuously fetches a record R from queue Q and sends it to the
remote mirroring server. The RMM on the remote mirroring server performs
the write operation according to the record R received from the local
mirroring server to keep the file synchronized (see Fig. 2, Fig. 3 and
Fig. 4).

Figure 3

Figure 4
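The asynchronous flow can be sketched with a producer and a consumer
thread. This is an illustration under assumptions: Python's in-memory
queue.Queue stands in for the two-level queue Q, a plain list stands in
for the remote mirroring server, and the names lmm_loop, on_write_async
and the stop sentinel are hypothetical.

```python
import queue
import threading
from typing import Callable, List, Tuple

Record = Tuple[str, str, int, bytes]  # (file name, file path, offset, data)
_STOP = object()                      # sentinel that tells the LMM thread to exit


def lmm_loop(q: "queue.Queue", send: Callable[[Record], None]) -> None:
    """LMM: keep fetching the record at the head of Q and send it on."""
    while True:
        record = q.get()              # blocks while Q is empty
        if record is _STOP:
            break
        send(record)


def on_write_async(record: Record, q: "queue.Queue") -> None:
    """FMON's action in asynchronous mode: enqueue and return immediately."""
    q.put(record)


if __name__ == "__main__":
    received: List[Record] = []       # stand-in for the remote mirroring server
    q: "queue.Queue" = queue.Queue()
    worker = threading.Thread(target=lmm_loop, args=(q, received.append))
    worker.start()
    for i in range(3):
        on_write_async(("db.dat", "data", i * 4, b"abcd"), q)
    q.put(_STOP)
    worker.join()
    print([r[2] for r in received])   # [0, 4, 8]
```

The writer never waits for the network here; a slow or briefly
interrupted link only makes the queue grow until LMM catches up.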
When the model runs in differential backup mode, it synchronizes the
whole file in a single pass using the traditional Internet-based
differential backup method. This mode is especially efficient when a
large number of records have accumulated in queue Q.
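The article does not specify which differential method is used. As one
simple possibility, the sketch below compares fixed-size block digests
and transfers only the blocks that differ. The block size, the choice of
SHA-256 and all function names are assumptions; production tools often
use more elaborate schemes.

```python
import hashlib
from typing import Dict, List

BLOCK = 4  # tiny block size for the demo; real systems use far larger blocks


def block_digests(data: bytes, block: int = BLOCK) -> List[bytes]:
    """Digest of each fixed-size block; the remote side would send these."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def changed_blocks(local: bytes, remote_digests: List[bytes],
                   block: int = BLOCK) -> Dict[int, bytes]:
    """Map offset -> local block for every block that differs from the remote."""
    delta = {}
    for index, digest in enumerate(block_digests(local, block)):
        if index >= len(remote_digests) or digest != remote_digests[index]:
            delta[index * block] = local[index * block:(index + 1) * block]
    return delta


def patch(remote: bytes, delta: Dict[int, bytes], new_length: int) -> bytes:
    """Apply the delta on the remote side and trim to the local file length."""
    buf = bytearray(remote.ljust(new_length, b"\x00"))
    for offset, chunk in delta.items():
        buf[offset:offset + len(chunk)] = chunk
    return bytes(buf[:new_length])


if __name__ == "__main__":
    local = b"aaaaBBBBccccdd"
    remote = b"aaaabbbbcccc"
    delta = changed_blocks(local, block_digests(remote))
    print(sorted(delta))                              # [4, 12]
    print(patch(remote, delta, len(local)) == local)  # True
```

Both sides must read and hash the entire file here, which is the cost the
record-based modes avoid. That cost pays off only when replaying a long
backlog of queued records would be more expensive.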
The three modes of the model are suitable for different conditions.
Tab. 1 shows what each run mode requires in terms of network connection,
real-time behavior and write frequency.
Tab. 1 Requirements of each run mode for network connection, real-time
behavior and write frequency
Run mode | Network connection | Real time | Write frequency
Synchronous mode | Low jitter, interrupt not allowed | High | Limited in a certain range
Asynchronous operating mode | Relatively low jitter, transient interrupt allowed | Relatively high | No limit in a certain range
Differential backup mode | Interrupt allowed | Low | Almost no limit
Model Functions
File monitor module (FMON)
FMON monitors each operation W on file F. If W is a write operation and
file F is in list L, FMON performs the corresponding action according to
the mode the model is running in (Fig. 2 shows the flowchart).
Local mirroring module (LMM)
LMM takes record R from the head of queue Q and sends it to the remote
mirroring server to keep the file synchronized (Fig. 3 shows the
flowchart).
Local differential backup module (LDBM)
LDBM is controlled by FMON and synchronizes the whole file using the
Internet-based differential backup method.
Local communication module (LCM)
LCM sends data produced by the local modules to RCM and receives data
from RCM, which it then passes on to the local modules.
Remote communication module (RCM)
RCM sends data produced by the remote modules to LCM and receives data
from LCM, which it then passes on to the remote modules.
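The article does not describe the wire format used between LCM and RCM.
Purely as an assumption, a record could be serialized with a fixed binary
header followed by the variable-length fields, as in this sketch. The
header layout and the names encode_record and decode_record are
hypothetical.

```python
import struct
from typing import Tuple

Record = Tuple[str, str, int, bytes]  # (file name, file path, offset, data)

# Hypothetical header, big-endian: name length (2 bytes), path length (2 bytes),
# offset (8 bytes), data length (4 bytes). Lengths beyond these ranges
# make pack() raise struct.error.
_HEADER = struct.Struct("!HHQI")


def encode_record(record: Record) -> bytes:
    """LCM side: turn a record into one self-describing message."""
    name, path, offset, data = record
    name_b, path_b = name.encode("utf-8"), path.encode("utf-8")
    header = _HEADER.pack(len(name_b), len(path_b), offset, len(data))
    return header + name_b + path_b + data


def decode_record(message: bytes) -> Record:
    """RCM side: recover the record before handing it to RMM."""
    name_len, path_len, offset, data_len = _HEADER.unpack_from(message, 0)
    pos = _HEADER.size
    name = message[pos:pos + name_len].decode("utf-8")
    pos += name_len
    path = message[pos:pos + path_len].decode("utf-8")
    pos += path_len
    data = message[pos:pos + data_len]
    if len(data) != data_len:
        raise ValueError("truncated record")
    return name, path, offset, data


if __name__ == "__main__":
    record = ("db.dat", "data", 6, b"WORLD")
    wire = encode_record(record)
    print(len(wire))                      # 31
    print(decode_record(wire) == record)  # True
```

The message carries 16 bytes of header plus the names and the written
data, so the cost of mirroring a small write stays proportional to the
write itself rather than to the file size.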
Remote mirroring module (RMM)
RMM on the remote mirroring server receives the instructions sent by the
local mirroring server and performs the write operation on the
corresponding file to keep it synchronized (Fig. 4 shows the flowchart).
Remote differential backup module (RDBM)
RDBM listens on the remote mirroring server. When an instruction arrives,
RDBM performs the differential backup operation.
Conclusion
Data backup technology based on the cheap and convenient Internet has a
promising future in practice. Experiments have shown that the
Internet-based model for updating large files presented in this paper
can adapt to various network conditions and has excellent, controllable
and configurable fault tolerance. Furthermore, the model delivers very
good real-time performance when the network is in good condition.
Kui Zhao, an assistant researcher in the computer department at SiChuan
University (China), has a background in research, computer networks,
information security, Internet, database, business continuity, and
disaster-recovery product development. E-mail: zhkui2k@sohu.com.
Tao Li received B.S. and M.S. degrees in computer software from the
University of Electronic Science and Technology of China (UESTC) in 1986
and 1991 respectively, and a Ph.D. degree in circuits and systems from
UESTC in 1995. From 1993 to 1994 he was a visiting scholar at the
University of California at Berkeley. He is currently a professor in the
computer department at SiChuan University. E-mail: litao@scu.edu.cn.
Ping Yang is an associate professor at the School of Computer Science
and Engineering of SiChuan University. E-mail: yangpin@cs.scu.edu.cn.
XiaoJie Liu received B.S. and M.S. degrees in computer software from
UESTC in 1986 and 1991 respectively. She is now an assistant professor
at SiChuan University.
LiHui Wang received a B.S. degree in computer science and technology
from SiChuan University in 2003. E-mail: gailya@sohu.com
©Copyright 2004 Systems Support Inc. All rights reserved. Reproduction
in whole or in part in any form or medium without the express written
permission of Systems Support Inc. is prohibited.