DISASTER RECOVERY JOURNAL
INFRASTRUCTURE RECOVERY
The Weakest Link In Disaster Recovery
By ALEX BAKMAN
Immediate access to current, detailed configuration settings contributes
to faster IT disaster recovery and the continuity of business. Neglecting
this part of the IT disaster recovery plan can add hours, or even days,
to the recovery process.
Yet this critical link in the disaster recovery chain is usually overlooked
or poorly implemented. This article discusses why this condition exists and
the consequences arising from it, and shows where detailed configuration
documentation fits in the disaster recovery process and how it aids the
rapid restoration of an IT infrastructure.
I will also provide an overview of how this link can be strengthened
through the use of automated solutions to collect and document current
configuration settings. Finally, I will show how having such information
can solve some of the day-to-day challenges of managing an enterprise
IT infrastructure with regard to compliance with federal regulations
such as the Health Insurance Portability and Accountability Act (HIPAA)
and the Gramm-Leach-Bliley Act (GLBA) for those companies that keep consumer
financial records.
Business continuity plans are designed to ensure organizational survival.
Among other things, they provide a roadmap to restore an IT infrastructure
(the business backbone of today's corporation) as quickly
as possible after a disaster.
Many IT disaster recovery plans include some level of configuration
information that is collected as a snapshot at a given point
in time. Typically, this is a hardware and software asset catalog: vendor
name, model number, serial number, location, etc. for hardware; and
vendor name, version number, service pack information, etc. for software.
Most enterprises feel they have all the bases covered with these products
and services. However, the speed of business restoration efforts is
impeded by inadequate or absent documentation of the IT infrastructure
that must be recreated. Even when available, access to a safe data center
and backup tapes does not help IT staff (assuming they, too, are available)
to quickly rebuild a network in an emergency. Detailed knowledge of
server, database, and router configurations is essential to re-establish
a working IT framework in which to restore corporate data.
For most organizations, information and the technology that supports
it are the organization's most valuable assets. More than 75 percent
of the Global 2000 corporations have installed enterprise resource planning
(ERP) systems; supply chain management systems; collaborative front
office applications; and a host of other Internet and intranet applications.
These applications are deployed over multiple systems and databases
and across multiple locations. Many, but not all,
mission-critical applications have their data backed up to tape that
is usually stored at a safe site off the corporate premises.

Time Is Money
Restoring the IT infrastructure is the most crucial phase in keeping
the business running in the event of a disaster.
The high cost of downtime goes beyond lost sales. Failure to perform
can lead to contractual penalties. Customers who abandon you may never
come back, and even if they do, the cost of sales increases due
to a new competitive mix. If records such as invoices are lost, you
may lose thousands or millions of dollars.
While you are waiting to restore your IT infrastructure, you still have
to pay salaries, or suffer a public relations disaster. In the case
of the Sept. 11 tragedy, your company's reputation may not suffer,
though your stock price, credit rating, and cash flow can be impacted.
Spurred on by the events of Sept. 11, enterprise IT departments will
be focusing more time and money on disaster recovery plans, equipment,
and services.
Disaster Recovery Plans Are Often Static
The IT disaster recovery plan has, until recently, been viewed as a
static document that sits in a three-ring binder on every mid-level IT
manager's shelf and does little more than provide comfort that
the IT department is ready to do its part to ensure business continuity.
Creating and updating the plan is usually an annual exercise, an initiative
that pulls in resources from across the staff and disrupts the normal
IT workload. Collecting configuration data from diverse platforms and
massaging it into meaningful information takes a tremendous
number of hours, and most IT departments do not devote resources to
keep the information current.
Why don't they? There are three main reasons:
First, almost no company has enough IT staff. According to the Information
Technology Association of America (ITAA), of the current US IT workforce
requirement of 10 million, there are more than 800,000 vacancies that
cannot be filled due to the lack of trained talent. The workload increases
but hiring never keeps up.
Second, the technical competence of individual IT talent varies with
training and experience. Configuration documentation may seem an
entry-level task that most professionals seek to move beyond quickly.
Disparate IT staff members often collect different types of information
and the quality of their reports varies greatly. The more senior IT
people are assigned to more critical tasks, deployed by management where
they can provide the most value for their salaries, which average $85,000
per year ($75 per hour). The hours needed to assemble, verify and report
configuration settings can amount to tens of thousands of dollars in
a larger IT shop.
Third, IT staff turnover ranges from 8 percent to 17 percent, depending
on industry and geographic marketplace. The cost of hiring and training
new staff to replace lost employees is nearly triple the IT overhead
cost (about $225 per hour). And when IT staff members leave or are lost, their
knowledge of the corporate IT infrastructure leaves with them.
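To make the labor figures in the second and third reasons concrete, here is a minimal Python sketch of the arithmetic. The $75 hourly rate and the "nearly triple" replacement multiplier come from the figures above; the device count and hours per device are hypothetical values chosen only for illustration, and the multiplier is rounded to exactly 3.

```python
# Hourly figure quoted in the article; the workload numbers below are hypothetical.
IT_OVERHEAD_PER_HOUR = 75      # dollars per hour, from the article
REPLACEMENT_MULTIPLIER = 3     # "nearly triple", rounded to 3 for illustration


def documentation_cost(devices: int, hours_per_device: float,
                       rate: float = IT_OVERHEAD_PER_HOUR) -> float:
    """Labor cost of one manual pass over every device's configuration."""
    return devices * hours_per_device * rate


# Hypothetical shop: 200 servers and network devices, 2 hours of work each.
manual_cost = documentation_cost(devices=200, hours_per_device=2)
replacement_rate = IT_OVERHEAD_PER_HOUR * REPLACEMENT_MULTIPLIER

print(f"One manual documentation pass: ${manual_cost:,.0f}")
print(f"Effective rate when replacing staff: ${replacement_rate} per hour")
```

Under these assumed inputs a single documentation pass lands in the tens of thousands of dollars, which is consistent with the range described for a larger IT shop.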
Two negative consequences result. First, any configuration data collected
in these documents, even assuming it is accurate and consistently
documented across critical application systems, rapidly becomes
out of date due to the one constant in the IT world: change. Second
(and until recently this was unthinkable), most disaster recovery plans
assume the existing IT staff will be involved in the restoration.
Even if the IT staff survives intact and is available to assist recovery,
the multitude of IT platforms and the large number of changes that occur
on a daily basis limit its effectiveness in supporting a backup data
center's restoration efforts. Thus, the IT disaster recovery plan
needs to be continuously updated with the latest configuration settings,
reported in a clear, consistent manner. All changes should be easily
identifiable, preserving the IT decisions from which backup staff can derive
knowledge.
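One way to make changes "easily identifiable" is to compare each new configuration snapshot against the last known-good one. The following Python sketch is an illustration only, not a description of any particular product: the function name `diff_settings` and the sample setting names are hypothetical, and each snapshot is assumed to be a flat mapping of setting name to value.

```python
from typing import Any, Dict


def diff_settings(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Compare two snapshots and group the differences by kind."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: {"was": old[k], "now": new[k]}
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical settings for one server, captured on two different days.
last_safe = {"swap_mb": 2048, "share:/finance": "read-only", "service:backup": "auto"}
current = {"swap_mb": 4096, "share:/finance": "read-only", "service:web": "auto"}

report = diff_settings(last_safe, current)
for kind, entries in report.items():
    print(kind, entries)
```

A report of this shape lets a restoration team that has never seen the system tell what was altered since the last safe settings without reading both snapshots in full.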
Are Backup Tapes Enough?
One of the most common reasons detailed configuration information is
not recorded is the belief that backup tapes contain everything needed
to restore systems into production.
The effectiveness of backup tapes depends upon the nature of the disaster.
A system that experiences a simple power outage or hardware failure
can easily be restored with backup tapes. If you have a hot backup site,
you don't even need to use tapes.
But undocumented tapes, while preserving business data, contain no configuration
data, and cause delays in restoring critical applications.
Restoring such applications occurs during the functional restoration
phase of disaster recovery. This phase can only be done once the infrastructure
is properly reconfigured. A critical element is the most recent security
settings. You need to ensure that the restored applications do not have
any security holes when they are returned to production.
In general, an IT department that has the detailed configuration settings
and the original operating system and application CD-ROMs can reach
functional restoration up to 30 percent faster than by running backup
tapes.
Throughout the multi-phase recovery process, detailed configuration
documentation that contains change information allows the original IT
staff and the restoration team to easily see, discuss, and alter any
changes in configuration settings that occurred from the last safe settings.
It also enables other personnel unfamiliar with that infrastructure
to get the network and business applications running again.
TYPICALLY MISSING FROM BACKUP TAPES

NT / 2000 Servers
- Share permission configuration information
- Services configuration information (e.g., startup information, accounts)
- Application and system files in use, which generally do not make it
  onto the backup tape

UNIX Servers Such As SOLARIS
- Host and network dependencies
- EEPROM settings such as specific boot instructions, SCSI
  ID manipulation, etc.
- Other key settings: initial system installation cluster,
  virtual memory swap space sizes, disk partition slices, space
  allocation considerations, etc.
- Kernel parameters and configuration settings that affect
  storage devices

Databases Such As ORACLE
- Storage parameters
- Schema objects, such as table dependencies and indexes
- Security: what privileges are assigned to users and roles

Routers And Switches Such As CISCO
- Everything: system backup tapes have no network device
  configuration information. Cisco routers and switches store configuration
  information in a file called running-config. This is
  usually (but not always) backed up by the network administrator
  on a TFTP server. Because that server is part of the internal IT
  infrastructure, it may be unavailable in a disaster situation.

Collecting And Maintaining Enterprise Configuration Data
Normally, collecting and maintaining detailed configuration documentation
is accomplished in one of three ways: manually checking all the settings
on each network device, using specially designed tools that provide
partial data on certain products, or automating the process with software
that discovers, collects, and documents all key settings.
Where the process is done manually or with tools that provide only part
of the required information, automated tools make this task easier by
eliminating much of the work involved and increasing the speed at which the
information is collected. They also improve the accuracy of the information
collected by eliminating the human error that is inevitable when sorting
through hundreds of thousands of key settings. The information is presented
and preserved in a consistent report format.
Where the process is not being done, automated tools make it possible
to accomplish the task for the first time. Some applications enable
configuration setting data to be updated on a regular basis through
automatic scheduling to provide the most current information available
for disaster recovery.
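As a rough illustration of scheduled, automated collection, the sketch below writes a timestamped snapshot file each time it is called. It is a toy under stated assumptions, not any vendor's implementation: it gathers only a handful of host-level values available from the Python standard library, whereas a real tool would collect share permissions, service settings, kernel parameters, database schemas, and so on. The function names and file-naming scheme are hypothetical.

```python
import json
import platform
import socket
from datetime import datetime, timezone
from pathlib import Path


def collect_settings() -> dict:
    """Gather a few host-level settings; a real tool would collect far more."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }


def write_snapshot(directory: Path) -> Path:
    """Write the current settings to a timestamped JSON file and return its path."""
    taken_at = datetime.now(timezone.utc)
    snapshot = {"taken_at": taken_at.isoformat(), "settings": collect_settings()}
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"config-{taken_at:%Y%m%dT%H%M%SZ}.json"
    path.write_text(json.dumps(snapshot, indent=2, sort_keys=True))
    return path
```

Calling `write_snapshot(Path("config-snapshots"))` from whatever job scheduler the site already uses would build up a dated history of reports; the directory name here is a placeholder, and for disaster recovery purposes the output would need to be copied somewhere outside the primary site.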
Disasters fall into two general types: those that do not physically
damage the IT infrastructure and those that do. Automation delivers
value in either scenario.
The first (and more common) type, where the infrastructure is not physically
damaged, may result from a power failure or an act of cyber-terrorism.
Companies need to restart critical applications quickly. Every minute
of downtime on an ERP application (e.g. SAP R/3) can cost a corporation
upwards of $7,500. Automated products can provide the configuration
settings from the last report prior to the disaster. This enables system
and database administrators to restore the settings to the last safe
settings prior to importing the latest data backup.
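The downtime figure above can be turned into a quick estimate. This Python sketch uses the $7,500-per-minute figure from the text and treats the "up to 30 percent faster" restoration claim made elsewhere in this article as a best case; the four-hour outage length is a hypothetical input chosen for illustration.

```python
COST_PER_MINUTE = 7_500  # dollars; the article's figure for an ERP outage


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Cost of an outage lasting the given number of minutes."""
    return minutes * cost_per_minute


# Hypothetical outage: four hours to restore without configuration documentation.
baseline_minutes = 240
# Best case from the article: restoration up to 30 percent faster.
reduction_percent = 30
faster_minutes = baseline_minutes * (100 - reduction_percent) / 100

baseline = downtime_cost(baseline_minutes)
faster = downtime_cost(faster_minutes)

print(f"Baseline outage cost: ${baseline:,.0f}")
print(f"Best-case outage cost: ${faster:,.0f}")
print(f"Difference: ${baseline - faster:,.0f}")
```

The actual saving depends on the real outage length and on how much of the 30 percent is achieved, so the result is an upper bound for the assumed scenario rather than a forecast.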
In the second (and less frequent) case, where the infrastructure is
physically damaged or destroyed, automation provides the configuration
settings from the last report generated prior to the disaster. This
information can be rapidly sent to third-party backup facility providers.
The information can also be sent regularly to these firms every time
a report is generated, enabling the site's system and database
administrators to restore the settings to the last safe
settings prior to importing the latest data backup.
Unlike an insurance policy, where you need to have a disaster
to realize the benefits, detailed configuration information and documentation
can be used on a daily basis to improve the operations of the IT infrastructure.
Compliance reporting is a subset of a larger IT management requirement
that is driven by individual industry requirements for security,
both of the data being managed and of the IT infrastructure itself.
A critical component for being in compliance with these industry-specific
mandates is possessing current and historical documentation that provides
detailed configuration settings of the IT Infrastructure.
For example, the healthcare industry is working toward compliance with
the Health Insurance Portability and Accountability Act of 1996,
known as HIPAA. Within HIPAA is the requirement for the security and
confidentiality protection of electronic health information. Automated
products contribute toward compliance with the security requirements
of HIPAA by providing current (and historical) detailed configuration
reports to support auditing, security, and disaster recovery.
There are similar reporting requirements in the financial industry,
including those under the Gramm-Leach-Bliley Act, mandated by the Federal
Reserve System, that require recording detailed configuration settings for
security and backup. Firms that are ISO 9000 compliant or working toward
that certification also require extensive documentation of IT processes
and policies.
In conclusion, managing configuration settings can reduce IT recovery
time by as much as 30 percent. Collecting such information is an arduous
task that few companies ever accomplish due to insufficient resources.
There are products that can automate this process to provide and store
constant updates. Collecting this information will not only reduce
downtime following a disaster but also give an IT staff the data that is
necessary for internal security, compliance, and optimum network efficiency.
Alex Bakman (abakman@ecora.com)
is founder and CEO of Ecora Software, maker of IT infrastructure management
tools for auditing, security, and disaster recovery.
To comment on this
article, go to 1502-03 at www.drj.com/feedback.