Small Corporation Stretches Resources to Make Disaster Recovery Strategy a Reality
By DUANE ABBOTT & ALAN CARLSON
CNA Surety Corporation provides commercial and contract
surety products to clients in every state. In fact, it is a giant in
its field. However, by corporate giant standards, Surety is a small
corporation. And until recently, there was no comprehensive information
technology disaster recovery strategy in place. It’s our guess
that many small companies are wrestling with the same sorts of decisions
that Surety has faced while formulating a disaster recovery strategy and
stretching its resources to make it a reality.
Our DR budget is limited, and no one is dedicated exclusively to disaster
recovery. In fact, even as we are preparing for the next set of tests,
the six staff members who support the Windows, UNIX, and z/OS operating
systems and the network are preparing to roll out Windows XP to more than 700
desktops across a wide geographic area and to implement AIX and Active
Directory 2003, along with many other projects. It’s a challenge
for these six people to create DR procedures and then validate them.
But the business areas, IT, and management have decided not to allow
disaster recovery to continue to take a backseat. We owe it to our clients
and our stockholders to formulate a plausible DR strategy and to test
it continually – even if it all has to be done on a shoestring.
We’re making a DR strategy a reality with minimal resources.
Surety’s History
As an organization, CNA Surety has taken the idea of business continuity
seriously for a long time. A business continuity plan has been part
of the organization’s culture and procedures for many years. We
even staged a mock tornado to test our plan and our mettle. This test
involved the entire corporate headquarters staff and was conducted with
the utmost seriousness and care.
We have also conducted several successful mainframe tests at our hot
site vendor and have demonstrated that our legacy systems could be restored
comprehensively and quickly, but that was the limit of our disaster
recovery testing. These systems remain vital to the ongoing success
of our business, but they are no longer the core of our application
and database infrastructure. And it wasn’t until the advent of
our data warehouse and presence on the Web that the organization stopped
seeing its legacy systems as its computing heart.
Again, as at many small- to medium-size organizations, the obstacle was mostly
a matter of finding the resources – the employee time and the
funds – to create an IT DR plan when so many other competing projects
clamored for their share of a limited pie.
Surety’s Computing Environment
The computing environment has burgeoned well beyond the bounds of the
mainframe system during the last few years. Surety now has employees
in 39 locations across the United States and Canada.
There is an IBM mainframe; an HP SuperDome; many Intel servers running
three versions of the Windows operating system; a storage area network
(SAN), built on two different technologies, that serves all the
operating systems (including HP-UX, z/OS, and Windows); substantial Oracle
databases that support a data warehouse; some SQL databases that support
stand-alone applications and others that provide data feeds to Web-based
applications; an imaging system that is at the heart of the business;
and high-volume in-house printing that is integral to the business.
In addition, the AIX operating system will soon be a part of the production
environment, and there is an AS/400 that is slowly being phased out.
And like most other businesses’ computing configurations, many
of our systems depend on one another for data.
It’s a fairly complex enterprise to plan to re-create if all
or part of it were destroyed by an F5 tornado or made unavailable
for a significant period of time by a chemical leak
or a 100-year blizzard.
A New Era
Even with the resource limitations we continue to face, we’ve
turned the page in our disaster recovery planning history. We’re
no longer allowing fiscal limitations to stifle our progress. With a
recent hot site test, Surety has begun a new era in disaster recovery.
The organization has embarked on a two-year effort that will culminate
in an enterprise-wide test at a hot site provider.
While gearing up for this latest hot site test, we departed from our
DR planning history in a few fundamental ways:
1. We are using our hot site tests to drive the DR effort. During the
next two years, we are planning to conduct four hot site tests (one
every six months), each scheduled to be more complex and comprehensive
than the last.
During the months between these tests, on-site testing is being conducted
to verify the procedural documentation and to ensure that we have the
most successful and useful hot site tests possible. In-house testing
is done as much as possible to avoid the cost of doing it at the hot
site vendor.
While testing at a hot site vendor can be expensive, it’s a relatively
small commitment when compared to replicating a hardware infrastructure
or making your applications and systems unavailable for extended periods,
especially the high-availability applications.
2. When we create DR strategies, we don’t focus only on technological
solutions that might cost hundreds of thousands of dollars. Instead,
we consider how we can use existing equipment and even outside resources
to accomplish our goals.
For example, CNA Surety’s corporate headquarters is located in
Chicago, but the majority of CNA Surety’s IT resources operate
out of a major operations location in Sioux Falls, S.D. In the event
of a disaster, the Sioux Falls staff would find it onerous to travel
to Chicago, so we have had to create a strategy for housing our operations
elsewhere in the Sioux Falls area, but at minimal cost. To this end,
we have begun talking with other corporations in the Sioux Falls area
and local universities about sharing facilities in the event of a disaster.
This means substantial savings compared to engaging a hot site
provider, for example, to provide an alternate work site hundreds of
miles away and relocating hundreds of people for several weeks or months.
3. We engaged a technical writer who has a substantial IT background.
While this was a financial commitment, it is a limited one. He has helped
us to establish a foundation of procedures and standards that will live
well beyond his finite tenure. While he is not dedicated exclusively
to DR, it has been his primary focus and he helps the disaster recovery
coordinator (a systems programmer who has this title along with many
others) to focus the IT staff on DR tasks.
4. We’ve set reasonable expectations. We’ve given ourselves
approximately two years to grow our strategy. That is enough
time to develop the procedures and documentation while still maintaining
a real sense of urgency.
Being Inventive
The old saying is “necessity is the mother of invention.”
This has been our mantra for the last year.
Servers: The backups have been aligned so that restores can be performed
according to our business’s recovery time objectives. While many servers are
aging rapidly, we have created procedures for restoring the operating
systems and services by first building model images and then distributing
them to the remaining servers. We are also planning to restore the servers
at an alternate work site and not the hot site vendor or parent company’s
headquarters. This will mean significant travel and vendor savings.
Telephone and Printing Equipment: We are counting on a crate-and-ship
strategy to replace our telephone and printing equipment. This
strategy lets us avoid the cost of purchasing redundant hardware and
engaging a vendor to perform the printing.
Workstations: We are in the process of replacing more than
700 workstations, which will make it much easier to restore these
workstations from a few images and to locate like hardware. This
replacement is not being driven by DR, but it is a wonderful
confluence of events. Even so, we had already prepared a plan to restore
workstations from many images in order to avoid the cost of standardizing
all our workstation hardware and operating systems.
Alternate Worksite: To reiterate, we are working with other corporations
and universities in the Sioux Falls area to establish a sort of consortium.
Each member would agree to house the others’ equipment and personnel
in its facilities for a limited time in the event of a disaster. Most
corporations and universities have large meeting rooms and other facilities
that are not used 100 percent of the time and could be reallocated
temporarily.
The Future
There’s still much work left to be done. Again, we have a two-year
plan in place that will culminate in an enterprise-wide test at our
hot site provider. All application, data, and network components will
be replicated, and we hope to conduct substantial user acceptance testing
as well.
It’s going to continue to be a big challenge. Within the next
two years, we will have to fully integrate our IT disaster recovery
plan with our business recovery plan. Currently, neither plan references
the other. There are also many more procedures to create, revise, and
validate, with or without a technical writer, and we need to
solidify our alternate worksite plan.
We must also ensure that our change management process accounts for
disaster recovery. If a new application or server is implemented, it
must be accounted for in the disaster recovery procedures. Depending
on the impact to the existing computing infrastructure, the DR strategy
may have to be amended.
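The article does not describe any tooling for this check. One way to make it routine, sketched here in Python under stated assumptions, is to reconcile the production inventory against the set of systems that have DR procedures each time a change is approved. The function names, the host names, and the procedure IDs below are all hypothetical.

```python
def uncovered_systems(inventory, dr_procedures):
    """Systems in the production inventory that have no DR procedure yet.

    inventory: iterable of system names (hypothetical, e.g. exported from an asset list).
    dr_procedures: hypothetical mapping of system name -> procedure document ID;
    set() of a dict yields its keys, i.e. the covered system names.
    """
    return sorted(set(inventory) - set(dr_procedures))


def stale_procedures(inventory, dr_procedures):
    """DR procedures whose system is no longer in the inventory (review or retire)."""
    return sorted(set(dr_procedures) - set(inventory))


if __name__ == "__main__":
    # Hypothetical data: a new AIX server has no procedure yet, while the
    # AS/400 being phased out still has one on file.
    inventory = ["imaging-01", "web-feed-01", "aix-app-01"]
    dr_procedures = {"imaging-01": "DR-001", "web-feed-01": "DR-002", "as400-01": "DR-003"}
    print("Missing DR procedures:", uncovered_systems(inventory, dr_procedures))
    print("Procedures to review:", stale_procedures(inventory, dr_procedures))
```

With the sample data, the check flags `aix-app-01` as lacking a procedure and `as400-01` as a procedure to review. Checking both directions catches new systems that slipped past change management as well as documentation for retired systems that would otherwise linger.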
There’s also the challenge of keeping the documentation current
as vendors change and support personnel move in and out of roles. We’re
implementing a quarterly review process we hope will help to keep the
documentation alive. We’re always mindful of how much detail we
should include and of the user level we should be targeting. After all, as
people leave and join the company, we cannot assume an intimate knowledge
of the computing infrastructure. The procedures can only assume expertise
in a field (like networking or operating systems) but cannot assume
knowledge of every router, server, and connection between applications
and databases.
We’ve created a set of standards for ensuring that operational
procedures and tasks do not violate or circumvent the disaster recovery
procedures. For example, backups cannot be reconfigured simply to minimize
the amount of media used. Rather, backups must be configured to
ensure that servers can be restored according to the priority established
by the business.
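The article does not say how the restore priority is worked out. As a minimal Python sketch, assume each server has a business-assigned recovery time objective (RTO) and an estimated restore duration, and that restores run one at a time. Under those assumptions, an earliest-deadline-first ordering (most urgent RTO first) shows whether every server can be back within its RTO. The `Server` class, its fields, and the sample host names and hours are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    rto_hours: float      # recovery time objective set by the business
    restore_hours: float  # estimated time to restore this server from backup


def plan_restores(servers):
    """Order restores by RTO (earliest deadline first), assuming one restore
    at a time. Returns (name, hour restore finishes, meets RTO?) tuples."""
    plan = []
    elapsed = 0.0
    for s in sorted(servers, key=lambda s: s.rto_hours):
        elapsed += s.restore_hours
        plan.append((s.name, elapsed, elapsed <= s.rto_hours))
    return plan


if __name__ == "__main__":
    inventory = [
        Server("imaging-01", rto_hours=24, restore_hours=6),
        Server("web-feed-01", rto_hours=12, restore_hours=4),
        Server("warehouse-01", rto_hours=72, restore_hours=16),
    ]
    for name, done_at, ok in plan_restores(inventory):
        status = "meets RTO" if ok else "MISSES RTO"
        print(f"{name}: done at hour {done_at:g} ({status})")
```

Sequential restores are a conservative assumption. If several servers can be restored in parallel, completion times would have to be computed per restore stream. A server flagged as missing its RTO signals that the backup configuration, not just the ordering, needs attention.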
We’re up to all these challenges, though. We have an IT team, including
management, that is invested in the process. We’re going to continue
to use the hot site tests as mile markers to measure and prompt our
progress, and we’re going to continue to be as inventive as necessary.
Duane Abbott is an IT consultant and a technical writer with
Aquent, LLC. He has been in IT for more than 20 years.
Alan Carlson is a systems programmer and a disaster recovery coordinator
working for CNA Surety Corporation. He has more than 30 years of IT
experience.
©Copyright
2004 Systems Support Inc. All rights reserved. Reproduction in whole
or in part in any form or medium without the express written permission
of System Support Inc. is prohibited.