The truth is, computer “disasters” of one sort or another happen to everyone. Yet the number of companies without a plan for recovering from a disaster is overwhelming. And 60% of those that do have a plan in place have never tested it, and don’t know whether it is adequate or whether it will work when implemented.
The word “disaster” tends to bring to mind the Hollywood definition - hurtling asteroids, city-leveling earthquakes, spewing volcanoes, sinking luxury liners, and F5 tornadoes. But not all disasters make the headlines. A system-crippling disaster can be as quiet as a telecommunications failure, a blackout, or employee sabotage.
Consider this - according to statistics from the Association of Records Managers and Administrators, almost half of the companies that lose their records in a fire go out of business immediately. Of those that do reopen their doors, more than half fail within a few years.
Cyber terrorism has become rampant. It is no longer just hackers getting into corporate systems, but also the newly dubbed “crackers.” Compared to crackers, hackers are just programming prodigies with a little too much curiosity. Crackers, on the other hand, are cyber terrorists who specialize in cyber theft, vandalism, and “denial-of-service” attacks. In a denial-of-service attack, the cracker intends to prevent legitimate users from accessing the service. The cracker may attempt to impede network traffic by flooding the system, or prevent access by disrupting connections between machines, or between machines and users.
For example, network connectivity is often interrupted using the SYN flood attack method, in which the attacker initiates a connection with the victim machine but prevents completion of the connection. The victim machine then reserves one of its limited number of data structures for each “uncompleted” connection the cracker has initiated. In just a few minutes, a clever cracker can lock out even the system administrator.
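The slot exhaustion described above can be illustrated with a toy model. This is only a sketch: the class name, the capacity of four slots, and the client labels are assumptions made up for illustration, and a real TCP stack is far more involved (timeouts, retransmissions, and various defenses are all omitted).

```python
class HalfOpenTable:
    """Toy model of a server's fixed-size table of half-open connections."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical limit on pending handshakes
        self.pending = set()      # clients that sent SYN but not the final ACK

    def on_syn(self, client):
        """Reserve a slot for a connection attempt; refuse if the table is full."""
        if client in self.pending:
            return True
        if len(self.pending) >= self.capacity:
            return False
        self.pending.add(client)
        return True

    def on_ack(self, client):
        """Handshake completed: release the slot. Returns True if one was held."""
        if client in self.pending:
            self.pending.remove(client)
            return True
        return False


table = HalfOpenTable(capacity=4)

# The attacker opens connections from made-up addresses and never completes them.
for i in range(4):
    table.on_syn(f"spoofed-{i}")

# Every slot is now reserved, so a legitimate user is turned away.
admin_accepted = table.on_syn("admin")
print("admin accepted:", admin_accepted)
```

In this model the administrator gets in again only once a slot is released, which is why real systems age out uncompleted connections rather than holding them indefinitely.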
Crackers also enjoy destroying or altering configuration information. By altering the routing information, it is possible to disable your network. Crackers are much more than mischievous - they are criminals who also engage in industrial espionage, software piracy, data manipulation, and theft - usually stealing customer account information or credit card numbers, employees’ personal information (for purposes of “identity” theft), passwords and login IDs. Using your password, they make sure all audit trails reflect you as the perpetrator of that unauthorized computer time.
After recent denial-of-service attacks on the eBay, Amazon.com, and Yahoo! sites, Janet Reno proclaimed that they were a “wake-up call” on what needs to be done to improve security and catch crackers.
Haven’t been hit by a cracker yet and feel fairly safe? The experts believe that 97% of all high-tech crimes go undetected.
Even if no falling rubble has been seen out the office window lately and the chances of a large-scale disaster seem remote, every business is subject to smaller problems. Most disasters won’t happen to your computer at all - they might, however, render your system (and your critical files) inaccessible, either electronically (satellite failure, prolonged power outage, virus, telecommunications failure), or physically (building evacuation, ice storm preventing travel, flooding, building closure). And no business is immune to the human element, whether accidental (unpredictable programming error, mistakenly erased files, even spilled coffee) or intentional (crackers, disgruntled employees). In certain circumstances, a key employee (MIS manager or programmer) leaving the company can be defined as a disaster, particularly if that person has no backup and no time to train a replacement. These people usually have complete access to the system and key data, and they have the know-how to manually alter either.
Any company, regardless of size, that has computer-held or -maintained data that would hinder business if it were to suddenly become inaccessible needs to safeguard against potential disasters. As society and business become more reliant on interconnected computer systems, we become more vulnerable to technical difficulties from outside, and the need to protect data becomes more critical.
“But I have insurance.” In the case of a large-scale disaster, the insurance company will be one of the first calls made, granted, but insurance policies don’t cover all that will be lost. Insurance may cover business interruption and pay for property damage and extra expenses. However, it won’t retain your vendor relationships, clients, or employees, nor will it return the business to normal. In a disaster, it is negative cash flow that ultimately causes companies to fold - if you wait for the final insurance settlement before resuming operation, it’s too late.
WHAT WOULD AN IT DISASTER STRATEGY PROVIDE?
The main goal of an IT disaster strategy plan is basic - save the business. The goal is met by reducing downtime, maintaining an acceptable cash flow, minimizing risk to the company, recovering critical operations, continuing the supply of services and products (both to and from your door), and protecting the business’ competitive position (customer confidence and goodwill, investor and creditor confidence, and your reputation).
Outside sources may encourage the development of a disaster plan. Customers want to know that their primary suppliers have a plan for continuing service in the event of adverse conditions. Many organizations require a disaster strategy plan in order to be certified. A plan also enhances the value of the company, an advantage when financing the operation or extending public offerings.
There are legal reasons for setting up such a plan. Consider these statutes which may apply to your business: liability statutes that establish levels of liability for directors and officers under the “Prudent Man” laws, risk reduction statutes that deal directly with risk management requirements for disasters, and security statutes that address computer fraud and misappropriation of computerized assets. There are also vital records statutes that regulate retention and disposal of corporate records (including electronic data), and contingency planning statutes concerned with developing plans for the recovery of critical systems.
In a nutshell, a disaster plan will designate a disaster coordinator and team, identify potential disasters along with their probability and impact, outline the specific steps to take for the different disasters that your particular business could face, and make sure your people know the best procedure to keep the business on track and surviving. The plan will reduce the number of bad decisions made under the duress of an emergency, reduce panic, and may potentially save the business in the process.
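One simple way to combine probability and impact when identifying potential disasters is to rank each threat by its expected yearly loss (probability times cost). The sketch below assumes that approach; the threats are taken from examples mentioned in this article, but every probability and dollar figure is a made-up placeholder, not a statistic.

```python
# Hypothetical entries: the probabilities and costs are illustrative only.
threats = [
    {"name": "prolonged power outage", "probability": 0.20, "impact": 50_000},
    {"name": "fire in data center", "probability": 0.01, "impact": 2_000_000},
    {"name": "mistakenly erased files", "probability": 0.60, "impact": 5_000},
]


def expected_loss(threat):
    """Expected yearly loss: chance of the event times what it would cost."""
    return threat["probability"] * threat["impact"]


# Highest expected loss first, so planning effort goes where it matters most.
ranked = sorted(threats, key=expected_loss, reverse=True)

for threat in ranked:
    print(f'{threat["name"]}: {expected_loss(threat):,.0f}')
```

A ranking like this is only a starting point. A rare event that would close the business outright may deserve a plan regardless of where it falls in the list.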
Writing and testing your IT disaster recovery plan can pinpoint prevention methods that will help limit your business’ exposure. You may encounter such flawed procedures as backing up over the same tape night after night (you should rotate between at least seven backup tapes in case one goes bad), safekeeping the only key to the data center on the manager’s key ring (if he is not available or his keys are lost, are you prepared to break the door down?), or running your emergency phone numbers on the same system as your regular phones (in an emergency you may find yourself without the system, and communication, altogether).
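The seven-tape rotation mentioned above can be sketched as a small schedule. The tape labels, function names, and the example date are assumptions for illustration; real backup software will have its own rotation scheme.

```python
from datetime import date, timedelta

TAPE_COUNT = 7  # the minimum number of tapes suggested in the text


def tape_for(day, tape_count=TAPE_COUNT):
    """Return the label of the tape to use on the given date."""
    return f"TAPE-{day.toordinal() % tape_count + 1}"


def fallback_tapes(day, bad_tape, tape_count=TAPE_COUNT):
    """List earlier nights' tapes, most recent first, skipping the bad tape."""
    candidates = []
    for age in range(1, tape_count):
        earlier = day - timedelta(days=age)
        label = tape_for(earlier, tape_count)
        if label != bad_tape:
            candidates.append((earlier, label))
    return candidates


today = date(2000, 3, 15)  # arbitrary example date

# Two weeks of nightly backups: each tape is reused only after seven nights.
schedule = [tape_for(today + timedelta(days=n)) for n in range(14)]
print(schedule[:7])

# If tonight's tape turns out to be bad, these are the tapes to fall back on.
fallbacks = fallback_tapes(today, tape_for(today))
print(fallbacks[0])
```

With a single tape, a bad backup leaves nothing to fall back on. With seven, the worst case in this sketch is losing one night of changes.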
AN OUNCE OF PREVENTION
You may also want to take advantage of prevention techniques that have been developed since your system was installed. Particularly in the realm of backup, which is a main key to disaster protection strategies, there are newer failover and data protection technologies that have a lot of advantages over the traditional method of rebuilding corrupted or lost files with outside media.
In single-host, standalone environments the decision to go with backup equipment is a given. You still have choices, however. You can use the off-site backup protection offered by disaster recovery companies, stay with an in-house dedicated backup option, or install redundant hardware components to the working host and throughout the file delivery subsystem (access paths, controllers).
Network Hierarchical Storage Management (HSM) systems can provide reliable data backup protection. The automated HSM process is continual, perfect for critical transaction data backup. For example, by setting the migration parameters to a short interval, the software will mirror RAID data onto a Magneto-Optical library within minutes of file creation or change. Another copy of the entire file can be saved to a tape library at designated intervals. Files will continue to migrate according to the parameters set by the user.
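The migration idea can be illustrated with a minimal sketch: a pass that copies any file that is new, or newer than its existing copy, from a primary location to a secondary one. This is not how any particular HSM product works. The function name and directory names are hypothetical, two ordinary directories stand in for the RAID and the optical library, and a real system would run the pass on a schedule set by the migration parameters.

```python
import shutil
import tempfile
from pathlib import Path


def migrate_changed(primary, secondary):
    """Copy files that are new or newer than their secondary copy.

    Returns the relative names of the files copied in this pass.
    """
    copied = []
    for src in sorted(primary.rglob("*")):
        if not src.is_file():
            continue
        dst = secondary / src.relative_to(primary)
        if not dst.exists() or src.stat().st_mtime_ns > dst.stat().st_mtime_ns:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the modification time
            copied.append(str(src.relative_to(primary)))
    return copied


with tempfile.TemporaryDirectory() as root:
    primary = Path(root) / "primary"      # stands in for the RAID
    secondary = Path(root) / "secondary"  # stands in for the optical library
    primary.mkdir()
    secondary.mkdir()

    (primary / "orders.dat").write_text("order 1\n")

    first_pass = migrate_changed(primary, secondary)   # new file is copied
    second_pass = migrate_changed(primary, secondary)  # nothing has changed

print(first_pass, second_pass)
```

The shorter the interval between passes, the smaller the window in which a newly written file exists only on the primary storage.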
Failover is a technique that is being used to advantage for disaster prevention. High-availability server clusters can share critical files - in the event that one server fails, the functioning server takes over invisibly, with no interruption in operations. As in HSM, software is the key to failover protection. A perfect fusion of applications and software for functions such as high availability systems, network backup and archiving, and enterprise-wide storage management is required.
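The takeover behavior can be sketched with a toy heartbeat model: each server reports in periodically, and requests go to the first server in preference order whose last report is recent enough. The class name, server names, and the three-second timeout are assumptions for illustration. Real cluster software also has to handle shared storage, split-brain situations, and client reconnection, none of which is modeled here.

```python
class Cluster:
    """Toy cluster: the active node is the first one with a fresh heartbeat."""

    def __init__(self, nodes, timeout):
        self.timeout = timeout  # seconds of silence before a node counts as failed
        self.last_heartbeat = {name: None for name in nodes}  # preference order

    def heartbeat(self, node, now):
        """Record that a node reported in at time `now` (in seconds)."""
        self.last_heartbeat[node] = now

    def active(self, now):
        """Return the node that should serve requests at time `now`, if any."""
        for node, seen in self.last_heartbeat.items():
            if seen is not None and now - seen <= self.timeout:
                return node
        return None


cluster = Cluster(["server-a", "server-b"], timeout=3)

# Both servers report in; the preferred server handles requests.
cluster.heartbeat("server-a", now=0)
cluster.heartbeat("server-b", now=0)
before_failure = cluster.active(now=1)

# server-a goes silent while server-b keeps reporting; server-b takes over.
cluster.heartbeat("server-b", now=5)
after_failure = cluster.active(now=6)

print(before_failure, after_failure)
```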
A third popular option is the new Storage Area Network (SAN) technology. SAN implementation ensures fault-resilient communication between storage pools and multiple servers, and ensures that single points of failure cannot block access pathways. SANs also provide the advantages of high levels of data availability (rivaling their mainframe counterparts), faster access to gigabytes of data, and ubiquitous data accessibility from more reliable server-to-storage device hardware paths.
Data protection is the backbone of disaster recovery and the available options should be thoroughly researched when considering the particulars of your recovery strategy. But what if you find that traditional backup methods are still best for your business? There are some ways to optimize that, too.
Keep your redundant file copies on removable media (tapes, disks) and store them off-site whenever possible. Again, make sure you have at least seven backup tapes that are rotated - in this way, if one goes bad, you have the previous tape that still contains fairly recent data (depending on your backup frequency, of course). Store off-site in a place that is readily accessible (this seems elementary, but tapes stored at a disaster recovery vendor or bank vault may not be accessible during holidays, weekends, nights). Put a plan in place for periodically checking the reliability of the backup tapes and the shelf life of the media itself. Visit your tapes. Has the environment changed since they were initially stored (heating/cooling, dust)? What is currently being stored around them? Ask the storage facility about its own disaster prevention plan.
Preparedness is the key: knowing where your backups are, and what condition they are in; knowing that a written plan is in the hands of everyone who might be the first to notice the disaster (does your janitorial staff know what to look for and who to call for a problem encountered at night?); and knowing that there are backup people who can take over if you can’t be reached (heaven forbid you go camping).
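One way to make the periodic reliability check concrete is to record a checksum for each backup file when it is written and compare it again later. The sketch below assumes the backups can be read back as ordinary files; the function names and file names are hypothetical, and a temporary directory stands in for the restored tape contents.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir):
    """Record a checksum for every file at the time the backup is made."""
    return {
        p.name: sha256_of(p)
        for p in sorted(backup_dir.iterdir())
        if p.is_file()
    }


def verify(backup_dir, manifest):
    """Return the names of files that are missing or no longer match."""
    bad = []
    for name, expected in manifest.items():
        path = backup_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(name)
    return bad


with tempfile.TemporaryDirectory() as root:
    backup_dir = Path(root)
    (backup_dir / "customers.bak").write_bytes(b"customer records")
    (backup_dir / "ledger.bak").write_bytes(b"ledger entries")

    manifest = build_manifest(backup_dir)
    clean_check = verify(backup_dir, manifest)

    # Simulate media degradation by altering one file after the fact.
    (backup_dir / "ledger.bak").write_bytes(b"ledger entr\x00es")
    later_check = verify(backup_dir, manifest)

print(clean_check, later_check)
```

The manifest itself should be kept somewhere other than the tape it describes, so that a damaged tape cannot take its own checksums with it.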
These are the things that will help maintain your business even if a disaster does strike. Everyone wants to hope that the biggest disaster they will face involves a data file or two, or coffee in a keyboard, but the reality is that disasters strike all the time. Ask anyone who used to have a shop on 43rd Street between 6th and 7th Avenues.
MANAGED DATA PROTECTION & RECOVERY GUIDE
The initial step in formulating a successful disaster recovery plan is to look at your company’s reliance on critical data functions. Below is a list of questions to ask when deciding if a disaster recovery plan is essential for your business.
- How far behind the competition would your company fall without access to its stored data for a day? A week? A month?
- What would be the financial impact of an interruption in computer-to-storage operations?
- Do the day-to-day business functions of the company rely on the computer’s stored database?
- Does the business have a substantial customer base relying on E-commerce?
- Is the managed data used to directly generate income for the company?
- Are there legal liabilities or penalties incurred if the company is unable to meet obligations? Even if your company is not a bank or doing business in another regulated industry, keep in mind that corporate officers can be held personally liable for many business losses, including failure to take adequate precautions to protect against business interruptions.
- What is the impact on customer service and goodwill if data cannot be accessed and the company cannot provide timely information?
- Is the computer controlling or storing quality assurance data? What quantity of the company’s products would be rendered useless without that data or the computer-controlled measurement systems?
- Is there critical research being done which would be lost if the storage subsystem failed?
Derek Gamradt is the Chief Technology Officer at StorNet, Inc., a leading independent provider of Storage Management Services with headquarters in Englewood, Colo. Mr. Gamradt joined the company in 1990 and has extensive knowledge of all aspects of storage management.