Prepared for a Crash
Michael Skaff can imagine this kind of nightmare, but he didn’t have to live through it when his company recently experienced a server crash that could have spelled huge trouble under ordinary circumstances. Fortunately, he had processes and software in place that made the complex, daunting task of reconstructing server settings hardly more than a walk in the park.
Skaff is the Director of IT for NativeMinds, a provider of automated natural-language sales and customer service solutions for the World Wide Web and other applications. The company offers software and services for building automated virtual representatives as an integral component of effective web-based customer relationship management (CRM).
Good Documentation Is Essential
When he joined NativeMinds at the beginning of 2000, Skaff wasn’t surprised to find that the company had no overall documentation strategy in place. “I started out as a sysad [systems administrator] a long time ago,” he says, “and soon found out that one of the most vital jobs of a sysad, or any network manager, is to make sure there’s good network documentation in place. It’s so important, but somehow it always seems to get relegated to the background because of time constraints.”
Traditionally, the documentation process has required that someone manually check all of the system configuration settings and write a document recording those settings. It’s a tedious, labor-intensive activity that’s heartily disliked by IT professionals, who have to deal with more pressing priorities like security, adding new users, and updating applications and equipment. But documentation suddenly becomes the number-one priority when disaster strikes, and by then it’s too late.
Server Database Went Down
Early one evening, after the office had closed, NativeMinds’ entire Microsoft Exchange server database went down. Rebuilding the server’s 10,000-plus settings might have taken the IT department all night and most of the next day, while all productivity and potential profits from the next workday went down the drain. The server that houses the in-house sales demo would have slowed to a crawl, cutting off access for the NativeMinds offsite sales force. What’s more, the IT team would have spent weeks afterward tweaking and fixing network settings.
It could have happened that way, but it didn’t. Soon after coming to NativeMinds, Skaff instituted a disaster recovery plan that included consistent, disciplined documentation. In the course of formulating this plan, Skaff happened to read about a company and its unique application for automatically documenting network server structures. “I had been waiting for years for something like that,” he says. He ordered the product, and it took a system administrator 15 minutes to document NativeMinds’ entire network.
Automatic documentation of an IT infrastructure? A new breed of products is helping chief information officers, system managers, and IT personnel easily build detailed documentation that covers the state and configuration of servers on a system, all in plain English. With only a few clicks and a minimal investment of time and personnel, these Documentors automatically survey the systems and generate comprehensive, well-written, expert-level documentation for all configuration settings. It takes about five minutes to document an entire server.
Restored Configurations in Hours
Just a week before the crash, the IT department had run a report detailing the setup and configurations of various servers. Skaff had produced a survey of the NativeMinds system with expert-level documentation for all the configurations of the Exchange server. When that server went down, the IT team was able to rebuild it in hours instead of days, to the exact state it had been in just a week earlier.
As Skaff puts it, his IT staff was almost able to “turn their brains off” and “just read the document, enter the setting, read the document, enter the setting” until the server was up and running again. Without up-to-date documentation, reconstruction would have taken at least an additional six to eight hours, an eternity when your business lives in Internet time.
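The “read the document, enter the setting” routine can be pictured as a comparison between the documented snapshot and the freshly rebuilt server. The following is only a minimal sketch under stated assumptions: the function name, the setting names, and the values are all hypothetical, and it does not represent how the product Skaff used works internally.

```python
def pending_changes(documented, current):
    """List every setting whose current value differs from the documented one.

    Returns (name, current_value, documented_value) tuples, sorted by name.
    A setting that is missing from the rebuilt server shows up with a
    current value of None.
    """
    changes = []
    for name in sorted(documented):
        wanted = documented[name]
        actual = current.get(name)
        if actual != wanted:
            changes.append((name, actual, wanted))
    return changes


# Hypothetical setting names and values, for illustration only.
documented = {
    "mailbox_store_path": "D:\\exchange\\mdbdata",
    "message_size_limit_kb": 10240,
    "smtp_port": 25,
}
fresh_install = {
    "mailbox_store_path": "C:\\exchsrvr\\mdbdata",
    "smtp_port": 25,
}

# Print a plain work list: one line per setting still to be entered.
for name, actual, wanted in pending_changes(documented, fresh_install):
    print(f"Set {name} to {wanted!r} (currently {actual!r})")
```

A work list of this kind is only as good as the snapshot behind it, which is why the report run a week before the crash mattered so much.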
Asked what he would consider the basics of a good disaster recovery plan for an organization’s network, Skaff enumerates:
• Thorough documentation with offsite backup
• A comprehensive data backup strategy that also includes offsite backup
• Redundancy in infrastructure
• Use of application service providers (ASPs) and management service providers (MSPs)
• Staff training in disaster recovery techniques
NativeMinds has also taken some extra precautions in response to California’s power shortage. The company has numerous uninterruptible power supplies (UPSs) in place, as well as a colocation site equipped with generators. But, above all, if NativeMinds’ server “heart” should ever fail again, Skaff knows that he and his staff can get it pumping again in a hurry, because they’re ready to accurately restore all those thousands of settings in what amounts to almost no time at all.
Alex Bakman is CEO of Portsmouth, N.H.-based Ecora Corp., a vendor of automated IT auditing, network documentation, and reporting tools that are used for disaster recovery, meeting regulatory requirements, and planning migrations.