The revolution began near the beginning of the “dot-com craze” just a few short years ago. Prior to this point the enterprise space was keeping its data either in paper form or on some other physical medium (punch cards, hardcopy, etc.). In this configuration, the business data could theoretically survive any digital disaster, as it was not kept in digital form. While still susceptible to physical disaster (earthquake, fire, etc.), the potential for serious loss by these means could be mitigated by storing physical copies off-site in a repository or secure facility. This is an important concept, as this theory of off-site storage later crosses over into the digital world in a nearly identical form. Mainframe systems of the time were backed up to heavily protected magnetic tape, and the data kept on them was generally used in conjunction with physical hardcopy, so that losing the system for a day to restore data wouldn’t bring down the enterprise.
Suddenly, with the advent of widespread computer use at the desktop level, employees were no longer storing vital corporate data on the mainframe or in physical files. That meant that power fluctuations, physical anomalies, and a host of other disasters could literally wipe out valuable data without any possibility of restoring it. Very quickly, backup systems and office-based servers sprang onto the scene to address the concern that corporate data on PCs should be protected as carefully as the paper files and the mainframe data backed up to magnetic tape.
As technology progressed, the mainframe systems began to dwindle and, to a great extent, disappear, even in the enterprise space. Smaller server systems were put into place as a more flexible and economical alternative to the older, slower mainframe systems. With this new computing power came a whole new host of potential problems, not the least of which was data loss. Once again the corporate IT staff was faced not only with getting vital data from the desktops to the server systems, but also with the fact that the server systems themselves were little more secure than the desktops in the first place. More complex backup systems were constructed to shift the data from the volatile servers to somewhat more stable backup media, usually some form of magnetic tape.
For several years this seemed to be an ideal situation. In the event that data loss occurred, the tapes could be used to restore the lost data or, in many cases, even entire data-systems. However, as we became more and more dependent on our data-systems, businesses began to realize that the long periods spent waiting for the data-restoration process translated into large sums of lost revenue. A better system had to be found in order to minimize downtime, and disaster recovery services were born to fill the need.
Disaster recovery services (DRS) are systems put into place to restore data to a downed or corrupted server system or other data system as quickly as possible. The field of DRS is extraordinarily broad, ranging from re-configuring tape systems to make them faster and more reliable all the way through keeping duplicate servers on standby so that they can stand in at a moment’s notice. Once again, IT staff thought they had solved the problems of data loss, but once again they were about to be proven wrong.
This brings us to today’s economy, which is data driven, IT dependent, and absolutely chained to 24-by-7, 365-day-a-year data access. Even a few minutes of downtime in, for example, an online stock trading system can cause millions in lost revenue. Computers never take days off. Data systems never call in sick (one hopes) or demand coffee breaks. The loss of any time online translates directly into the loss of corporate revenue, especially in the enterprise space. Companies began the process of translating IT functionality into business reality, and the results shocked big business to the very core. Recovering from a disaster after the fact was no longer enough; disasters could not be allowed to impact the business at all.
How could a business continue to operate in light of the myriad potential hazards and disasters out there? With the constant threat of earthquakes, floods and other natural disasters, coupled with power grid outages, espionage and other man-made data loss issues, there were too many variables to anticipate every potential cause of data loss and system failure. The science of business continuity was born to find a way to keep the systems running, no matter what was going on in the physical world.
Business continuity planning (BCP) is really just disaster prevention in action. It’s the science of determining ways to allow data systems to continue working, even if an entire physical location is downed or destroyed. The baseline idea to remember here is that data-systems are portable objects. They are not dependent on the particular pieces of hardware you run them on, and can be moved to other, similar hardware at any time, provided the right expertise and software are available. This is a fundamental shift in thinking from the days of mainframe-based enterprise computing, where the system was, for the most part, the hardware. In today’s digital arena, hardware is often the least important part of a data-system; the power of the system is determined instead by the operating systems and software packages you are using. Once business IT staff made the leap of faith to believe that hardware was not the most crucial component, the doors of BCP were flung wide open to allow for the advent of the distributed data system.
IT development staff could now design systems that ran as clusters: multiple computers sharing a common data source that could stand in for each other when one server failed. They formed load-balanced websites with groups of servers that could absorb the load of one or even several downed machines. They created e-mail server groups that spanned the country, each one able to hold messages for an offline counterpart. No longer was the corporate data system at the mercy of a single point of failure. The entire data-center could be grouped, clustered, and manipulated as a single entity to protect the data of the enterprise!
After the initial elation wore off, IT staff realized one major flaw in the plans. There was still a single point of failure: the data-center itself. For the most part, the business continuity failover systems were physically located about three feet from the primary systems. That meant even the most redundant data-center could fall victim to a power grid failure, and when the diesel backup generators finally ran out of fuel, the data-center and the corporate data went offline. Though far from being back where we started, BCP still had a long way to go before we could reach the mythical “five nines.”
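To put a number on “five nines”: the phrase is commonly read as 99.999 percent availability. The short sketch below, which assumes a 365-day year and uses an illustrative function name of our own choosing, converts an availability target into the downtime it allows per year.

```python
# Downtime budget implied by an availability target.
# Assumption: a 365-day (non-leap) year; the function name is illustrative only.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year that a target still permits."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Two nines through five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% available: about {minutes:,.1f} minutes down per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime in an entire year, which is why a data-center that goes dark when its generators run dry cannot come close.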
Mega-storage companies like EMC stepped up to the plate by producing storage systems that could replicate themselves to other data-centers not located in the same physical vicinity. This meant that the entire body of corporate data could be kept up-to-date in some other location, thereby protecting against the possibility of failure due to the loss of a physical location. This is the same theory businesses used to rely on to protect physical data like punch cards and hardcopy years before.
On the surface this was an ideal solution, but it was only a reversion to disaster recovery, just on a much larger scale. The data was safe in a secondary location, but inaccessible until the primary location could be brought back online. The servers, the machines that end-users’ computers connect to in order to get information, were still located at the primary site and attached to the primary storage device. Even clusters, where groups of servers can stand in for each other, needed to be physically connected to the primary storage device, meaning that if the entire data-center went offline, there was nothing for the end-user to connect to, even though the data itself was safely stored off-site. Technology needed to make another leap in order to fully address the situation.
Building on data replication, software vendors began to develop BCP software that would allow the entire data-systems of an enterprise to transcend physical boundaries, thereby allowing the systems themselves, not just the data, to survive physical site failure. These products allowed the enterprise to eliminate the single point of failure of the single physical site without falling back to DR paradigms. Clusters no longer needed to be physically connected to a shared storage array, and stand-alone machines could stand in for each other no matter where they were physically located. By utilizing platform- and storage-independent data structures, these products allowed IT staff to create duplicate hardware and software configurations in multiple physical locations that could share data and keep each other up-to-date. They could also stand in for each other at a moment’s notice without end-users having to perform any tasks. Essentially, the end user continues to work, uninterrupted, while the data systems handle all the tasks of taking over the data-processing load for their downed counterparts in some other city.
Large-scale data systems can now seamlessly replicate not only the data itself, but also the very data-systems that are vital to keeping the enterprise up and running. When an Exchange e-mail system in Boston fails, service can now seamlessly switch to a physical system in Detroit, without the CEO (or anyone else) missing a single message. The IT staff can then correct the issues in Boston and fail back the physical systems to restore them to their original state when time permits, without the pressure and rushing that often cause even more damaging mistakes than the original outage.
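The Boston-to-Detroit switch and the later fail-back can be pictured with a very small sketch. This is only an illustration of the idea, not how any particular replication product works: the `Site` class, the `pick_active_site` function, and the `healthy` flag are hypothetical names, and the sketch assumes that some separate health check and data replication already exist.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Site:
    name: str      # hypothetical site label, e.g. "boston"
    healthy: bool  # outcome of a health check assumed to exist elsewhere


def pick_active_site(sites: Sequence[Site]) -> Optional[Site]:
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if site.healthy:
            return site
    return None


if __name__ == "__main__":
    boston = Site("boston", healthy=True)
    detroit = Site("detroit", healthy=True)
    sites = [boston, detroit]  # listed in order of preference

    boston.healthy = False     # the primary site goes down
    print(pick_active_site(sites).name)  # service fails over to the standby

    boston.healthy = True      # IT staff repair the primary at their own pace
    print(pick_active_site(sites).name)  # service fails back to the primary
```

Because the sites are kept in priority order, fail-back needs no special handling in this sketch: once the preferred site reports healthy again, it is simply chosen first.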
It is this monumental paradigm shift, from keeping everything on physical media that could be duplicated off-site to a digital world of self-healing data-systems, that can create the truly digital, always-on enterprise. With the innovative new generation of software products now available to IT staff, the goal of “five nines” can be met for the first time, and can be met reliably regardless of acts of man or nature.
Finally, enterprise-class, always-available systems can be constructed that will not be taken out by physical disaster, espionage, end-user accidents or any other mishap. The corporate data has become truly safe and secure, and business can get on with what it does best: concentrating on business and letting the data-systems concentrate on the data.
Mike Talon has been working in the information technologies field for more than 10 years, specializing in data protection and disaster prevention. He currently works for NSI Software, a leading developer of data replication technologies and services, and lives in New York City, where he is constantly striving to find new ways to live well in interesting times. You can reach him at firstname.lastname@example.org.