INFORMATION TECHNOLOGY
The Hidden Factor in IT Network Downtime
By JONATHAN BUCKLEY
Ultimately, all companies across all industries today must maintain
a high level of availability of their IT and network systems or face
great peril. Certainly this readership does not need a rehashing of
the economics of downtime. Suffice it to say, it's not good. A
"disaster" to a company need not necessarily be a world-affecting,
broadcast event. It could be one of those not-so-quiet yearly corporate
outages requiring a visit to the CEO's office after systems are restored.
Presumably of more interest to this audience is a root cause understanding
of network and system downtime, and the techniques or tools that help
avoid such unfortunate occasions. The boom in network and systems sales,
and more recently in the specific segment of root cause software, is
evidence of the worldwide appetite for solutions to measure, assess,
predict and, ultimately, avoid network and system outages. Sales of
these software packages are in the tens of billions of dollars annually
by all survey accounts.
Despite our best efforts and the best IT software management packages,
failures occur. Why? This author believes we don't do enough to
manage our IT and network ecosystem as a supply chain. One might think
of IT systems as a supply chain linked together from its raw material
inputs (electrons, process cooling, operating environments) to its processing
and storage (systems), delivery (network) and the like.
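As a rough illustration of this supply chain view, the sketch below models each link as depending on the links upstream of it and, given a set of failed links, reports only those with no failed upstream dependency. The link names and the dependency map are hypothetical and chosen for illustration; they are not taken from any product or study.

```python
# Hypothetical dependency map: each link lists the links it depends on.
# Names and relationships are illustrative assumptions only.
DEPENDS_ON = {
    "power": [],
    "cooling": ["power"],
    "servers": ["power", "cooling"],
    "storage": ["power", "cooling"],
    "network": ["power", "servers"],
}


def root_causes(failed, depends_on=DEPENDS_ON):
    """Return the failed links that have no failed link upstream of them."""
    failed = set(failed)

    def has_failed_upstream(link, seen=None):
        # Walk the dependencies of `link`, tracking visited links so a
        # cyclic map cannot cause infinite recursion.
        if seen is None:
            seen = set()
        for dep in depends_on.get(link, []):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in failed or has_failed_upstream(dep, seen):
                return True
        return False

    return sorted(link for link in failed if not has_failed_upstream(link))


if __name__ == "__main__":
    # Servers and network are down, but so is cooling, which they depend on.
    print(root_causes(["network", "servers", "cooling"]))
```

In this toy model, an alarm on the network that traces back to a failed cooling unit is reported as a cooling problem, which is the kind of root cause the supply chain view is meant to surface.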
Now take this supply chain model and put it in a more familiar stack,
like the 7 Layer OSI model we are used to seeing (below), but simplify
the details of processing, storage and delivery, and expound on the
inputs such as power, environmental, fire safety, space and physical
security assets.
What you might notice is that the inputs to the IT supply chain seem
more like a foundational layer upon which IT depends. What is interesting,
however, is that most corporations today lack rapid, remote visibility
into these supply chain elements, even with their billions of dollars
in network system management packages and root cause engines. At the
same time, study after study shows that somewhere between 30 percent
and 50 percent of the failures in the IT supply chain have a root cause
in these foundational layer inputs.
For example, Ontrack, one of the best-known professional data restoration
services, studied data loss in more than 50,000 hard drives and other
storage devices. They concluded that hardware and system malfunction
accounted for 44 percent of all data lost. The listed causes are all
related to failure at the input or foundation layer of the IT supply
chain: power failures, power surges, dust, moisture, heat and
physical shock.
Even the United States courts have weighed in and indirectly lent credence
to the case that the links of the IT supply chain are inseparable, despite
our managing them otherwise. On April 18, 2000, in United States District
Court, D. Arizona, AMERICAN GUARANTEE & LIABILITY INSURANCE COMPANY
vs. INGRAM MICRO, INC., in summary:
This case presented an insurance coverage dispute between Plaintiff/Counterdefendant
American Guarantee & Liability Insurance Company ("American")
and Defendant/Counterclaimant Ingram Micro, Inc. ("Ingram").
American issued Ingram a property damage policy which insured against
certain business interruption and service interruption losses. As a
result of a power outage, Ingram's computer systems were rendered
inoperable. Ingram made a claim under its policy to American and American
denied the claim. Thereafter, American filed a Complaint for declaratory
relief against Ingram and Ingram filed a Counterclaim for breach of
contract.
Pending before the Court were cross-motions for partial summary judgment
on the issue of whether a 1998 power outage caused "direct physical
loss or damage from any cause, howsoever or wheresoever occurring" to
Ingram's computer system.
The court concluded:
"At a time when computer technology dominates our professional
as well as personal lives, the Court must side with Ingram's broader
definition of 'physical damage.' The Court finds that 'physical
damage' is not restricted to the physical destruction or harm of
computer circuitry but includes loss of access, loss of use, and loss
of functionality."
"The Court is not alone in this interpretation. The federal computer
fraud statute, which makes it an offense to cause damage to a protected
computer, defines damage as 'any impairment to the integrity or
availability of data, a program, a system, or information.'"
In this case, the court judged that power and the IT machinery it
supported were inseparable ... in other words, an interconnected chain.
Why, then, do we not manage the uptime equation as a chain, without
a divide between IT and facilities?
The reasons for the disconnectedness of the IT supply chain are understandable:
1. The intelligent equipment in the input or foundational layer of the
IT supply chain does not lend itself well to monitoring via IT's
standard SNMP polling (this topic alone would require an article);
2. To date, the tools to remotely monitor this foundational layer have
been legacy building control technologies designed for local, proprietary
use, not enterprise-wide monitoring incorporated into the rest of the
supply chain;
3. Consequently, too few companies have merged the interests of IT and
facilities.
Thus the entire IT supply chain has not been effectively managed as
an enterprise, and this disjointedness, due to a lack of root cause
understanding or of tools to manage and assess these causes at the IT
supply chain input level, has led to famous disasters. Service providers,
Internet companies, banks and manufacturers alike have spent time in
the newspaper because of outages due to failed generators during rolling
blackouts, water leaks or simply failed air cooling at unmanned sites.
Keep in mind that there is only so much software tools can do to help
avoid disasters within the IT supply chain, but there is value in being
able to rapidly assess the viability of the different supply chain
components post-disaster.
For example, how long might it now take for your company to assess the
viability of its facilities systems after a major earthquake?
The answer to that question is quite certainly different from the answer
to this one: how long would it take to assess the viability of your
network connection after the same natural disaster?
If the IT supply chain were managed as such, the answers would match
because you would have remote, unified, global visibility to all areas
of the IT supply chain including power, fire, environmental and physical
security systems just as you do server health.
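A minimal sketch of what such a unified post-disaster assessment could look like is shown below, assuming every link in the chain, facilities and IT alike, reports a simple healthy/unhealthy reading to one place. The `Reading` type, the layer names and the three status labels are hypothetical, introduced only for illustration and not drawn from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One status report from one link of the chain at one site (hypothetical)."""
    site: str
    layer: str      # e.g. "power", "fire", "environmental", "security", "server"
    healthy: bool


def assess(readings, required_layers):
    """Summarize each site as 'viable', 'degraded' or 'unknown'.

    degraded: at least one required layer reported unhealthy
    unknown:  nothing reported unhealthy, but a required layer is missing
    viable:   every required layer reported healthy
    """
    by_site = {}
    for r in readings:
        by_site.setdefault(r.site, {})[r.layer] = r.healthy

    report = {}
    for site, layers in by_site.items():
        if any(layers.get(name) is False for name in required_layers):
            report[site] = "degraded"
        elif any(name not in layers for name in required_layers):
            report[site] = "unknown"
        else:
            report[site] = "viable"
    return report


if __name__ == "__main__":
    required = ["power", "environmental", "server"]
    readings = [
        Reading("site-a", "power", True),
        Reading("site-a", "environmental", True),
        Reading("site-a", "server", True),
        Reading("site-b", "server", True),  # no facilities visibility here
    ]
    print(assess(readings, required))
```

The "unknown" status is the point of the exercise: a site whose servers report healthy but whose power and environmental systems report nothing cannot be called viable, only unassessed.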
New tools are now coming to the market to begin to address this forgotten
piece of the IT supply chain. Built on new-era architectures that directly
monitor and predict the health and well-being of the foundational layer
and link it to the rest of the IT supply chain, these tools now hold
promise to provide the same visibility into the IT enterprise as the
CFO would expect of his/her financial system.
In conclusion, proactively managing the IT supply chain as one complete
enterprise can help businesses avoid the costly outages and downtime
associated with unplanned failure in their facilities machinery.
Getting to this information is a challenging task that few companies
ever accomplish, especially given the external pressures and implementation
obstacles today. As companies seek solutions to automate this entire
process, they must look holistically at all the key requirements and
leverage the benefits of new technologies coming to the market over
the next few years.
Jonathan Buckley (jbuckley@netbrowser.com)
is the vice president of marketing and business development for NetBrowser
Communications. NetBrowser has pioneered and patented an enterprise
monitoring software suite, e-Guardian, for what it calls "The Zero Layers,"
or the facility foundations layer upon which critical IT systems depend.
NetBrowser's Fortune 1000 customer base has plenty of stories of
how they avoided disasters using this new technology.
To comment on this article, go
to 1503-10 at www.drj.com/feedback.