Single Points of Failure
- Published on July 8, 2010
- Written by DAVID BROWN
A Proactive, Organizational Approach
No doubt a recurring thought in your mind as a business leader is what to do when a critical system goes down. What happens to your client updates when you can’t access your regular lines of communication? How will you convey critical information to your teams when conditions are less than stellar? Whenever normal organizational operations slow down or grind to a halt, a “single point of failure” (SPOF) is often to blame.
An SPOF is anything whose failure can cause an organization’s operations to temporarily stop or slow down. SPOFs can exist in a hardware system or within a specific channel of communication, or they can be a single individual. Most importantly, they exist in every organization.
As a simple example, an organization with only one IT person, or with no detailed documentation of how its IT systems and environments are structured, has an SPOF. It’s very difficult to react to and recover from a failure when that IT person is out of the office or when there’s no documentation to work from. An attempt to eliminate one SPOF can even create another. An organization may think it’s being proactive by running communication lines from several different vendors, believing that redundancy has solved an SPOF, without realizing that all of those lines run through the same infrastructure. In short, organizations may appear to be aware of their SPOFs, but often they are not.
Why are organizations unaware of SPOFs if they pose such a threat? Perhaps it isn’t that organizations aren’t aware of SPOFs, but that they’re not aware of the consequences that accompany SPOFs. It’s very easy to tell someone how harmful a failure is after it happens, but it can be hard to persuade that person to identify those consequences prior to the event. Often, organizations take an “it will never happen to me” approach to failures, which is not only lackadaisical but potentially harmful or fatal to the firm’s bottom line. Just as some individuals will not purchase insurance because they don’t believe something harmful could happen to their house or car, some organizations continue to believe SPOFs will not affect them.
As you’ve probably figured, the easiest way to identify SPOFs is to wait until they manifest, but a proactive organizational approach is far better than a reactive one. Running a business impact analysis (BIA) is an effective proactive approach because it examines both the financial and the qualitative impact that an SPOF can create. Monitoring systems also help ensure that resources are managed correctly, for example by keeping an eye on the IT systems that support critical parts of your organization. Spotting trends and warning signs before they become full-blown points of failure is better than waiting for something to fail.
Everyone in your organization should be part of this monitoring process, from lower-level employees to senior staffers. Employees should ask themselves, “If I can’t perform my duties, what does that mean for the rest of the organization?” and “What is the backup plan if I’m not able to perform my duties?” One way to make sure all employees know the answers to these questions is to have them describe their job duties in five bullet points. Keeping the descriptions concise keeps the most critical capabilities of each job at the forefront of employees’ minds. The bullet points also serve as a guide for anyone who has to step in and perform those duties when the employee is unavailable.
This monitoring tactic can also be applied to personnel by enacting a regular review process, providing daily feedback on employee performance, and identifying key performance indicators. Putting these monitoring systems in place and establishing a regular rhythm of meetings and reviews can help prevent breakdowns among teams and surface problems before an unknown SPOF pops up.
Another method for addressing potential SPOFs is redundancy: the use of duplicate systems. Established, fault-tolerant redundant systems keep functioning when a component fails, so your organization can stay up and running or recover as quickly as possible. Duplicate power supplies, routers, hard drives, and copies of the data itself are necessities for any organization, so that when, not if, a failure occurs, normal business operations can continue or be brought back online quickly. The more critical the piece of technology, the more it should be duplicated. Which technology counts as critical varies from organization to organization: a copier going offline for an hour matters far less to a data center than to a printing company. Justifying duplication costs is as simple as considering the consequences of not having those systems available when a failure occurs. If a system breaks down, your organization’s bottom line is at risk, as is the trust you’ve secured with clients and other organizations.
The same principle applies to personnel: make sure key skill sets are duplicated among different team members or can be quickly augmented by outside parties. Losing unique institutional knowledge can lead to disaster. Similarly, if only one person in your organization knows how to enact the crisis plan, what happens if the plan needs to be executed on a day when that person is out of the office?
As mentioned before, it’s not a question of “if” but of “when” failures will occur. A risk analysis can show you where your most serious potential failures reside and underscores that these failures will eventually manifest. This certainty doesn’t mean organizations need to try to prevent every SPOF, but they do need to know where potential SPOFs exist and be prepared to act when they strike. Investing time and resources into preventing every possible failure isn’t a wise investment: the more you try to reduce the probability of a particular event, the more that prevention effort will end up costing. Understanding your most critical potential failures and creating crisis plans for when they occur helps insure your organization against the surprise element of SPOFs.
How do you start identifying your organization’s potential SPOFs? Begin by mapping your business process: what you need each day to serve your clients or the public. Figure out how you do business at a micro level. How essential is e-mail to your daily process? Can you access information on your phone if your office connection is lost? Are calls rerouted to another service center if your central phone lines go down? Along your business process path, which points are you fairly confident in, and which pieces are you less sure about, points that need to be “fattened up”? When you find those less certain areas, assess whether each one is vital to your business process or less essential. Prioritizing the parts of your business is critical to identifying potential SPOFs.
Next, go beyond simple awareness of your organization’s SPOFs and start building a proactive team, one strong enough that failures are far less damaging when they hit. The best way to start is to define core competencies, the qualities that make individuals successful in your organization, and then assign specific responsibilities that put those skills to use.
Each organization should have a list of core competencies against which every employee is assessed during reviews, and each list should include monitoring for SPOFs. Examples of core competencies include resourcefulness, integrity, a proactive approach to business, and leadership. Once these competencies are established, hire individuals who demonstrate them. Also make sure that each individual’s role is clearly defined and that these individuals can make their own decisions within their roles.
Beyond establishing core competencies, make sure trust is built among your organization’s members. Trust can certainly be built through “trust falls” and ropes courses, but how about building it by improving communication within the organization and reinforcing organizational goals? Helping each team member understand what must be accomplished for the organization to succeed isn’t done in one day of trust discussions; it’s a practice you need to commit to every day. Make sure everyone understands how their role helps the organization succeed by communicating often and communicating well. If everyone is focused on the end goal, suspicion about colleagues’ motivations drops sharply. Trust also grows over time when you use various frameworks to help your team understand one another’s communication styles and personality types. Building trust among the team is essential, because that trust is what will ultimately make or break the organization during temporary failures.
Finally, don’t let an SPOF destroy your organization. How your organization reacts when an SPOF strikes will shape its success for years to come. It’s time to start planning ahead for SPOFs instead of letting them appear without notice. A proactive approach is the best line of attack and will help ensure you’re ready to face the next SPOF when it makes its debut.
David Brown is president and founder of St. Louis-based Datotel, LLC, a provider of cloud-computing environments and colocation. For more information, call 314-241-9101, visit datotel.com, or follow the company on Twitter and Facebook.