Good business practices dictate that an organization should:
- Understand the impact on its operations resulting from a disruption of service;
- Understand the degree of dependence on its key service provider, including the effect of a disruption on the operations of a service provider;
- Identify recovery objectives and minimum requirements to be incorporated into its service level agreement.
This article addresses the importance of the service level agreement in the business continuity process. Examples refer to suppliers of telecommunications services, because of their ubiquity and growing relevance in an increasingly connected global economy.
Role of the Service Level Agreement
Most organizations do not operate in isolation. Decisions to outsource business processes to external vendors are based on criteria such as economies of scale, functional specificity and the benefits of modern technological tools and techniques. Where business processes are outsourced, the terms of reference, roles and responsibilities are best defined by contract, along with supporting service level agreements.
A typical service level agreement defines the service being offered, roles and responsibilities for operating and delivering the service, operational and quality goals, problem escalation procedures and service costs. Checkpoints are built into service level agreements to determine the effectiveness and efficiency of vendor performance.
Assessing the Impacts of a Service Disruption
The key to successfully marrying the business impact analysis to the service level agreement is to get the owners and developers of the two documents to recognize that one feeds the other. The business impact analysis (BIA) is essential to determine the criticality of a business operation. According to the 1995 Central Computer and Telecommunications Agency (CCTA) book, “A Guide to Business Continuity Management,” the BIA identifies “the potential damage or loss that may be caused to the organization as a result of a disruption to critical business processes.”
The business impact analysis also gives information on disaster tolerance, the maximum acceptable time for outages and the varying degrees of tolerance for outages by different business operations. It follows that management must ensure that the service level agreement reflects the maximum acceptable outage for the prescribed operation. As a prerequisite, the BIA thresholds for a business process’s recovery time objective (RTO) and recovery point objective (RPO) need to be quantified in relation to the external vendor’s role in the “supply chain” for the organization. Identification of these BIA thresholds provides a basis for vendor retention and vendor selection decisions.
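The comparison between BIA thresholds and SLA commitments can be sketched in a few lines of Python. This is an illustrative sketch only: the class names, field names and example figures are assumptions made for this example and do not come from CCTA or from any particular service level agreement.

```python
from dataclasses import dataclass


@dataclass
class BiaThreshold:
    """Hypothetical record of what the BIA says a process can tolerate."""
    process: str
    rto_hours: float  # recovery time objective: longest acceptable outage
    rpo_hours: float  # recovery point objective: longest acceptable data-loss window


@dataclass
class SlaCommitment:
    """Hypothetical record of what the vendor commits to in the SLA."""
    process: str
    restore_hours: float    # committed time to restore service
    data_loss_hours: float  # committed maximum data-loss window


def sla_gaps(bia: BiaThreshold, sla: SlaCommitment) -> list[str]:
    """List the points where the SLA is weaker than the BIA requires."""
    gaps = []
    if sla.restore_hours > bia.rto_hours:
        gaps.append(
            f"{bia.process}: SLA restore time {sla.restore_hours}h exceeds RTO {bia.rto_hours}h"
        )
    if sla.data_loss_hours > bia.rpo_hours:
        gaps.append(
            f"{bia.process}: SLA data-loss window {sla.data_loss_hours}h exceeds RPO {bia.rpo_hours}h"
        )
    return gaps


# Example figures are invented for illustration.
bia = BiaThreshold("call centre voice lines", rto_hours=4, rpo_hours=1)
sla = SlaCommitment("call centre voice lines", restore_hours=8, data_loss_hours=1)
for gap in sla_gaps(bia, sla):
    print(gap)
```

An empty result means the vendor's commitments are at least as strict as the BIA thresholds for that process. Any gap reported is a point to negotiate before the agreement is signed.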
Continuity Risks and Vendor Selection
Continuity risk considerations must be incorporated into the vendor selection decision. In selecting a vendor, organizations should conduct a business continuity risk assessment. The risk assessment procedure examines the potential for the service provider to fail to perform its contractual obligations as a result of a business interruption event.
Once the risk reduction operations have been identified, they should be incorporated into the service level agreement. According to CCTA, the following steps should be taken:
Ensure that the highest risks, or risks that are difficult to assess, are specifically addressed in the service level agreement;
Stipulate, in the evaluation criteria for service provider selection, the requirement to distribute key components of the business process or function across several sites to achieve resilience;
Split outsourced functions and processes across more than one provider;
Require service providers to develop and implement business continuity and security strategies;
Require service providers to demonstrate that the business continuity management plans are developed and are regularly and adequately tested;
Ensure suitable infrastructure is in place to facilitate all communications with the service provider;
Second customer staff to the service provider to assist in the management of business continuity;
Pre-define the level of authority to be given to the customer in the event of a disaster or other incident;
Use service providers that do not have other high-risk customers;
Outsource non-critical business functions and processes only.
Having selected the vendor, the organization should implement the following procedures:
Specify in the service level agreement the recovery objectives and minimum requirements derived from the business impact analysis.
According to CCTA, the organization should also:
Provide for in-house back-up facilities for outsourced services;
Arrange a back-up business continuity recovery contract with a second outsourcing organization, possibly a competitor from the original procurement competition;
Use the service provider to provide stand-by facilities for other business functions and processes that have not been outsourced.
Getting the telecommunications service providers to demonstrate business continuity capabilities is key to providing reliable and continuous service.
Evaluation of Business Continuity Options
In evaluating business continuity options for telecommunications services, according to CCTA guidelines, the customer must:
Assess the ability of the provider to meet recovery requirements in both the short term and the long term;
Determine whether the implementation of recovery options will constrain future business and technical strategies;
Assess the level of commitment given to business continuity management by the service provider;
Determine whether options accommodate future growth in the customer’s business and associated business continuity requirements;
Define allocation of cost between the service provider and customer.
Command, Control and Communications Structure
The roles and responsibilities of all parties concerned, such as the service providers and customers, must be planned in advance to avoid confusion later. CCTA guidelines recommend that the service level agreement:
Include representatives from the customer as part of the provider’s command and control structure and vice versa;
Establish special liaison teams and clearly define and agree on responsibilities and levels of authority.
An efficient command, control and communications structure enables a smooth, orderly and speedy recovery in the event of a disaster.
Framework/Development of Business Continuity Plans
A standard framework for the business continuity plans should be provided to the telecommunications service provider and third party vendors to ensure completeness and consistency in planning for business continuity. The CCTA recommendation is to:
Include plans relating to individual service providers in the SLA.
In the development of business continuity plans for service providers and possibly third party vendors, key items, according to CCTA guidelines, include the ability to:
Allocate tasks between the service provider’s and customer’s staff.
Establish a change control process to cater for changes at both organizations.
Determine time required for travelling between the sites.
Implement adequate audit procedures when recovery is taking place to allow additional expenditure to be audited.
Establish how salvaged material will be protected at the service provider’s premises where confidentiality is important.
Determine how crisis management and public relations will be coordinated.
Implementation of Stand-by and Business Continuity Risk Reduction Solutions
The process for implementing standby and business continuity risk reduction solutions must be documented in the service level agreement. CCTA guidelines recommend that customers:
Ensure access by representatives of both customer and service provider to the emergency control centre.
Determine who owns risk reduction and standby equipment.
Determine any changes required to existing procedures, e.g., for taking backups, protection of vital records.
Determine the need to install specialist telecommunications or other equipment.
Common Oversights in Establishing Service Level Agreements
The following issues have been identified as often being overlooked when establishing service level agreements (SLAs) with telecommunication service providers:
Uncertainty in Industry
The failure of a major service provider to support an organization’s critical business operations (due to changes in the industry) is often overlooked when establishing service level agreements. Organizations may find themselves unprepared to handle such situations as they arise and could end up having to lay out huge sums of money to get the services they require. For example, in the case of Land Rover and its supplier UPF-Thompson, the vehicle manufacturer had to acquire the bankrupt vendor to get the supplies that it needed to stay in business.
Managers must be cautious about signing incomplete contracts, since doing so can be costly in the long run. Contracts must be accompanied by service level agreements to form a complete agreement. A telecommunications service provider may promise to complete the details of a service level agreement within the first few months of the agreement, but this may never be done. Managers can then find themselves in an ambiguous position with respect to the unsigned agreement. The telecommunications service provider could argue that an additional fee can be charged for any service not already covered by the contract.
Change control for SLAs must not be overlooked either. Changes to a contract are often not communicated to other departments holding copies, which could result in the payment of unnecessary fees to the service provider or the acceptance of substandard service. Tracking SLA expiry dates is a recurring problem. An SLA may contain a term that automatically extends the contract if the customer fails to notify the service provider. The customer may then incur additional fees, since the contract may favor the vendor.
Failure to document problem identification processes, including who is responsible for giving the vendor access to the problem area, often results in confusion over roles and responsibilities at the time of an incident. Lost time in this area could mean financial loss for the organization.
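As a minimal sketch of how expiry dates and automatic-renewal terms might be tracked, the Python below computes the last day on which notice can be given and flags agreements that are approaching it. The function names, the 90-day notice period and the 30-day warning window are all assumptions chosen for illustration. Actual notice terms must be read from the agreement itself.

```python
from datetime import date, timedelta


def notice_deadline(expiry: date, notice_days: int) -> date:
    """Last day on which the customer can give notice before auto-renewal."""
    return expiry - timedelta(days=notice_days)


def needs_attention(expiry: date, notice_days: int, today: date, warn_days: int = 30) -> bool:
    """True once today falls within warn_days of the notice deadline, or past it."""
    return today >= notice_deadline(expiry, notice_days) - timedelta(days=warn_days)


# Hypothetical SLA: expires 31 Dec 2025, auto-renews unless notice is given 90 days ahead.
expiry = date(2025, 12, 31)
print(notice_deadline(expiry, 90))                      # 2025-10-02
print(needs_attention(expiry, 90, date(2025, 9, 15)))   # True
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check easy to test and to run against a whole register of agreements.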
Notification and Incident Management
Often the notification and incident management process is not outlined in detail in the service level agreement, leaving room for misinterpretation. A telecommunications service provider will typically stop the clock once it has responded to or notified the customer at the time of an incident. The time that elapses before the customer or a third party gets back to the service provider does not count as response time, so the onus is on the customer to get back to the telecommunications company as soon as possible. In most cases the service level agreement does not specify how response time for notification and incident management is to be interpreted. How the telecommunications company interacts with a third-party service provider is also usually not defined in the service level agreement.
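The effect of "stopping the clock" can be made concrete with a small Python sketch that splits an incident's duration according to who held the next action. The event format, the owner labels and the timeline are invented for illustration. A real SLA would need to define these rules explicitly.

```python
from datetime import datetime, timedelta


def split_elapsed(events, closed_at):
    """Split an incident's duration by who held the next action.

    events: chronological list of (timestamp, owner) pairs, where owner is
    "provider" or "customer" and holds the action until the next event.
    """
    totals = {"provider": timedelta(0), "customer": timedelta(0)}
    boundaries = events[1:] + [(closed_at, None)]
    for (start, owner), (end, _) in zip(events, boundaries):
        totals[owner] += end - start
    return totals


# Invented timeline for illustration.
day = datetime(2024, 3, 1)
events = [
    (day.replace(hour=9, minute=0), "provider"),   # fault reported
    (day.replace(hour=9, minute=30), "customer"),  # provider responds, asks for site access
    (day.replace(hour=11, minute=0), "provider"),  # customer replies
]
totals = split_elapsed(events, closed_at=day.replace(hour=12, minute=0))
print("provider clock:", totals["provider"])                       # 1:30:00
print("customer clock:", totals["customer"])                       # 1:30:00
print("total outage:  ", totals["provider"] + totals["customer"])  # 3:00:00
```

In this example the business was without service for three hours, yet only half of that time would count against the provider under a stop-the-clock interpretation. This is why the agreement should state which clock is being measured.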
Help Desk Support
There is often a problem with the levels of help desk support from the vendor, and the ability to handle a variety of problems. There is also often no formal change management system in place to handle changes identified in the service level agreement. A lack of these capabilities could lead to longer recovery times and financial loss.
Unpredictable surges in usage, combined with poor monitoring, can slow response times and may have a significant impact on quality of service. This could mean a loss of customers for an organization and damage to its reputation.
Hardware and software configuration management is commonly not documented as a requirement in the SLA. An SLA is doomed to fail if the vendor fails to maintain an up-to-date configuration of its hardware and software.
There are cases where the service provider may have given the customer a false sense of security about network redundancy. A customer may be under the impression that redundancy is provided across two different networks, when what is really provided is two different circuits that follow different paths on the same network and use exactly the same physical equipment. If the main network goes down, the alternate path will go down with it. The service provider should be asked to demonstrate the independence of the two networks as part of the service level requirement.
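One simple way to test a claim of independence, assuming the provider will disclose the physical components each circuit traverses, is to look for components common to both paths. The component names below are invented, and the check is only as good as the inventories supplied.

```python
def shared_components(primary, alternate):
    """Return the physical components that appear on both paths."""
    return sorted(set(primary) & set(alternate))


# Invented component inventories for two supposedly redundant circuits.
primary = ["building-entry-A", "central-office-1", "fibre-ring-7", "router-X"]
alternate = ["building-entry-B", "central-office-1", "fibre-ring-7", "router-Y"]

common = shared_components(primary, alternate)
if common:
    print("Not independent; shared single points of failure:", common)
else:
    print("No shared components found in the inventories supplied.")
```

Any component that appears in both lists is a single point of failure for the pair of circuits, however different the two paths look on a logical diagram.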
Definition of Metrics to Measure Performance
Telecommunications companies often do not define the processes and metrics that they are going to follow. SLAs may fail to include benefits for exceeding service level thresholds and may lack penalty clauses for substandard performance.
SLA reporting features are often not clearly defined in the service level agreement. In addition, the service provider may not report downtime, or the penalties or free service available in lieu of it.
It is important that an organization focus on availability measurements in business continuity plans. An end-to-end overview of how the service level agreement would support the business continuity process must always be addressed and documented in the actual service level agreement.
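As an illustration of an availability measurement, the sketch below converts an availability target into a downtime allowance and computes achieved availability from recorded downtime. The 99.9 percent target, the 30-day measurement period and the two hours of downtime are assumed figures, not values taken from any particular agreement.

```python
def availability_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Achieved availability over the period, as a percentage."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def allowed_downtime_minutes(period_minutes: float, target_percent: float) -> float:
    """Downtime permitted in the period by an availability target."""
    return period_minutes * (100.0 - target_percent) / 100.0


month = 30 * 24 * 60  # a 30-day measurement period, in minutes
print(f"{allowed_downtime_minutes(month, 99.9):.1f} minutes allowed")  # 43.2 minutes allowed
print(f"{availability_percent(month, 120):.3f}% achieved")             # 99.722% achieved
```

Stating the target as minutes of permitted downtime per period makes it directly comparable with the maximum acceptable outage identified in the business impact analysis.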
Maureen Dyer, CBCP, is the manager of business issue and continuity/disaster recovery for CIBC Mortgages in Toronto. Dyer holds a master’s degree in telecommunications and network management from Syracuse University and is a Certified Business Continuity Professional.