What is an SLA (service level agreement)?

Published: 30 May 2024
Contributor: Michael Goodwin

What is an SLA?

A service level agreement (SLA) is a contract between a service provider and a customer that defines the service to be provided and the level of performance to be expected. An SLA also describes how performance will be measured and approved, and what happens if performance levels are not met.

SLAs are generally formed between a vendor and an external customer, but companies also use SLAs internally to formalize agreements between departments or teams.

SLAs are an important part of outsourcing and information technology (IT) vendor contracts, providing an end-to-end view of the working relationship. They help ensure that all stakeholders have an accurate understanding of the service agreement.

SLAs set customer expectations, hold providers accountable and ultimately help optimize the end-user experience. By resolving uncertainty and points of contention from the start, SLAs pave the way for a smoother working relationship and help protect the interests of all parties involved.

Types of SLAs

There are 3 primary types of service level agreements: customer-level (sometimes called customer-based SLAs), service-level and multilevel SLAs.

Customer-level SLA

A customer-based SLA is an agreement between a service provider and a customer, whether the customer is external or internal. This agreement describes the service or different services that will be provided to the customer. For example, this agreement might be between a third-party cloud services provider and a tech company outlining the performance expectations of applications hosted in the cloud.

An internal SLA is an agreement between two different departments, teams or sites within the same organization. This agreement might be between development and business teams outlining the deployment cadence and overall expectations for a certain application or product.

Service-level SLA

A service-level SLA is a contract that details a defined service that is provided to multiple customers. If a provider offers a product with the same level of service and support regardless of the customer, they might use a service-level SLA.

For example, IT service management (ITSM) teams might use a common SLA for all customers that outlines the level of service customers can expect from their service desk when they contact the company for service support or to report an incident.

Multilevel SLA

A multilevel SLA is an agreement split into different levels to incorporate more than two parties, or different levels of service, into the same agreement. A multilevel SLA might be used between an organization and multiple external providers, such as in a multicloud model with numerous public cloud providers. The agreement can also be set up between more than two internal teams or departments.

An organization that offers a product at different pricing plans or service levels, such as a software as a service (SaaS) product, might also use a multilevel SLA that describes the service level and expectations for each product tier.

Components of SLAs

SLAs vary by company, product and the specific business needs of each organization, but most SLAs contain similar features. Key components include:

Overview

An overview section introduces the agreement and its most basic features, such as the parties involved, a broad outline of the services to be provided, and the start date and duration of the agreement.

Description of services

This section delineates the specific services provided and all related details. It includes information on service delivery, turnaround times for deliverables, maintenance schedules, dependencies and any other pertinent details. This section should provide a thorough accounting of all the factors and circumstances that affect how the services are delivered.

Stakeholder breakdown

A stakeholder section lists all parties involved in the agreement, what their roles and responsibilities are and how to contact them. A primary contact is often designated as the go-to contact for reporting end-user issues.

Performance tracking and reporting

A performance section details the agreed-upon service availability and performance standards, and the metrics that will be used to measure performance. These targets are usually defined in a service level objective (SLO): an agreement within an SLA that establishes an agreed-upon performance target for a particular service over a period of time.

It often includes a workflow outlining how information will be collected and shared with stakeholders. All parties should carefully consider both the performance levels and the metrics used to gauge performance, because they are central to the entire agreement.
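
A minimal Python sketch can make an SLO concrete. It assumes a hypothetical latency SLO (95% of requests completing within 300 ms) and checks a window of measured request latencies against it. The class name `LatencySLO`, the threshold, the target and the sample latencies are illustrative assumptions, not terms from any real agreement.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySLO:
    """Hypothetical SLO: a minimum share of requests must finish within a latency threshold."""

    threshold_ms: float   # illustrative latency threshold, e.g. 300 ms
    target_ratio: float   # illustrative target, e.g. 0.95 = 95% of requests

    def attainment(self, latencies_ms: list[float]) -> float:
        """Return the fraction of requests in the window that met the threshold."""
        if not latencies_ms:
            return 1.0  # assumption: an empty window is treated as compliant
        good = sum(1 for ms in latencies_ms if ms <= self.threshold_ms)
        return good / len(latencies_ms)

    def is_met(self, latencies_ms: list[float]) -> bool:
        """True if the measured attainment reaches the agreed target."""
        return self.attainment(latencies_ms) >= self.target_ratio


# Example: 10 sampled request latencies (made-up numbers) checked against the SLO.
slo = LatencySLO(threshold_ms=300, target_ratio=0.95)
window = [120, 250, 310, 180, 90, 275, 400, 200, 150, 220]
print(f"attainment: {slo.attainment(window):.0%}, SLO met: {slo.is_met(window)}")
```

In practice, the measurement window (for example, a rolling 30 days) and the exact metric definition would be spelled out in the SLA itself.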

Exclusions

This section lists services or aspects of service delivery that are exempted from the agreement. It commonly excludes downtime caused by problems with the customer’s equipment or by factors outside the provider’s reasonable control (force majeure). It might also include exceptions for scheduled maintenance, stipulating that such windows do not count against guaranteed uptime.
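
The effect of an exclusion on measured uptime can be sketched in a few lines of Python. This assumes one common convention, in which excluded downtime simply does not count against the provider while the measurement period stays the same. The function name `counted_uptime_pct` and the figures are hypothetical; an actual SLA spells out its own calculation.

```python
def counted_uptime_pct(period_minutes: float,
                       total_downtime_minutes: float,
                       excluded_downtime_minutes: float = 0.0) -> float:
    """Uptime percentage after removing downtime that the SLA excludes.

    Assumption: excluded downtime (for example, during scheduled maintenance
    windows) does not count against the provider; the measurement period
    itself is left unchanged.
    """
    if period_minutes <= 0 or not (
        0 <= excluded_downtime_minutes <= total_downtime_minutes <= period_minutes
    ):
        raise ValueError("expected 0 <= excluded <= total downtime <= period, period > 0")
    counted_downtime = total_downtime_minutes - excluded_downtime_minutes
    return 100.0 * (period_minutes - counted_downtime) / period_minutes


# Example: a 30-day month (43,200 minutes) with 300 minutes of downtime,
# 120 of which fell inside a scheduled maintenance window.
MONTH_MINUTES = 30 * 24 * 60
with_exclusion = counted_uptime_pct(MONTH_MINUTES, 300, 120)
without_exclusion = counted_uptime_pct(MONTH_MINUTES, 300)
print(f"with exclusion: {with_exclusion:.2f}%, without: {without_exclusion:.2f}%")
```

Against a 99.5% uptime target, this example month passes with the maintenance exclusion (about 99.58%) and falls short without it (about 99.31%), which is why the exclusions section deserves close reading.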

Security protocols

A security section describes the security protocols and standards that the provider maintains and provides information on how the provider protects customer data. It also lists nondisclosure agreements (NDAs) and any measures involved with protecting sensitive information or intellectual property.

Redressing

This section defines the penalties that either side will incur should they not fulfill the terms of the agreement. It details escalation procedures, time frames for resolutions and the compensation to be provided should the service provider not fulfill the terms of the SLA. The compensation might be financial, service credits or something else.

This section also lists redemption terms such as earn-backs: provisions that enable providers to regain service credits by meeting or exceeding standard service levels for a defined period.
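
The following Python sketch shows how service credits and an earn-back might be calculated. The tiered credit schedule, the three-month earn-back window and the function names are hypothetical assumptions for illustration; actual percentages and rules come from the specific SLA.

```python
from __future__ import annotations

# Hypothetical credit schedule: (minimum monthly uptime %, credit as % of the monthly fee).
# Tiers are checked from the highest floor down; real schedules vary by contract.
CREDIT_TIERS: list[tuple[float, float]] = [
    (99.5, 0.0),
    (99.0, 10.0),
    (95.0, 25.0),
    (0.0, 50.0),
]


def service_credit(monthly_fee: float, uptime_pct: float) -> float:
    """Return the credit owed to the customer for one month's measured uptime."""
    if not 0.0 <= uptime_pct <= 100.0:
        raise ValueError("uptime_pct must be between 0 and 100")
    for floor, credit_pct in CREDIT_TIERS:
        if uptime_pct >= floor:
            return monthly_fee * credit_pct / 100
    raise AssertionError("unreachable: the last tier has a floor of 0.0")


def credit_earned_back(later_uptimes: list[float],
                       target_pct: float = 99.5,
                       months_required: int = 3) -> bool:
    """Hypothetical earn-back rule: the provider regains a credit if it meets or
    exceeds the target in each of the next `months_required` months."""
    window = later_uptimes[:months_required]
    return len(window) == months_required and all(u >= target_pct for u in window)


credit = service_credit(monthly_fee=10_000, uptime_pct=98.7)
print(f"credit owed: {credit:.2f}")
print("earned back:", credit_earned_back([99.6, 99.8, 99.5]))
```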

Indemnification clause

An indemnification clause protects the customer by shifting risk from the customer to the service provider. Under such a provision, the service provider agrees to indemnify (compensate for harm) the customer for any third-party litigation costs, losses or damages that result from a breach of service warranties.

Such provisions are not always present in agreements, particularly standardized SLA templates, but customers can seek to add them with the help of legal counsel.

Review and adjustment process

Vendor capabilities, workloads and customer requirements evolve over time. Accordingly, there should be an established process and timetable for reviewing and revising the agreed-upon terms and the key performance indicators (KPIs) used to measure performance. This review allows the SLA to incorporate the most recent features of the provider’s product or service and address current customer needs.

Termination process and terms

The agreement should include a section that outlines the circumstances that allow for the cancellation of the service agreement before its expiration date, and the notice period required by each party if such action is pursued.

Signatures

The agreement is signed by authorized stakeholders on each side, binding all parties involved to the terms of the agreement while it is in effect.

KPIs and SLAs

SLAs are the agreements made between provider and customer that specify agreed-upon service standards. KPIs are the measures used by the provider to gauge performance against these targets and enable teams to make continuous improvements. KPIs are designed to simplify the evaluation process and give teams an accurate idea of how they are performing toward any stated objective.

For example, if an organization has made certain guarantees around the cybersecurity of its offering, it might track KPIs like the number of security incidents over a given period of time, intrusion attempts and the success rates of intrusion detection or prevention systems, cost per incident or vendor security rating.

What SLA metrics should businesses consider?

Service level objectives (SLOs) are a part of SLAs that set performance baselines for a specific aspect of service, such as error rates, request latency or uptime. Performance metrics and KPIs are used to evaluate the quality of service provided and determine if the service provider is meeting the terms of the SLA. 

Monitoring the appropriate metrics is an important part of an SLA’s success. Without the right data, it is difficult to know how well the arrangement is serving either party; conversely, tracking too many metrics can create an indecipherable mess. Different services require different metrics, but common SLA metrics include:

Availability and uptime

Uptime is the amount of time that services are working properly and available for use. This metric is usually expressed as a percentage over a period of time: for example, 99.5% over 30 days, which allows 3.6 hours of downtime. Uptime requirements vary by business type, and the SLA should reflect that.

For instance, 3.6 hours of downtime per month might be far too much for an e-commerce platform doing business globally. Such a company would likely need a guarantee of higher availability and would seek an SLA that reflects it.
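
The arithmetic behind an uptime percentage is straightforward: allowed downtime equals the length of the period multiplied by (100% minus the uptime target). This Python sketch, which uses a hypothetical helper `allowed_downtime_minutes` and assumes a 30-day period, reproduces the 3.6-hour figure and compares stricter targets.

```python
def allowed_downtime_minutes(uptime_pct: float, period_days: float = 30) -> float:
    """Maximum downtime, in minutes, that an uptime percentage permits over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - uptime_pct) / 100.0


for target in (99.5, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% over 30 days -> {minutes:.1f} min ({minutes / 60:.2f} h)")
```

Raising the target from 99.5% to 99.9% shrinks the monthly downtime budget from 216 minutes to about 43 minutes, and 99.99% leaves only about 4.3 minutes.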

Error rates

Error rate is a metric that tracks production or service failures and the percentage of time that an IT service provider's service level falls below expected performance targets. The agreement might include SLOs for missed deadlines, delays in feature or update releases, negative help desk interactions, coding error rates, defect rates and other measures of technical quality.
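
One common way to express an error rate is failed requests as a percentage of all requests in a period. The Python sketch below assumes that definition and a hypothetical 0.1% ceiling; SLAs that measure missed deadlines or defect rates would count different events but could follow the same pattern. The function names and sample counts are illustrative.

```python
def error_rate_pct(failed: int, total: int) -> float:
    """Failed operations as a percentage of all operations in the period."""
    if not 0 <= failed <= total:
        raise ValueError("expected 0 <= failed <= total")
    return 0.0 if total == 0 else 100.0 * failed / total


def meets_error_slo(failed: int, total: int, max_error_pct: float) -> bool:
    """True if the observed error rate stays at or below the agreed ceiling."""
    return error_rate_pct(failed, total) <= max_error_pct


# Example: 42 failed requests out of 120,000 against a hypothetical 0.1% ceiling.
rate = error_rate_pct(42, 120_000)
print(f"error rate: {rate:.3f}%, within SLO: {meets_error_slo(42, 120_000, 0.1)}")
```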

Response time

Response time establishes the acceptable amount of time for a provider to log and respond to a client issue or request.

Resolution time

The resolution time establishes the acceptable amount of time for an issue to be resolved once it has been logged by the provider.
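
Response and resolution targets are typically checked ticket by ticket. This Python sketch assumes hypothetical targets of one hour to respond and eight hours to resolve, measures both from the moment an issue is logged, and reports the share of tickets that met each target. The `Ticket` fields, the targets and the sample data are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical targets; actual values are set in each SLA, often per priority level.
RESPONSE_TARGET = timedelta(hours=1)
RESOLUTION_TARGET = timedelta(hours=8)


@dataclass
class Ticket:
    logged: datetime          # when the issue was reported and logged (assumed to be the same moment)
    first_response: datetime  # when the provider first responded
    resolved: datetime        # when the issue was resolved


def compliance(tickets: list[Ticket]) -> tuple[float, float]:
    """Return (share of tickets meeting the response target,
    share of tickets meeting the resolution target)."""
    if not tickets:
        return 1.0, 1.0  # assumption: no tickets means no target was breached
    responded = sum(1 for t in tickets if t.first_response - t.logged <= RESPONSE_TARGET)
    resolved = sum(1 for t in tickets if t.resolved - t.logged <= RESOLUTION_TARGET)
    return responded / len(tickets), resolved / len(tickets)


start = datetime(2024, 6, 3, 9, 0)
tickets = [
    Ticket(start, start + timedelta(minutes=30), start + timedelta(hours=5)),   # both met
    Ticket(start, start + timedelta(minutes=90), start + timedelta(hours=6)),   # late response
    Ticket(start, start + timedelta(minutes=20), start + timedelta(hours=10)),  # late resolution
]
response_ok, resolution_ok = compliance(tickets)
print(f"response target met: {response_ok:.0%}, resolution target met: {resolution_ok:.0%}")
```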

Mean time to recovery

This metric is the average time it takes to recover a product, service or system after a failure or outage.
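
Mean time to recovery is the total recovery time across incidents divided by the number of incidents. A minimal Python sketch, assuming each outage is recorded as a (failure started, service restored) pair of timestamps; the function name and the sample outages are hypothetical:

```python
from __future__ import annotations

from datetime import datetime, timedelta


def mean_time_to_recovery(outages: list[tuple[datetime, datetime]]) -> timedelta:
    """Average duration from failure start to service restoration.

    Each outage is a (failure_started, service_restored) pair.
    """
    if not outages:
        return timedelta(0)
    total = sum((restored - started for started, restored in outages), timedelta(0))
    return total / len(outages)


# Example: three outages in a month lasting 30, 90 and 45 minutes.
day = datetime(2024, 6, 1)
outages = [
    (day + timedelta(hours=2), day + timedelta(hours=2, minutes=30)),
    (day + timedelta(days=9), day + timedelta(days=9, minutes=90)),
    (day + timedelta(days=20), day + timedelta(days=20, minutes=45)),
]
print("MTTR:", mean_time_to_recovery(outages))
```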

First call resolution rate

This metric measures the percentage of customers who have their issue resolved by the provider during their first interaction with the service desk or chatbot.
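
First call resolution is a simple ratio, but the parties have to agree on the denominator. This Python sketch assumes the denominator is all resolved issues in the period, with each issue represented by the number of interactions it took to resolve; both that representation and the function name are illustrative.

```python
from __future__ import annotations


def first_contact_resolution_pct(contacts_per_issue: list[int]) -> float:
    """Percentage of resolved issues that needed only one interaction.

    contacts_per_issue[i] is the number of interactions issue i required before
    it was resolved. Assumption: only resolved issues are included, which is one
    of several denominators organizations use.
    """
    if not contacts_per_issue:
        return 0.0
    if any(c < 1 for c in contacts_per_issue):
        raise ValueError("every resolved issue needs at least one contact")
    resolved_first_time = sum(1 for c in contacts_per_issue if c == 1)
    return 100.0 * resolved_first_time / len(contacts_per_issue)


print(f"FCR: {first_contact_resolution_pct([1, 1, 2, 1, 3, 1, 1, 2]):.1f}%")
```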

Abandonment rate

This is a key metric for customer service providers or organizations that have a customer service component. It measures the rate at which customers abandon their support inquiry before they receive an answer from the help desk.
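
Abandonment rate is abandoned contacts divided by all contacts received. Some support teams exclude very short abandons (customers who hang up or close the chat within a few seconds) from the calculation. The Python sketch below treats that exclusion as an optional, hypothetical parameter that is off by default; the function name and sample queue are illustrative.

```python
from __future__ import annotations


def abandonment_rate_pct(contacts: list[tuple[bool, float]],
                         short_abandon_s: float = 0.0) -> float:
    """Percentage of support contacts abandoned before an answer.

    Each contact is (abandoned, seconds_waited). Contacts abandoned in under
    `short_abandon_s` seconds are dropped from both numerator and denominator;
    this "short abandon" exclusion is an assumption some teams make, and the
    default of 0.0 disables it.
    """
    counted = [(abandoned, wait) for abandoned, wait in contacts
               if not (abandoned and wait < short_abandon_s)]
    if not counted:
        return 0.0
    abandoned_count = sum(1 for abandoned, _ in counted if abandoned)
    return 100.0 * abandoned_count / len(counted)


queue = [(False, 40), (True, 3), (True, 95), (False, 12), (False, 60), (True, 120), (False, 30)]
print(f"all abandons: {abandonment_rate_pct(queue):.1f}%")
print(f"excluding abandons under 5 s: {abandonment_rate_pct(queue, short_abandon_s=5):.1f}%")
```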

Security

A variety of security metrics might be tracked, such as the number of undisclosed vulnerabilities or the timeliness of antivirus updates and software patches, to evaluate a provider’s commitment to IT security.

Business results

By using the appropriate metrics and KPIs, organizations can determine how a provider’s services or products are contributing to broader business goals. For example, a company undergoing a digital transformation might ask: are the provider’s cloud resourcing tools helping us bring our cloud computing spend back under control? Tracking the right data will help answer that question.

Benefits of an SLA

SLAs yield benefits for both the service provider and the customer. SLAs help to:

Improve quality of service and customer experience

In creating SLAs, organizations have an opportunity to closely examine their products, services and processes—and associated customer experiences—to determine what’s working well and what can be improved upon. An SLA establishes clear performance goals that provide benchmarks for measuring performance and customer experience success.

Facilitate communication

SLAs clarify the roles and responsibilities of all stakeholders, as well as processes and channels for troubleshooting issues and handling disputes. This clarity helps eliminate confusion and promote clear communication both internally and with external clients.

Increase service continuity

SLAs define expectations around service availability, set policies for downtime and lay out procedures for failure and disaster recovery. These measures help to minimize disruptions and unexpected downtime, and quickly resolve technical issues and service outages. Once satisfactory processes are in place, organizations can leverage automation to enhance service consistency.  

Minimize risk

The SLA process offers an opportunity to be proactive with risk management. The process identifies potential risks and threats ahead of time, and it helps business stakeholders develop plans to avoid or mitigate such issues. Organizations can improve service delivery and response times, create stronger contingency plans and bolster their overall risk management strategy.
