SLI, SLO, and SLA Definitions

Level: mid · Difficulty: intermediate · Topic: SRE

Question

Explain the difference between SLI, SLO, and SLA with examples.

Answer

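SLI (Service Level Indicator) is a quantitative measure of some aspect of service behavior, such as latency, error rate, or availability. SLO (Service Level Objective) is a target value for an SLI over a time window (e.g., 99.9% availability over 30 days), usually set and tracked internally. SLA (Service Level Agreement) is an external contract with customers that specifies consequences, such as service credits, for missing agreed targets. Example: SLI = request latency; SLO = 95% of requests complete in under 200ms; SLA = credits if monthly availability drops below 99.5%. SLA thresholds are normally looser than the matching internal SLOs, so the team notices and fixes problems before contractual penalties apply.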
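The example above can be expressed as a short Python sketch. It assumes hypothetical monthly aggregates (`MonthlyStats`, `latency_sli`, `availability_sli`, and `evaluate` are illustrative names, not a real monitoring API). The 200 ms, 95%, and 99.5% thresholds are taken from the example.

```python
from dataclasses import dataclass


@dataclass
class MonthlyStats:
    """Hypothetical monthly aggregates for one service (not a real library type)."""
    latencies_ms: list[float]  # per-request latency samples
    total_requests: int        # all requests received in the month
    failed_requests: int       # requests that returned an error


def latency_sli(latencies_ms: list[float], threshold_ms: float = 200.0) -> float:
    """SLI: fraction of requests that completed in under threshold_ms."""
    if not latencies_ms:
        return 1.0  # no traffic: counted as compliant (a policy choice)
    fast = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return fast / len(latencies_ms)


def availability_sli(total: int, failed: int) -> float:
    """SLI: fraction of requests that succeeded."""
    if total == 0:
        return 1.0
    return (total - failed) / total


def evaluate(stats: MonthlyStats) -> dict[str, bool]:
    """Compare the SLIs against the internal SLO and the external SLA."""
    latency = latency_sli(stats.latencies_ms)
    availability = availability_sli(stats.total_requests, stats.failed_requests)
    return {
        # SLO (internal target): at least 95% of requests under 200 ms
        "latency_slo_met": latency >= 0.95,
        # SLA (customer contract): credits owed if availability < 99.5%
        "sla_credits_owed": availability < 0.995,
    }


stats = MonthlyStats(
    latencies_ms=[120.0, 150.0, 180.0, 190.0],
    total_requests=10_000,
    failed_requests=60,
)
print(evaluate(stats))  # {'latency_slo_met': True, 'sla_credits_owed': True}
```

In this sample month the latency SLO is met, but availability is 99.4%, so SLA credits are still owed. Each term answers a different question, which is why they are tracked separately.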

Why This Matters

These concepts form the foundation of reliability engineering. SLIs measure what users actually experience. SLOs balance reliability against feature velocity by defining how much failure is acceptable (the error budget). SLAs formalize business commitments. Without clear SLOs, teams either over-engineer for reliability users do not need, or ship too fast and burn out on incidents.
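The reliability/velocity trade-off is usually expressed as an error budget: the amount of failure an SLO permits over its window. The Python sketch below assumes a time-based availability SLO over a 30-day window. This is a simplification, since request-based SLOs count failed requests instead. `error_budget_minutes` is a hypothetical helper.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of total outage an availability SLO tolerates in the window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    window_minutes = window_days * 24 * 60  # 43,200 minutes for 30 days
    return (1.0 - slo) * window_minutes


for target in (0.99, 0.995, 0.999, 0.9999, 1.0):
    print(f"{target:.2%} SLO -> {error_budget_minutes(target):.1f} min per 30 days")
```

Each extra nine shrinks the budget tenfold. A 99.9% SLO allows about 43 minutes of outage per 30 days, while 99.99% allows about 4. A 100% SLO leaves no budget at all.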

Code Examples

SLO definition example

yaml

SLI measurement in Prometheus

promql

Common Mistakes

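Setting SLOs at 100%: unachievable in practice, and it leaves no error budget, so every deploy or change risks breaching the SLO
Confusing SLOs (internal targets) with SLAs (customer contracts)
Not tracking the error budget, which makes the SLO meaningless because missing it changes nothing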
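Tracking the error budget means comparing the failures spent against what the SLO allows, then acting on the result. The Python sketch below assumes a request-based SLO. `budget_remaining`, `release_decision`, and the 25% threshold are hypothetical names and values chosen for illustration, not a standard error budget policy.

```python
def budget_remaining(slo: float, total: int, failed: int) -> float:
    """Fraction of the request-based error budget left; negative once overspent."""
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo) * total
    if allowed_failures == 0:
        # A 100% SLO allows zero failures: nothing is left even before any
        # failure, and any failure overspends the budget.
        return 0.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


def release_decision(remaining: float) -> str:
    """Hypothetical error budget policy; the thresholds are illustrative only."""
    if remaining <= 0:
        return "freeze feature releases; prioritize reliability work"
    if remaining < 0.25:
        return "slow down; require extra review for risky changes"
    return "ship normally"


# A 99.9% SLO over 1,000,000 requests allows about 1,000 failures.
for failed in (400, 900, 1_200):
    left = budget_remaining(0.999, 1_000_000, failed)
    print(f"{failed} failures -> {left:.0%} budget left: {release_decision(left)}")
```

With a 100% SLO, `budget_remaining` returns 0.0 even when nothing has failed, so this policy would freeze releases indefinitely. This is the practical reason 100% targets block all change.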

Follow-up Questions

Interviewers often ask these as follow-up questions

How do you handle situations when you're close to exhausting your error budget?
What's the difference between availability and reliability?
How do you choose appropriate SLO targets?
