By Eva Winterschön in Engineering — 18 Jun 2026

Inference Sector KPI SLO

Observability & Telemetry :: SLO Definition Additions

Why should anyone care about KPIs and SLOs?

Generally, unless one gets paid to deal with observability/telemetry (OBS/Tel) and SLA conformance... ehh, one doesn't. Yet the modern world operates within the bounds of a diverse and often not-fascinating array of performance indicators and service operating metrics.

So.. Here are two for today.

KPI == Key Performance Indicator
SLO == Service Level Objective

Minor Backstory - Today

While aggregating directory content from several workstations into a unified NFS remote mount (can't simply have one workstation, right? right), another encounter with SLO defs appeared on the terminal.

This time it's for LLM Inference customers and their typical requirements. Perhaps our standards repository would be helpful as an adjunct resource:

https://github.com/yukon-systems/YukonSYS-Standard-Definitions

Inference Service KPIs by Sector

These sectors extend a distributed cloud services architecture, with a heavy focus on baremetal and with VMs and containers throughout. There are many ways to solve the problems, scale the infra, etc.

SLO Additions to the Classifier Repo

This file is a compact companion to the observability + telemetry glossary.
Each threshold is a starting point and should be tightened to real-world workload SLOs after baselining standard operations.

| Sector | High-signal KPIs | Default alert style |
| --- | --- | --- |
| LLM Inference (GP-GPU Service Infra) | TTFT, ITL/TPOT, E2E, QTS | SLO-derived or baseline-relative; baseline-relative |
| LLM Inference (API Service Infra) | P95 LAT, ERR, 429R, UST | SLO-derived |
| LLM Inference (Network Hardware + Protocol Infra) | OWD, PDV, LOSS, ECN | baseline-relative; budget-relative |
| LLM Inference (Prompt Caching, Compute + Re-Compute) | PCHR, CTR, TTFTR, RCR | absolute for cache-eligible traffic |
| LLM Inference (Prompt Caching, Storage Infra) | CHR, CLAT, COCC, EVR | absolute; absolute for cache-eligible traffic |
| LLM Inference (Prompt Caching API + Load-Balancers) | RCHR, LQ, URT, RTR | absolute |
| LLM Training (bulk initial datasets) | ITPS, DHR, UDR, CONT | absolute; capacity-envelope |
| LLM Training (pre-training MoE) | MFU, STEP, A2AS, EIR | absolute starting point; baseline-relative |
| LLM Training (post-training MoE) | TPS, MFU, RACC, RMAR | absolute starting point; baseline-relative |
| HPC - High Frequency Trading - VM Clusters | RDY, CSTP, NUMA MISS, T2D | absolute starting point |
| HPC - High Frequency Trading - Baremetal | W2W, JITR, CPM, MPKI | baseline-relative; budget-relative |
| HPC - High Frequency Trading - CDN | CHR, OOR, TTFB, OTTFB | absolute |
| HPC - High Frequency Trading - Low-Latency Exec | OWD, JIT, FLAT, OLAT | baseline-relative; budget-relative |
| HPC - Dark Fiber Regional Network + Infra | AVAIL, OWD, Pre-FEC BER, OSNRM | SLO-derived; budget-relative |
| HPC - Quantitative Research + Machine Learning | BTT, FLAT, TSS, FFL | absolute starting point; baseline-relative |
| HPC - Big-Data & Multivariate Pattern Analysis | JDUR, SKR, SPILL, CLAG | SLA-derived; absolute |
| SLA + SLO Monitoring, Telemetry, Alerting Infra | SLI-AV, SLI-LAT, EBR, UP | SLO-derived |
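For a sense of what the first row measures, here is a minimal sketch of deriving the GPU-service latency KPIs from per-request token timestamps. It assumes the commonly used expansions (TTFT = time to first token, ITL = inter-token latency, TPOT = time per output token, E2E = end-to-end latency); the glossary in the repo remains the authority on definitions. The function name, arguments, and returned keys are hypothetical.

```python
from statistics import mean


def request_latency_kpis(request_start, token_times):
    """Derive per-request latency KPIs from timestamps given in seconds.

    request_start: time the request was received (hypothetical input).
    token_times:   ascending arrival time of each output token.
    """
    if not token_times:
        raise ValueError("need at least one output token")

    ttft = token_times[0] - request_start   # time to first token
    e2e = token_times[-1] - request_start   # end-to-end latency

    # Gaps between consecutive output tokens (inter-token latencies).
    itl = [later - earlier for earlier, later in zip(token_times, token_times[1:])]

    # TPOT taken here as the mean gap after the first token (an assumption).
    tpot = mean(itl) if itl else 0.0

    return {"ttft": ttft, "tpot": tpot, "e2e": e2e, "itl_max": max(itl, default=0.0)}
```

In practice these per-request values would be aggregated into percentiles per time window before any alerting is applied.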

Global Policy Adjustments

baseline_window:
- 7d median or p50/p95 baseline unless otherwise stated
slo_policy:
- page: error budget burn rate > 14.4 over 1h and 5m, or > 6 over 6h and 30m
- ticket: error budget burn rate > 1 over 3d and 6h
capacity_policy:
- warn: sustained > 80% of validated steady-state capacity
- critical: sustained > 90% of validated steady-state capacity or latency inflects above SLO
regression_policy:
- warn: > 1.10x to 1.25x baseline depending on sector sensitivity
- critical: > 1.20x to 1.50x baseline depending on sector sensitivity
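The slo_policy entries above can be sketched as a small evaluator. This is a hedged illustration, not the repo's implementation: it assumes burn rate means the observed error ratio divided by the allowed error ratio (1 - SLO target), and that "over 1h and 5m" means both the long and the short window must exceed the threshold before the rule fires. The names `burn_rate`, `SLO_POLICY`, and `evaluate_slo_alert` are hypothetical.

```python
def burn_rate(error_ratio, slo_target):
    """Error-budget burn rate: observed error ratio / allowed error ratio."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


# (severity, burn-rate threshold, long window, short window), per slo_policy.
SLO_POLICY = [
    ("page", 14.4, "1h", "5m"),
    ("page", 6.0, "6h", "30m"),
    ("ticket", 1.0, "3d", "6h"),
]


def evaluate_slo_alert(error_ratios, slo_target):
    """Return "page", "ticket", or None.

    error_ratios maps a window label (e.g. "1h") to the error ratio
    observed over that window. A rule fires only when BOTH of its
    windows burn faster than the threshold (assumed interpretation).
    """
    for severity, threshold, long_window, short_window in SLO_POLICY:
        long_burn = burn_rate(error_ratios[long_window], slo_target)
        short_burn = burn_rate(error_ratios[short_window], slo_target)
        if long_burn > threshold and short_burn > threshold:
            return severity
    return None
```

Requiring the short window as well keeps a page from lingering long after the error rate has recovered, since the long window alone stays elevated for a while.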

Reference Considerations

Thresholds are starting operational thresholds, not universal laws.
Where a standards or vendor reference provides an acceptable range, that range is used.
Where no universal value exists, thresholds are either SLO-derived or baseline-relative (typically against a 7d median or a validated capacity envelope).
For latency-sensitive sectors, alert on percentile regressions and budget exhaustion, not averages.
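To illustrate why percentiles are preferred over averages, here is a minimal sketch that compares a current window against a baseline using the regression_policy multipliers. The defaults of 1.10x (warn) and 1.20x (critical) are the low ends of the stated ranges, picked here as an assumption for a sensitive sector; the helper names and the nearest-rank percentile method are likewise illustrative choices.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an int in 1..100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def regression_level(current, baseline, warn=1.10, critical=1.20):
    """Classify current/baseline against the regression_policy multipliers."""
    ratio = current / baseline
    if ratio > critical:
        return "critical"
    if ratio > warn:
        return "warn"
    return "ok"
```

With a baseline of 100 samples at 100 ms and a current window where 6 of 100 requests slow to 150 ms, the mean moves only from 100 to 103 ms (1.03x, "ok"), while p95 moves from 100 to 150 ms (1.50x, "critical"). The tail regression is invisible to the average.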
