2026-06-01 · 7 min

MTTR is not a wall number

Mean time to restore is easy to display and hard to trust. Most organizations treat it as a dashboard ornament — a green number that satisfies a quarterly review while incidents repeat with familiar root causes.

Why MTTR fails on the wall

MTTR collapses when three conditions hold:

  • Closure without recovery. Tickets marked resolved when a workaround ships, not when the system is stable.
  • No shared definition. Platform counts time-to-mitigate; product counts time-to-fix; leadership counts time-to-explain.
  • Root cause orphaned. Postmortems filed, action items opened, and nothing reaches the roadmap because nobody owns the constraint.

The number moves. Production does not.
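
To see how far the definitions drift, take one incident and measure it three ways. The sketch below is illustrative only: the timeline, field names, and timestamps are hypothetical, and it assumes every team starts its clock at detection.

```python
from datetime import datetime

# Hypothetical incident timeline (all names and times are made up).
incident = {
    "detected":  datetime(2026, 5, 4, 9, 0),
    "mitigated": datetime(2026, 5, 4, 9, 40),  # workaround shipped
    "fixed":     datetime(2026, 5, 4, 15, 0),  # permanent fix deployed
    "explained": datetime(2026, 5, 6, 9, 0),   # postmortem delivered
}


def minutes_between(start, end):
    """Elapsed minutes between two timestamps."""
    return (end - start).total_seconds() / 60


# The same incident, measured under three different "restore" definitions.
durations = {
    "time_to_mitigate": minutes_between(incident["detected"], incident["mitigated"]),
    "time_to_fix":      minutes_between(incident["detected"], incident["fixed"]),
    "time_to_explain":  minutes_between(incident["detected"], incident["explained"]),
}

for name, minutes in durations.items():
    print(f"{name}: {minutes:.0f} min")
```

One incident, three defensible numbers: 40 minutes, 6 hours, or 2 days. Any of them can go on the wall, which is why the definition has to be agreed before the chart is.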

Making MTTR a process

MTTR becomes useful when it is tied to a runbook, not a chart:

  1. Define restore — what "back to normal" means for each service tier
  2. Instrument handoffs — detection, triage, mitigation, fix, verification — with timestamps at each
  3. Close the loop — every incident above severity X produces an ADR or runbook update within two weeks
  4. Report the trend in board-ready form: median MTTR by tier, quarter over quarter, with one sentence of context for each shift in the number
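
The steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the record layout, the tier labels, the sample incidents, and the choice to treat "restored" as the verification timestamp are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median

# Handoffs named in step 2, in the order they occur.
PHASES = ["detected", "triaged", "mitigated", "fixed", "verified"]


def phase_durations(timestamps):
    """Minutes spent between each consecutive pair of handoffs."""
    return {
        f"{a}->{b}": (timestamps[b] - timestamps[a]).total_seconds() / 60
        for a, b in zip(PHASES, PHASES[1:])
    }


def restore_minutes(timestamps):
    # Assumption: "restored" means verified stable, not merely mitigated.
    return (timestamps["verified"] - timestamps["detected"]).total_seconds() / 60


def quarter(ts):
    """Label a timestamp with its calendar quarter, e.g. '2026-Q2'."""
    return f"{ts.year}-Q{(ts.month - 1) // 3 + 1}"


def median_mttr_by_tier(incidents):
    """Median restore time in minutes, keyed by (tier, quarter)."""
    buckets = defaultdict(list)
    for inc in incidents:
        key = (inc["tier"], quarter(inc["timestamps"]["detected"]))
        buckets[key].append(restore_minutes(inc["timestamps"]))
    return {key: median(values) for key, values in sorted(buckets.items())}


def sample(tier, detected, offsets):
    """Build a hypothetical incident; offsets are minutes after detection."""
    times = [detected] + [detected + timedelta(minutes=m) for m in offsets]
    return {"tier": tier, "timestamps": dict(zip(PHASES, times))}


# Made-up incidents for illustration only.
incidents = [
    sample("tier-1", datetime(2026, 2, 3, 10, 0), [5, 30, 120, 150]),
    sample("tier-1", datetime(2026, 3, 9, 22, 0), [10, 45, 60, 90]),
    sample("tier-1", datetime(2026, 4, 20, 8, 0), [5, 20, 40, 60]),
    sample("tier-2", datetime(2026, 5, 1, 14, 0), [15, 60, 240, 300]),
]

report = median_mttr_by_tier(incidents)
for (tier, qtr), minutes in report.items():
    print(f"{tier} {qtr}: median restore {minutes:.0f} min")

print(phase_durations(incidents[0]["timestamps"]))
```

The per-phase breakdown is what makes the median actionable: in the first sample incident, 90 of the 150 minutes sit between mitigation and fix, which points at a different bottleneck than slow detection would.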

The DORA connection

MTTR is one of the four DORA metrics. Improving it in isolation, without also watching deployment frequency, lead time, and change failure rate, usually means you are optimizing the wrong constraint.
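
A rough way to keep the four measures side by side is to compute them from the same deployment log. The sketch below assumes a hypothetical record format (`committed`, `deployed`, `failed`, `restored`) and made-up data; real pipelines will define each field differently.

```python
from datetime import datetime, timedelta
from statistics import median

start = datetime(2026, 5, 1)

# Hypothetical deployment log covering a 14-day window.
deployments = [
    {"committed": start, "deployed": start + timedelta(hours=20),
     "failed": False, "restored": None},
    {"committed": start + timedelta(days=3), "deployed": start + timedelta(days=4),
     "failed": True, "restored": start + timedelta(days=4, hours=2)},
    {"committed": start + timedelta(days=8), "deployed": start + timedelta(days=10),
     "failed": False, "restored": None},
    {"committed": start + timedelta(days=12), "deployed": start + timedelta(days=13),
     "failed": True, "restored": start + timedelta(days=13, hours=4)},
]


def hours(delta):
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


def dora_snapshot(deploys, window_days):
    """Compute the four measures from one log (requires at least one deploy)."""
    failures = [d for d in deploys if d["failed"]]
    return {
        "deploys_per_week": len(deploys) / (window_days / 7),
        "median_lead_time_h": median(
            hours(d["deployed"] - d["committed"]) for d in deploys
        ),
        "change_failure_rate": len(failures) / len(deploys),
        # Restore time is only defined for deployments that failed.
        "median_restore_h": (
            median(hours(d["restored"] - d["deployed"]) for d in failures)
            if failures else None
        ),
    }


snapshot = dora_snapshot(deployments, window_days=14)
for name, value in snapshot.items():
    print(f"{name}: {value}")
```

With this sample data, a 3-hour median restore time looks healthy until it sits next to a 50% change failure rate: the team restores quickly because it has to restore often.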

Start with baseline. Name the bottleneck. Fix with evidence. Verify the numbers moved.

That is engineering delivery work — not dashboard decoration.

If MTTR is a wall number at your organization, Engineering Delivery & Reliability is where we start.

Bring your last three incident reports

Bring your last three incident reports and current MTTR definitions.
