MTTR is not a wall number
Mean time to restore is easy to display and hard to trust. Most organizations treat it as a dashboard ornament — a green number that satisfies a quarterly review while incidents repeat with familiar root causes.
Why MTTR fails on the wall
MTTR collapses when three conditions hold:
- Closure without recovery. Tickets marked resolved when a workaround ships, not when the system is stable.
- No shared definition. Platform counts time-to-mitigate; product counts time-to-fix; leadership counts time-to-explain.
- Root cause orphaned. Postmortems are filed and action items opened, but nothing reaches the roadmap because nobody owns the constraint.
The number moves. Production does not.
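To make the definition gap concrete, here is a minimal Python sketch of one incident measured three ways. The timeline, the field names, and the `minutes_until` helper are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical incident timeline; every timestamp here is made up.
incident = {
    "detected": datetime(2024, 3, 4, 9, 0),
    "mitigated": datetime(2024, 3, 4, 9, 45),  # workaround shipped
    "fixed": datetime(2024, 3, 4, 15, 0),      # permanent fix deployed
    "explained": datetime(2024, 3, 6, 9, 0),   # postmortem published
}


def minutes_until(event: str) -> float:
    """Minutes elapsed from detection to the named event."""
    return (incident[event] - incident["detected"]).total_seconds() / 60


time_to_mitigate = minutes_until("mitigated")  # what platform might report
time_to_fix = minutes_until("fixed")           # what product might report
time_to_explain = minutes_until("explained")   # what leadership might report

for label, value in [
    ("time-to-mitigate", time_to_mitigate),
    ("time-to-fix", time_to_fix),
    ("time-to-explain", time_to_explain),
]:
    print(f"{label}: {value:.0f} min")
```

Under these assumed timestamps, the same incident yields 45 minutes, 360 minutes, or 2880 minutes, depending on who is counting. Averaging numbers drawn from different definitions produces a figure nobody can act on.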
Making MTTR a process
MTTR becomes useful when it is tied to a runbook, not a chart:
- Define restore. State what "back to normal" means for each service tier.
- Instrument handoffs. Record a timestamp at each stage: detection, triage, mitigation, fix, verification.
- Close the loop. Every incident above severity X produces an architecture decision record (ADR) or a runbook update within two weeks.
- Report trend. Keep it board-ready: median time to restore by tier, quarter over quarter, with one sentence of context per shift.
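The handoff timestamps and the trend report can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the `Incident` model, the tier label, the sample data, and the choice to treat "restored" as the verification timestamp are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median

# Handoff names follow the list above; the data model itself is hypothetical.
PHASES = ("detected", "triaged", "mitigated", "fixed", "verified")


@dataclass
class Incident:
    tier: str
    stamps: dict  # phase name -> datetime of that handoff


def make_incident(tier, detected, offsets_min):
    """Build an Incident from minutes-after-detection offsets, one per phase."""
    stamps = {p: detected + timedelta(minutes=m) for p, m in zip(PHASES, offsets_min)}
    return Incident(tier, stamps)


def phase_durations(inc):
    """Minutes spent between each pair of consecutive handoffs."""
    return {
        f"{a}->{b}": (inc.stamps[b] - inc.stamps[a]).total_seconds() / 60
        for a, b in zip(PHASES, PHASES[1:])
    }


def restore_minutes(inc):
    # Assumption: "restored" means verified, not merely mitigated.
    return (inc.stamps["verified"] - inc.stamps["detected"]).total_seconds() / 60


def quarter(dt):
    """Label a datetime with its calendar quarter, e.g. '2024-Q1'."""
    return f"{dt.year}-Q{(dt.month - 1) // 3 + 1}"


def median_restore_by_tier(incidents):
    """Median minutes to restore, keyed by (tier, quarter of detection)."""
    buckets = defaultdict(list)
    for inc in incidents:
        key = (inc.tier, quarter(inc.stamps["detected"]))
        buckets[key].append(restore_minutes(inc))
    return {key: median(values) for key, values in buckets.items()}


# Made-up sample data: three tier-1 incidents across two quarters.
incidents = [
    make_incident("tier-1", datetime(2024, 1, 10, 8, 0), (0, 5, 30, 120, 150)),
    make_incident("tier-1", datetime(2024, 2, 2, 14, 0), (0, 10, 20, 60, 90)),
    make_incident("tier-1", datetime(2024, 4, 15, 9, 0), (0, 5, 15, 40, 60)),
]

report = median_restore_by_tier(incidents)
for (tier, q), minutes in sorted(report.items()):
    print(f"{tier} {q}: median restore {minutes:.0f} min")
```

The per-phase breakdown from `phase_durations` is what names the bottleneck: if most of the elapsed time sits between mitigation and fix, faster detection will not move the number.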
The DORA connection
MTTR is one of the four DORA metrics. Improving it in isolation, without also tracking deployment frequency, lead time, and change failure rate, usually means you are optimizing the wrong constraint.
Start with baseline. Name the bottleneck. Fix with evidence. Verify the numbers moved.
That is engineering delivery work — not dashboard decoration.
If MTTR is a wall number at your organization, Engineering Delivery & Reliability is where we start.
Bring your last three incident reports
Bring your last three incident reports and current MTTR definitions.