"How do you know the platform is working?" is the question every platform leader is asked — by their VP, their board, their developers. The answer is harder than it looks because the wrong metrics are easy to game and the right ones are slow to move. This lesson surveys what to measure and how to operate the platform as a product.
The Output / Outcome Distinction
Outputs are what the platform team did: features shipped, services migrated, plugins installed. Outcomes are what changed for the consumers: lead time fell, deploys are safer, developers spend less time on yak-shaving. Funders and your boss care about outcomes. The platform's value is its effect, not its activity.
DORA: The Four Key Metrics
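From the DORA (DevOps Research and Assessment) State of DevOps research, now part of Google Cloud. Elite thresholds shift slightly between annual reports, so treat the ranges below as indicative: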
| Metric | What it captures | Elite range |
|---|---|---|
| Lead time for changes | Commit to production | < 1 hour |
| Deployment frequency | How often production receives changes | On-demand / multiple per day |
| Change failure rate | % of deploys causing incidents | 0-15% |
| Mean time to restore (MTTR) | How long to recover from incidents | < 1 hour |
Most platforms move lead time and deploy frequency first (CI/CD, GitOps). Change failure rate and MTTR move later, as observability, progressive delivery, and incident tooling mature.
Measure per-team and per-service, not company-wide. Averages hide the variation that platform investment targets.
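As a minimal sketch of per-service measurement, assuming deploy records can be exported with commit, deploy, and restore timestamps, the four metrics can be computed roughly as follows. The `Deploy` record, its field names, and `dora_by_service` are hypothetical and not part of any DORA tooling; the sample data is invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    """Hypothetical deploy record; field names are assumptions."""
    service: str
    committed_at: datetime                  # first commit in the change
    deployed_at: datetime                   # change reached production
    caused_incident: bool = False
    restored_at: Optional[datetime] = None  # set only if an incident occurred


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_by_service(deploys, window_days):
    """Group deploys by service and compute the four key metrics for each."""
    grouped = defaultdict(list)
    for d in deploys:
        grouped[d.service].append(d)

    report = {}
    for service, items in grouped.items():
        lead_times = [hours(d.deployed_at - d.committed_at) for d in items]
        failures = [d for d in items if d.caused_incident]
        restores = [hours(d.restored_at - d.deployed_at)
                    for d in failures if d.restored_at is not None]
        report[service] = {
            "median_lead_time_hours": median(lead_times),
            "deploys_per_day": len(items) / window_days,
            "change_failure_rate": len(failures) / len(items),
            # None means no restored incidents in the window
            "mean_time_to_restore_hours":
                sum(restores) / len(restores) if restores else None,
        }
    return report


# Invented sample data: one fast-moving service, one slow one.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deploy("checkout", t0, t0 + timedelta(minutes=30)),
    Deploy("checkout", t0, t0 + timedelta(minutes=90), caused_incident=True,
           restored_at=t0 + timedelta(minutes=120)),
    Deploy("search", t0, t0 + timedelta(hours=48)),
]
report = dora_by_service(sample, window_days=7)
for service, metrics in report.items():
    print(service, metrics)
```

Reporting the median rather than the mean for lead time keeps one unusually slow change from dominating a service's number, and keeping the results keyed by service preserves the variation that a company-wide average would hide.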
SPACE: The Bigger Picture
The SPACE framework (Forsgren, Storey, et al., 2021) widens the lens to five dimensions:
- Satisfaction and well-being — survey-based; developer NPS
- Performance — DORA, plus business outcomes
- Activity — commits, PRs, code reviews (use cautiously)
- Communication and collaboration — review turnaround, knowledge sharing
- Efficiency and flow — interruptions, context switches, time in flow
SPACE is opinionated about not reducing productivity to a single number, and about combining metrics from more than one dimension, with at least one perceptual (survey-based) measure alongside the system data. For platform teams the most actionable subset is satisfaction + DORA performance + a small efficiency proxy.
Platform-Specific KPIs
| KPI | Why it matters |
|---|---|
| Adoption rate (% of services on the paved path) | Voluntary adoption is the truest signal of value |
| Time to first deploy for a new service | Elapsed time from "I want a new service" to "it serves production traffic"; a direct measure of onboarding friction |
| Self-service ratio (requests handled without a human) | Capacity that scales without hiring |
| Developer NPS for the platform | Direct customer feedback |
| Platform incident impact | How much developer time the platform's own outages cost |
| Cost per service | Infrastructure efficiency the platform delivers |
Pick four or five. Publish them. Move them.
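Three of these KPIs reduce to simple ratios once the underlying records exist. The sketch below assumes a service catalogue and a request log are available as plain records; `Service`, `Request`, and the function names are hypothetical, and the sample data is invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Service:
    """Hypothetical catalogue entry; field names are assumptions."""
    name: str
    on_paved_path: bool
    requested_at: datetime
    first_prod_deploy_at: Optional[datetime]  # None if not yet in production


@dataclass
class Request:
    """Hypothetical platform request (new database, DNS entry, and so on)."""
    kind: str
    needed_human: bool


def adoption_rate(services) -> float:
    """Share of services on the paved path."""
    return sum(s.on_paved_path for s in services) / len(services)


def self_service_ratio(requests) -> float:
    """Share of requests completed without a platform engineer stepping in."""
    return sum(not r.needed_human for r in requests) / len(requests)


def median_days_to_first_deploy(services) -> Optional[float]:
    """Median days from request to first production deploy, live services only."""
    days = [
        (s.first_prod_deploy_at - s.requested_at).total_seconds() / 86400
        for s in services
        if s.first_prod_deploy_at is not None
    ]
    return median(days) if days else None


# Invented sample data.
t0 = datetime(2024, 3, 1)
services = [
    Service("checkout", True, t0, t0 + timedelta(days=1)),
    Service("search", True, t0, t0 + timedelta(days=3)),
    Service("billing", False, t0, t0 + timedelta(days=10)),
    Service("reports", True, t0, None),
]
requests = [
    Request("database", False),
    Request("dns", False),
    Request("database", True),
    Request("queue", False),
    Request("bucket", False),
]

print(f"adoption rate:        {adoption_rate(services):.0%}")
print(f"self-service ratio:   {self_service_ratio(requests):.0%}")
print(f"median days to first deploy: {median_days_to_first_deploy(services)}")
```

Services that have not yet reached production are excluded from the time-to-first-deploy figure here; counting them separately is a reasonable alternative, since a long tail of never-deployed services is itself a signal.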
The Developer NPS Question
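The single most informative question: "How likely are you to recommend the platform to a colleague?" Ask it on a 0-10 scale, every quarter, of every developer. Net Promoter Score = % promoters (9-10) minus % detractors (0-6). Passives (7-8) count toward the total but toward neither group, so the score ranges from -100 to +100.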
Pair with an open-ended follow-up: "What is the single biggest thing slowing you down?" Categorise the responses; the top three categories become the next quarter's priorities.
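The arithmetic is small enough to sketch directly. The example below assumes the open-ended answers have already been categorised by a person, so each response is represented by a single tag; the function names, scores, and tags are all invented for illustration.

```python
from collections import Counter


def net_promoter_score(scores) -> float:
    """% promoters (9-10) minus % detractors (0-6), on a -100..+100 scale."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) appear only in the denominator.
    return 100 * (promoters - detractors) / len(scores)


def top_categories(tags, n=3):
    """Most frequent categories among the tagged free-text answers."""
    return Counter(tags).most_common(n)


# Invented survey results: 5 promoters, 2 passives, 3 detractors.
scores = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]

# Invented tags, assumed to be assigned by hand to each free-text answer.
tags = (
    ["slow CI"] * 4
    + ["flaky tests"] * 3
    + ["docs"] * 2
    + ["local setup"]
)

nps = net_promoter_score(scores)
priorities = top_categories(tags)
print(f"NPS: {nps:+.0f}")
for category, count in priorities:
    print(f"{count:>2}  {category}")
```

With five promoters and three detractors out of ten responses, the score is 50% minus 30%, or +20.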
Running the Platform as a Product
From the consumer's point of view, the platform's outputs are features. The team that ships them needs the structures any product team has:
- Product manager — internal-facing, but the role is the same. Discovery, prioritisation, roadmap, stakeholder management.
- Roadmap — public to internal teams. Quarterly themes, monthly milestones.
- Backlog — visible. RICE / ICE / WSJF works as well for internal as external products.
- Sprint reviews / demos — every two weeks. Invite consumers. Show what shipped.
- Office hours — weekly. Drop-in, no agenda. The single best discovery tool you have.
- Marketing — internal blog posts, lunch-and-learns, demos at all-hands. Adoption requires visibility.
- Onboarding — a one-page "first 30 minutes on the platform" guide that every new hire follows.
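Of the backlog scoring methods named above, RICE is the simplest to show: score = reach × impact × confidence ÷ effort. The sketch below applies it to an internal backlog, assuming reach is counted in consuming teams per quarter and effort in person-months. The `BacklogItem` class, the units, and every item and number in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    """Hypothetical backlog entry scored with RICE."""
    name: str
    reach: int         # consuming teams affected per quarter (assumed unit)
    impact: float      # e.g. 0.25 = minimal, 1 = medium, 3 = massive
    confidence: float  # 0.0 to 1.0
    effort: float      # person-months (assumed unit)

    @property
    def rice(self) -> float:
        return self.reach * self.impact * self.confidence / self.effort


# Invented backlog for illustration.
backlog = [
    BacklogItem("Self-service database provisioning", 20, 2.0, 0.8, 4.0),
    BacklogItem("Portal dark mode", 30, 0.25, 0.9, 1.5),
    BacklogItem("Golden-path CI template", 30, 2.0, 0.9, 2.0),
]

ranked = sorted(backlog, key=lambda item: item.rice, reverse=True)
for item in ranked:
    print(f"{item.rice:6.1f}  {item.name}")
```

The score is a way to make prioritisation arguments explicit, not a substitute for them: the inputs are estimates, and publishing them alongside the ranking lets consuming teams challenge the assumptions.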
The Sponsor Question
Every successful platform has executive sponsorship — usually the CTO or a VP of Engineering. The sponsor:
- Funds the team beyond the first year of slow ROI
- Resolves cross-team conflicts (when team X wants to off-road)
- Signals organisational priority (which lubricates adoption)
Without a sponsor the platform is a side project on borrowed time. Find one before the first hire.
Failure Modes
| Failure mode | Symptom | Fix |
|---|---|---|
| Ivory tower | Platform built in isolation, low adoption | Embed engineers with consuming teams; office hours; co-build |
| Over-abstraction | Heavy DSL, leaky abstractions, hard to debug | Lean toward exposing primitives with sensible defaults |
| Mandated adoption | Teams comply on paper, route around in practice | Make the path easier; let adoption be voluntary |
| No marketing | Teams don't know what's available | Internal demos, changelog, public roadmap, evangelist role |
| Renamed ops | Same ticket queue, new badge | Rebuild the team identity around product ownership; hire PMs |
| Frozen platform | Shipped v1, then nothing | Fund the platform as a continuing product with a standing team and roadmap, not a one-off project |
The 12-Month Story
A realistic platform timeline:
- Q1: Pick the thinnest viable platform (TVP); hire 2-3 platform engineers and one PM; deliver one golden path
- Q2: Onboard early adopters (2-3 friendly teams); measure baseline DORA per service; iterate
- Q3: Expand to majority of teams; introduce scorecards (informational, not gated); ship portal v1
- Q4: Hit a critical adoption threshold (roughly 50% or more of services); first measurable improvement in lead time and MTTR; budget conversation for year 2
If you have not shipped a meaningful capability by Q2 or shown any adoption signal by Q3, course-correct hard. Platform success curves are slow but not silent.
Where to Go Next
- The CNCF Certified Cloud Native Platform Engineering Associate (CNPA) exam (launched in 2025)
- Platform Engineering by Camille Fournier and Ian Nowland (O'Reilly, 2024)
- Team Topologies by Skelton & Pais (foundational)
- The platformengineering.org community
- The CertQnA DevOps & SRE and Kubernetes Basics courses to round out the foundation
Platform engineering done well is one of the highest-leverage activities in modern software engineering. Done badly it is an expensive way to rename your ops team. The difference is product discipline.