Measuring Platform Success and Running It as a Product

"How do you know the platform is working?" is the question every platform leader is asked — by their VP, their board, their developers. The answer is harder than it looks because the wrong metrics are easy to game and the right ones are slow to move. This lesson surveys what to measure and how to operate the platform as a product.

The Output / Outcome Distinction

Outputs are what the platform team did: features shipped, services migrated, plugins installed. Outcomes are what changed for the consumers: lead time fell, deploys became safer, developers spend less time on yak-shaving. The people who fund the platform care about outcomes, because the platform's value is its effect, not its activity.

DORA: The Four Key Metrics

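From the DORA (DevOps Research and Assessment) State of DevOps research, now run under Google Cloud. The "elite" thresholds shift between annual reports, so treat the ranges below as indicative: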

Metric	What it captures	Elite range
Lead time for changes	Commit to production	< 1 hour
Deployment frequency	How often production receives changes	On-demand / multiple per day
Change failure rate	% of deploys causing incidents	0-15%
Mean time to restore (MTTR)	How long to recover from incidents	< 1 hour

Most platforms move lead time and deploy frequency first (CI/CD, GitOps). Change failure rate and MTTR move later, as observability, progressive delivery, and incident tooling mature.

Measure per-team and per-service, not company-wide. Averages hide the variation that platform investment targets.
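As a minimal sketch of per-service measurement, the snippet below groups deploy records by service and derives the four metrics for each. The `Deploy` record and its field names are hypothetical; in practice the data would come from your CI/CD and incident tooling, and "lead time" here is simplified to commit-to-production for a single change.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    """One production deploy (hypothetical record shape)."""
    service: str
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    caused_incident: bool = False
    restored_at: Optional[datetime] = None  # recovery time, if it failed


def dora_by_service(deploys, window_days):
    """Return the four DORA metrics per service over a reporting window."""
    grouped = defaultdict(list)
    for d in deploys:
        grouped[d.service].append(d)

    report = {}
    for service, ds in grouped.items():
        lead_times = [d.deployed_at - d.committed_at for d in ds]
        failures = [d for d in ds if d.caused_incident]
        # Only failures with a known recovery time contribute to MTTR.
        restores = [d.restored_at - d.deployed_at
                    for d in failures if d.restored_at is not None]
        report[service] = {
            "median_lead_time": median(lead_times),
            "deploys_per_day": len(ds) / window_days,
            "change_failure_rate": len(failures) / len(ds),
            "mean_time_to_restore": (
                sum(restores, timedelta()) / len(restores) if restores else None
            ),
        }
    return report
```

Reporting the median rather than the mean for lead time is a design choice: one slow release does not mask an otherwise fast service. Comparing the resulting per-service rows side by side is what exposes the variation a company-wide average would hide.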

SPACE: The Bigger Picture

The SPACE framework (Forsgren, Storey, et al., 2021) widens the lens to five dimensions:

Satisfaction and well-being — survey-based; developer NPS
Performance — DORA, plus business outcomes
Activity — commits, PRs, code reviews (use cautiously)
Communication and collaboration — review turnaround, knowledge sharing
Efficiency and flow — interruptions, context switches, time in flow

SPACE insists that productivity cannot be reduced to a single number: measure across several dimensions, and pair quantitative telemetry with perceptual data such as surveys. For platform teams the most actionable subset is satisfaction + DORA performance + a small efficiency proxy.

Platform-Specific KPIs

KPI	Why it matters
Adoption rate (% of services on the paved path)	Voluntary adoption is the truest signal of value
Time to first deploy for a new service	How long from "I want a new service" to "it serves production traffic"
Self-service ratio (requests handled without a human)	Capacity that scales without hiring
Developer NPS for the platform	Direct customer feedback
Platform incident impact	How much developer time the platform's own outages cost
Cost per service	Infrastructure efficiency the platform delivers

Pick four or five. Publish them. Move them.
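Two of the KPIs in the table are simple ratios and can be sketched as below. The record types `Service` and `Request` and their fields are hypothetical; the real inputs would be your service catalog and your request or ticket system.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """A service in the catalog (hypothetical record shape)."""
    name: str
    on_paved_path: bool


@dataclass
class Request:
    """A platform request, e.g. 'new database' (hypothetical record shape)."""
    kind: str
    needed_human: bool


def adoption_rate(services):
    """Fraction of services running on the paved path."""
    if not services:
        return 0.0
    return sum(s.on_paved_path for s in services) / len(services)


def self_service_ratio(requests):
    """Fraction of requests completed without a human in the loop."""
    if not requests:
        return 0.0
    return sum(not r.needed_human for r in requests) / len(requests)
```

Both functions return a fraction between 0 and 1, so they can be published as percentages and tracked quarter over quarter.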

The Developer NPS Question

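The single most informative question: "How likely are you to recommend the platform to a colleague?" Ask it on a 0-10 scale, every quarter, of every developer. Net Promoter Score = % promoters (9-10) minus % detractors (0-6); passives (7-8) count toward the total but toward neither group, so the score ranges from -100 to +100.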

Pair with an open-ended follow-up: "What is the single biggest thing slowing you down?" Categorise the responses; the top three categories become the next quarter's priorities.
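The arithmetic above can be sketched in a few lines. The function names are illustrative, and the category labels are assumed to have been assigned by hand when reading the free-text answers; the sample scores and labels are made up.

```python
from collections import Counter


def net_promoter_score(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6), from -100 to +100."""
    if not scores:
        raise ValueError("need at least one survey response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) are in the denominator but in neither group.
    return 100 * (promoters - detractors) / len(scores)


def top_categories(labels, n=3):
    """Return the n most frequent categories among tagged responses."""
    return [label for label, _ in Counter(labels).most_common(n)]


# Made-up survey data for illustration.
scores = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
labels = ["ci speed", "docs", "ci speed", "environments",
          "docs", "ci speed", "secrets"]

print(net_promoter_score(scores))  # 4 promoters, 3 detractors, 10 responses -> 10.0
print(top_categories(labels))      # ['ci speed', 'docs', 'environments']
```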

Running the Platform as a Product

From the consumer's point of view, the platform's outputs are product features. A team that ships features needs the structures any product team has:

Product manager — internal-facing, but the role is the same. Discovery, prioritisation, roadmap, stakeholder management.
Roadmap — public to internal teams. Quarterly themes, monthly milestones.
Backlog — visible. RICE, ICE, and WSJF work as well for internal products as for external ones.
Sprint reviews / demos — every two weeks. Invite consumers. Show what shipped.
Office hours — weekly. Drop-in, no agenda. The single best discovery tool you have.
Marketing — internal blog posts, lunch-and-learns, demos at all-hands. Adoption requires visibility.
Onboarding — a one-page "first 30 minutes on the platform" guide that every new hire follows.
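To make the backlog item concrete, here is a minimal sketch of RICE scoring, where score = reach × impact × confidence / effort. The backlog entries and their numbers are invented for illustration, and the units (developers reached per quarter, person-weeks of effort) are an assumption; any consistent units work.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    """A candidate platform feature (hypothetical record shape)."""
    title: str
    reach: float        # developers affected per quarter (assumed unit)
    impact: float       # relative benefit per developer, e.g. 0.5 to 3
    confidence: float   # 0.0 to 1.0, how sure we are of the estimates
    effort: float       # person-weeks (assumed unit), must be > 0

    @property
    def rice(self):
        return self.reach * self.impact * self.confidence / self.effort


def prioritise(items):
    """Return backlog items ordered from highest to lowest RICE score."""
    return sorted(items, key=lambda item: item.rice, reverse=True)


# Invented backlog for illustration.
backlog = [
    BacklogItem("Self-service database provisioning", 40, 2, 0.8, 4),
    BacklogItem("Preview environments", 25, 3, 0.5, 5),
    BacklogItem("Faster CI cache", 60, 1, 0.9, 2),
]

for item in prioritise(backlog):
    print(f"{item.rice:6.1f}  {item.title}")
```

The score is a conversation starter rather than a verdict: publishing the inputs alongside the ranking lets consuming teams challenge the reach and impact estimates, which is itself a form of discovery.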

The Sponsor Question

Every successful platform has executive sponsorship — usually the CTO or a VP of Engineering. The sponsor:

Funds the team beyond the first year of slow ROI
Resolves cross-team conflicts (for example, when team X wants to leave the paved path)
Signals organisational priority (which lubricates adoption)

Without a sponsor the platform is a side project on borrowed time. Find one before the first hire.

Failure Modes

Failure mode	Symptom	Fix
Ivory tower	Platform built in isolation, low adoption	Embed engineers with consuming teams; office hours; co-build
Over-abstraction	Heavy DSL, leaky abstractions, hard to debug	Lean toward exposing primitives with sensible defaults
Mandated adoption	Teams comply on paper, route around in practice	Make the path easier; let adoption be voluntary
No marketing	Teams don't know what's available	Internal demos, changelog, public roadmap, evangelist role
Renamed ops	Same ticket queue, new badge	Rebuild the team identity around product ownership; hire PMs
Frozen platform	Shipped v1, then nothing	Ongoing investment is the only investment that matters

The 12-Month Story

A realistic platform timeline:

Q1: Pick the thinnest viable platform (TVP); hire 2-3 platform engineers and one PM; deliver one golden path
Q2: Onboard early adopters (2-3 friendly teams); measure baseline DORA per service; iterate
Q3: Expand to majority of teams; introduce scorecards (informational, not gated); ship portal v1
Q4: Hit a critical adoption threshold (roughly 50% or more of services); first measurable lift on lead time and MTTR; budget conversation for year 2

If you have not shipped a meaningful capability by Q2 or shown any adoption signal by Q3, course-correct hard. Platform success curves are slow but not silent.

Where to Go Next

The CNCF Cloud Native Platform Associate exam (formalised in 2024)
Platform Engineering: A Guide for Technical, Product, and People Leaders by Camille Fournier and Ian Nowland (O'Reilly, 2024)
Team Topologies by Skelton & Pais (foundational)
The platformengineering.org community
The CertQnA DevOps & SRE and Kubernetes Basics courses to round out the foundation

Platform engineering done well is one of the highest-leverage activities in modern software engineering. Done badly it is an expensive way to rename your ops team. The difference is product discipline.