Developer experience (DevEx) is the felt quality of doing your job as an engineer. It is the difference between "I made the change in 20 minutes" and "I spent two days on yak-shaving." Platform engineering, done well, is the discipline of measuring and improving DevEx at scale.
The DevEx Framework
The 2023 paper by Abi Noda, Margaret-Anne Storey, Nicole Forsgren, and Michaela Greiler proposed three measurable dimensions:
| Dimension | What it captures | Improvable by |
|---|---|---|
| Flow state | Ability to focus on meaningful work without interruption | Reducing meetings, async culture, blocking-time guarantees |
| Feedback loops | Speed of getting signal — tests, builds, deploys, reviews | Faster CI, preview environments, fast local dev |
| Cognitive load | Mental effort required to do the task | Better docs, paved paths, automation |
Platform teams have leverage on all three but disproportionately on feedback loops and cognitive load. Flow state is largely an org-design question (which platform teams influence but do not own alone).
Scorecards: Encoding Standards as Data
A scorecard is a set of checks evaluated against every service. Each check returns pass / fail / not-applicable; scores roll up to a maturity level. Common categories:
| Category | Example checks |
|---|---|
| Ownership | Has CODEOWNERS; team registered in catalogue; on-call rotation defined |
| Documentation | README present; TechDocs builds; runbook linked |
| Observability | Metrics endpoint exposed; dashboard registered; logs flowing |
| Reliability | SLO defined; healthcheck implemented; deploys via GitOps |
| Security | Image scanned; SBOM present; no critical CVEs; secrets not in repo |
| Quality | Test coverage threshold; lint clean; latest framework version |
Implementation
Backstage has the tech-insights plugin and a scorecards UI; Port and Cortex have scorecards as a first-class feature. The check logic is typically a small script (or API call) per criterion that returns pass, fail, or not-applicable. Evaluation usually runs on a schedule (nightly is common), and the results are surfaced in the portal.
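As a rough illustration of "one small script per criterion", the sketch below models two checks with the pass / fail / not-applicable outcome described for scorecards. It is not the API of Backstage, Port, or Cortex; the service record fields (`kind`, `repo_files`, `healthcheck_path`) and the function names are assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_APPLICABLE = "n/a"


@dataclass(frozen=True)
class Check:
    name: str
    category: str
    evaluate: Callable[[dict], Result]


def has_codeowners(service: dict) -> Result:
    # Hypothetical field: file names found at the root of the service's repo.
    return Result.PASS if "CODEOWNERS" in service.get("repo_files", []) else Result.FAIL


def has_healthcheck(service: dict) -> Result:
    # Assumption: libraries expose no runtime endpoint, so the check is skipped.
    if service.get("kind") == "library":
        return Result.NOT_APPLICABLE
    return Result.PASS if service.get("healthcheck_path") else Result.FAIL


CHECKS = [
    Check("codeowners", "Ownership", has_codeowners),
    Check("healthcheck", "Reliability", has_healthcheck),
]


def run_scorecard(service: dict, checks: list) -> dict:
    """Evaluate every check against one service record."""
    return {check.name: check.evaluate(service) for check in checks}


if __name__ == "__main__":
    payments = {
        "kind": "service",
        "repo_files": ["README.md", "CODEOWNERS"],
        "healthcheck_path": None,
    }
    for name, result in run_scorecard(payments, CHECKS).items():
        print(f"{name}: {result.value}")
```

Keeping not-applicable as a distinct outcome matters: a library should not be marked red for lacking a healthcheck, and it should not be counted as green either.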
Maturity levels
A common scheme uses four tiers:
- Bronze: service exists, ownership clear
- Silver: production-ready basics (observability, healthchecks, on-call)
- Gold: production-mature (SLOs, runbooks, chaos-tested)
- Platinum: exemplary (autoscaled, multi-region, regularly load-tested)
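One way to roll check results up into these tiers is to make the tiers cumulative: a service holds a tier only if it also satisfies every tier below it. The sketch below assumes that rule, and the check names are hypothetical labels that loosely mirror the tier descriptions above.

```python
from typing import Optional

# Tiers in ascending order; each lists the checks it adds on top of the tier below.
# Check names are illustrative assumptions, not a standard.
TIERS = [
    ("Bronze", ["catalogue_entry", "owner_registered"]),
    ("Silver", ["metrics_endpoint", "healthcheck", "on_call_rotation"]),
    ("Gold", ["slo_defined", "runbook_linked", "chaos_tested"]),
    ("Platinum", ["autoscaled", "multi_region", "load_tested"]),
]


def maturity_level(results: dict) -> Optional[str]:
    """Return the highest tier reached, or None if Bronze is not met.

    `results` maps check name to "pass", "fail", or "n/a". A missing or
    unrecognised value counts as a failure; "n/a" counts as satisfied.
    """
    achieved = None
    for tier, required in TIERS:
        if any(results.get(name) not in ("pass", "n/a") for name in required):
            break  # a gap at this tier blocks every higher tier too
        achieved = tier
    return achieved


if __name__ == "__main__":
    results = {
        "catalogue_entry": "pass",
        "owner_registered": "pass",
        "metrics_endpoint": "pass",
        "healthcheck": "n/a",
        "on_call_rotation": "pass",
        "slo_defined": "fail",
    }
    print(maturity_level(results))
```

Under the cumulative rule, a service cannot skip a tier by passing only the more advanced checks, which keeps the levels meaningful as a progression.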
The Politics: Carrot, Not Stick
Scorecards fail when they become a stick. "Your service is bronze — fix it by Friday" generates resentment and shortcut behaviour. They succeed when:
- The scaffolder template ships services already at silver — most checks pass on day one
- Each red check links to a fix that takes minutes, not days
- Scores are visible to the team's leadership but not weaponised as performance criteria
- Promotion to higher tiers unlocks benefits: better SLAs, eligibility for production traffic, the ability to apply for budget increases
The model is "make the right thing the easy thing" — the standard is encoded in the template; failing a scorecard means the team diverged from the template, which is usually unintentional and quick to fix.
Three Signals to Start With
Resist measuring twenty things. Start with three that drive disproportionate value:
- Ownership. Every service has a team. Without this, nothing else functions.
- Observability. Logs, metrics, and a dashboard. Without this, you cannot operate.
- Incident readiness. On-call rotation, runbook, healthcheck. Without this, you cannot respond.
Get these to 100% before adding more checks.
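To track progress towards 100%, it helps to report both a coverage percentage per signal and the list of services still missing it. The following is a minimal sketch; the catalogue record shape (`name`, `owner`, `logs`, `on_call`, and so on) is an assumption for illustration, not a real catalogue schema.

```python
# Fields a service record must have (truthy) to satisfy each signal.
# Field names are hypothetical.
SIGNALS = {
    "ownership": ("owner",),
    "observability": ("logs", "metrics", "dashboard"),
    "incident_readiness": ("on_call", "runbook", "healthcheck"),
}


def signal_gaps(services: list) -> dict:
    """For each signal, list the services missing any required field."""
    return {
        signal: [s["name"] for s in services if not all(s.get(f) for f in fields)]
        for signal, fields in SIGNALS.items()
    }


def signal_coverage(services: list) -> dict:
    """Percentage of services that fully satisfy each signal."""
    if not services:
        return {signal: 0.0 for signal in SIGNALS}
    gaps = signal_gaps(services)
    return {
        signal: 100.0 * (len(services) - len(missing)) / len(services)
        for signal, missing in gaps.items()
    }


if __name__ == "__main__":
    fleet = [
        {"name": "checkout", "owner": "team-pay", "logs": True, "metrics": True,
         "dashboard": True, "on_call": True, "runbook": True, "healthcheck": True},
        {"name": "legacy-batch", "owner": None, "logs": True, "metrics": False},
    ]
    print(signal_coverage(fleet))
    print(signal_gaps(fleet))
```

The gap list is the more actionable half: it turns "ownership is at 50%" into a named set of services someone can go and fix.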
Standards as Code
The natural evolution of scorecards is to enforce some standards automatically via policy:
- OPA / Gatekeeper / Kyverno — admission-time policy on Kubernetes resources
- Repo policy — GitHub branch protection, required reviews, required status checks
- Pipeline policy — required steps in CI (SAST, dependency scan, signing)
- Supply chain — SLSA levels, in-toto attestations, Sigstore signing
The platform team's job is to author these policies, document them, and provide documented, auditable bypasses for exceptional cases.
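To make the "policy plus bypass" idea concrete, here is a minimal Python sketch of the kind of rule an admission policy might express: every container must declare CPU and memory limits unless the manifest carries an explicit exemption. In practice this would be written in the policy engine's own language (Rego for OPA and Gatekeeper, YAML for Kyverno); the annotation key and function name here are hypothetical.

```python
# Hypothetical annotation key; its value would reference an approved exception ticket.
EXEMPT_ANNOTATION = "platform.example.com/limits-exempt"


def check_resource_limits(deployment: dict) -> list:
    """Return a list of violations; an empty list means the manifest is admitted."""
    annotations = deployment.get("metadata", {}).get("annotations") or {}
    if annotations.get(EXEMPT_ANNOTATION):
        # Bypass path: the exemption is explicit in the manifest, so it can be audited.
        return []

    containers = (
        deployment.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    violations = []
    for container in containers:
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                name = container.get("name", "?")
                violations.append(f"container {name!r} has no {resource} limit")
    return violations


if __name__ == "__main__":
    manifest = {
        "metadata": {"name": "api", "annotations": {}},
        "spec": {"template": {"spec": {"containers": [
            {"name": "api", "resources": {"limits": {"cpu": "500m"}}},
        ]}}},
    }
    for violation in check_resource_limits(manifest):
        print(violation)
```

Putting the bypass in the manifest itself, rather than in a side channel, means exceptions show up in code review and in any later audit of what is running.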
"Shift Left" Without Burying Teams
The 2010s "shift left" mantra moved security and compliance earlier in the lifecycle. Done badly, it dumps those concerns on already-overloaded application teams. Done well, the platform automates so much of the standard that compliance is the byproduct of using the paved path:
- The Dockerfile in the template uses a distroless base — no CVE conversation needed
- The Helm chart in the template requires resource limits — no namespace-quota incident
- The CI in the template runs SBOM and signing — supply-chain compliance for free
- The Backstage entity ties to the security scorecard — exceptions are visible, not buried
Listening Loops
The portal and scorecards tell you what's true; the developer survey tells you what's painful. Run both:
- Quarterly survey of fewer than 15 questions, with one open-ended "what slows you down most?" item
- Adoption metrics from the portal (active users, templates used, scorecards green)
- DORA metrics from CI/CD and incident data (lead time for changes, deployment frequency, change failure rate, time to restore service)
- Office hours and a public roadmap so teams see their feedback shape the platform
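The four DORA metrics in the list above can be derived from a simple log of deployments. The sketch below assumes a hypothetical `Deploy` record that joins pipeline timestamps with incident data; real pipelines will need their own way of deciding which deploys "failed" and when service was restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list, window_days: int) -> dict:
    """Summarise one reporting window of deploys. Times are in hours."""
    if not deploys:
        raise ValueError("no deploys in the reporting window")

    lead_times = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys
    ]
    failures = [d for d in deploys if d.failed]
    restore_times = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "median_lead_time_hours": median(lead_times),
        "deploys_per_day": len(deploys) / window_days,
        "change_failure_rate": len(failures) / len(deploys),
        # None when nothing failed (or nothing has been restored yet).
        "mean_time_to_restore_hours": (
            sum(restore_times) / len(restore_times) if restore_times else None
        ),
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    log = [
        Deploy(t0, t0 + timedelta(hours=2)),
        Deploy(t0, t0 + timedelta(hours=4), failed=True,
               restored_at=t0 + timedelta(hours=5)),
    ]
    print(dora_metrics(log, window_days=2))
```

These numbers are most useful as trends per team over time, read alongside the survey results rather than in place of them.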
DevEx is real engineering work — discoverable, measurable, improvable. The platform that takes it seriously is the platform that wins adoption.