6 min read·Lesson 7 of 8

Developer Experience, Scorecards, and Standards

Designing for developer experience (DevEx), encoding standards as scorecards, and the politics of "shifting left" without overwhelming teams.

Developer experience (DevEx) is the felt quality of doing your job as an engineer. It is the difference between "I made the change in 20 minutes" and "I spent two days on yak-shaving." Platform engineering, done well, is the discipline of measuring and improving DevEx at scale.

The DevEx Framework

The 2023 paper "DevEx: What Actually Drives Productivity" by Abi Noda, Margaret-Anne Storey, Nicole Forsgren, and Michaela Greiler proposed three measurable dimensions:

| Dimension | What it captures | Improvable by |
|---|---|---|
| Flow state | Ability to focus on meaningful work without interruption | Reducing meetings, async culture, blocking-time guarantees |
| Feedback loops | Speed of getting signal from tests, builds, deploys, and reviews | Faster CI, preview environments, fast local dev |
| Cognitive load | Mental effort required to do the task | Better docs, paved paths, automation |

Platform teams have leverage on all three but disproportionately on feedback loops and cognitive load. Flow state is largely an org-design question (which platform teams influence but do not own alone).

Scorecards: Encoding Standards as Data

A scorecard is a set of checks evaluated against every service. Each check returns pass / fail / not-applicable; scores roll up to a maturity level. Common categories:

| Category | Example checks |
|---|---|
| Ownership | Has CODEOWNERS; team registered in catalogue; on-call rotation defined |
| Documentation | README present; TechDocs builds; runbook linked |
| Observability | Metrics endpoint exposed; dashboard registered; logs flowing |
| Reliability | SLO defined; healthcheck implemented; deploys via GitOps |
| Security | Image scanned; SBOM present; no critical CVEs; secrets not in repo |
| Quality | Test coverage threshold; lint clean; latest framework version |

Implementation

Backstage has the tech-insights plugin and a scorecards UI; Port and Cortex offer scorecards as a first-class feature. The check logic is typically a small script (or API call) per criterion that returns true/false. Evaluation usually runs nightly, and the results are surfaced in the portal.
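
To make the "small script per criterion" idea concrete, here is a minimal sketch in Python. It is not the API of Backstage, Port, or Cortex; the service fields (`owner`, `runbook_url`, `metrics_path`, `kind`) and the check names are hypothetical, chosen only to illustrate the pass / fail / not-applicable shape described above.

```python
from enum import Enum
from typing import Callable, Dict


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_APPLICABLE = "n/a"


# A service is a plain dict of catalogue metadata.
# The field names are hypothetical, not a real portal schema.
Service = Dict[str, object]
Check = Callable[[Service], Result]


def has_owner(service: Service) -> Result:
    return Result.PASS if service.get("owner") else Result.FAIL


def has_runbook(service: Service) -> Result:
    return Result.PASS if service.get("runbook_url") else Result.FAIL


def exposes_metrics(service: Service) -> Result:
    # Assumption: libraries have no runtime, so the check does not apply.
    if service.get("kind") == "library":
        return Result.NOT_APPLICABLE
    return Result.PASS if service.get("metrics_path") else Result.FAIL


CHECKS: Dict[str, Check] = {
    "ownership/has-owner": has_owner,
    "documentation/has-runbook": has_runbook,
    "observability/exposes-metrics": exposes_metrics,
}


def evaluate(service: Service) -> Dict[str, Result]:
    """Run every registered check against one service."""
    return {name: check(service) for name, check in CHECKS.items()}


def score(results: Dict[str, Result]) -> float:
    """Fraction of applicable checks that passed (1.0 if none apply)."""
    applicable = [r for r in results.values() if r is not Result.NOT_APPLICABLE]
    if not applicable:
        return 1.0
    return sum(r is Result.PASS for r in applicable) / len(applicable)
```

A nightly job could call `evaluate` for every catalogue entry and store the results. Excluding not-applicable checks from the denominator keeps a library from being penalised for lacking a metrics endpoint.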

Maturity levels

A common scheme uses four tiers:

  1. Bronze: service exists, ownership clear
  2. Silver: production-ready basics (observability, healthchecks, on-call)
  3. Gold: production-mature (SLOs, runbooks, chaos-tested)
  4. Platinum: exemplary (autoscaled, multi-region, regularly load-tested)
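
One way to roll check results up into these tiers is to treat them as cumulative: a service reaches a tier only when it passes that tier's checks and those of every tier below. The sketch below assumes that rule, and the check names are hypothetical stand-ins for the criteria listed above.

```python
from typing import List, Set, Tuple

# Checks required at each tier, lowest first. The check names are
# hypothetical and only loosely mirror the tiers described above.
LEVELS: List[Tuple[str, Set[str]]] = [
    ("Bronze", {"has-owner"}),
    ("Silver", {"metrics", "healthcheck", "on-call"}),
    ("Gold", {"slo", "runbook", "chaos-tested"}),
    ("Platinum", {"autoscaled", "multi-region", "load-tested"}),
]


def maturity(passed: Set[str]) -> str:
    """Highest tier whose checks, and all lower tiers' checks, pass."""
    achieved = "Unrated"
    for name, required in LEVELS:
        if not required <= passed:  # subset test: some required check is missing
            break
        achieved = name
    return achieved


def missing_for_next(passed: Set[str]) -> Set[str]:
    """Checks still failing for the next tier (empty at the top tier)."""
    for _, required in LEVELS:
        if not required <= passed:
            return required - passed
    return set()
```

Under this rule, a service that passes every Gold check but lacks an on-call rotation stays at Bronze. `missing_for_next` gives the portal a short, specific to-do list rather than a bare grade.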

The Politics: Carrot, Not Stick

Scorecards fail when they become a stick. "Your service is bronze — fix it by Friday" generates resentment and shortcut behaviour. They succeed when:

  • The scaffolder template ships services already at silver — most checks pass on day one
  • Each red check links to a fix that takes minutes, not days
  • Scores are visible to the team's leadership but not weaponised as performance criteria
  • Promotion to higher tiers unlocks things: better SLAs, eligibility for production traffic, the right to apply for budget increases

The model is "make the right thing the easy thing" — the standard is encoded in the template; failing a scorecard means the team diverged from the template, which is usually unintentional and quick to fix.

Three Signals to Start With

Resist measuring twenty things. Start with three that drive disproportionate value:

  1. Ownership. Every service has a team. Without this, nothing else functions.
  2. Observability. Logs, metrics, and a dashboard. Without this, you cannot operate.
  3. Incident readiness. On-call rotation, runbook, healthcheck. Without this, you cannot respond.

Get these to 100% before adding more checks.
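
Tracking progress toward that 100% can be as simple as a coverage report over the catalogue. The following is a rough sketch; the per-signal field names are hypothetical attributes of a catalogue entry, not a real schema.

```python
from typing import Dict, List

# Fields that must be set for each starting signal.
# These attribute names are hypothetical.
SIGNALS: Dict[str, List[str]] = {
    "ownership": ["owner"],
    "observability": ["logs", "metrics", "dashboard"],
    "incident_readiness": ["on_call", "runbook", "healthcheck"],
}


def coverage(services: List[dict]) -> Dict[str, float]:
    """Percentage of services that satisfy every field of each signal."""
    if not services:
        return {signal: 0.0 for signal in SIGNALS}
    report = {}
    for signal, fields in SIGNALS.items():
        ok = sum(all(svc.get(f) for f in fields) for svc in services)
        report[signal] = 100.0 * ok / len(services)
    return report


def ready_for_more_checks(services: List[dict]) -> bool:
    """True only when all three starting signals are at 100%."""
    return bool(services) and all(
        pct == 100.0 for pct in coverage(services).values()
    )
```

A signal counts as met only when all of its fields are present, so a service with logs and metrics but no dashboard does not count toward observability coverage.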

Standards as Code

The natural evolution of scorecards is to enforce some standards automatically via policy:

  • OPA / Gatekeeper / Kyverno — admission-time policy on Kubernetes resources
  • Repo policy — GitHub branch protection, required reviews, required status checks
  • Pipeline policy — required steps in CI (SAST, dependency scan, signing)
  • Supply chain — SLSA levels, in-toto attestations, Sigstore signing

The platform team's job is to author these policies, document them, and provide bypasses for exceptional cases.
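
As an illustration of pipeline policy with a bypass, the sketch below checks a pipeline's steps against a required set and honours time-boxed waivers. It is written in plain Python rather than in any real policy engine's language; the step names, the `Waiver` record, and the expiry rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set

# Hypothetical step identifiers for SAST, dependency scanning, and signing.
REQUIRED_STEPS: Set[str] = {"sast", "dependency-scan", "sign"}


@dataclass
class Waiver:
    service: str   # service the exception applies to
    step: str      # required step being waived
    reason: str    # recorded so the exception stays visible
    expires: date  # last day the waiver is valid


def violations(
    service: str, steps: Set[str], waivers: List[Waiver], today: date
) -> List[str]:
    """Required steps that are missing and not covered by a live waiver."""
    waived = {
        w.step
        for w in waivers
        if w.service == service and w.expires >= today
    }
    return sorted(REQUIRED_STEPS - steps - waived)
```

Giving every waiver an expiry date means a bypass lapses by default, so exceptions have to be renewed deliberately instead of accumulating silently.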

"Shift Left" Without Burying Teams

The 2010s "shift left" mantra moved security and compliance earlier in the lifecycle. Done badly, it dumps those concerns on already-overloaded application teams. Done well, the platform automates so much of the standard that compliance is the byproduct of using the paved path:

  • The Dockerfile in the template uses a distroless base — no CVE conversation needed
  • The Helm chart in the template requires resource limits — no namespace-quota incident
  • The CI in the template runs SBOM and signing — supply-chain compliance for free
  • The Backstage entity ties to the security scorecard — exceptions are visible, not buried
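
The resource-limits guarantee can also be tested on the template side, before anything reaches the cluster. Below is a minimal sketch that inspects a rendered Deployment manifest, already parsed into a dict, and lists containers without limits. Requiring both `cpu` and `memory` limits is an assumption of this example, not a universal rule.

```python
from typing import List


def containers_missing_limits(manifest: dict) -> List[str]:
    """Names of containers in a Deployment-style manifest lacking limits.

    Follows the standard Deployment layout:
    spec.template.spec.containers[].resources.limits
    """
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for container in pod_spec.get("containers", []):
        # `or {}` tolerates keys that are present but empty (parsed as None).
        limits = (container.get("resources") or {}).get("limits") or {}
        # Assumption: the standard requires both cpu and memory limits.
        if "cpu" not in limits or "memory" not in limits:
            missing.append(container.get("name", "<unnamed>"))
    return missing
```

Running a check like this in the template's own CI catches a regression in the chart itself, so application teams never meet the failure at deploy time.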

Listening Loops

The portal and scorecards tell you what's true; the developer survey tells you what's painful. Run both:

  • Quarterly survey, fewer than 15 questions, with one open-ended "what slows you down most?" item
  • Adoption metrics from the portal (active users, templates used, scorecards green)
  • DORA metrics from CI/CD (lead time, deploy frequency, change-failure rate, MTTR)
  • Office hours and a public roadmap so teams see their feedback shape the platform
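
The four DORA metrics listed above can be derived from deployment records. The sketch below assumes a hypothetical `Deployment` record that a CI/CD system might export; the field names and the choice of median for lead time and mean for restore time are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                  # first commit in the change
    deployed_at: datetime                   # change reached production
    failed: bool = False                    # change caused a production failure
    restored_at: Optional[datetime] = None  # service restored, if it failed


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def dora(deploys: List[Deployment], window_days: int) -> dict:
    """Four DORA metrics over a reporting window of `window_days` days."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deploys if d.failed]
    restore_hours = [
        _hours(d.deployed_at, d.restored_at) for d in failures if d.restored_at
    ]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(
            _hours(d.committed_at, d.deployed_at) for d in deploys
        ),
        "change_failure_rate": len(failures) / len(deploys),
        # None when no failed deployment has a recorded restore time.
        "mean_time_to_restore_hours": mean(restore_hours) if restore_hours else None,
    }
```

Reporting these at team or platform level, alongside the survey results, shows whether a platform change actually moved delivery outcomes as well as developer sentiment.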

DevEx is real engineering work — discoverable, measurable, improvable. The platform that takes it seriously is the platform that wins adoption.

Key Takeaways

  • DevEx is a discrete discipline with three dimensions: flow state, feedback loops, cognitive load.
  • Scorecards encode standards as data — discoverable, gradable, automatable.
  • Three signals to start with: ownership, observability, incident readiness.
  • Standards should reward, not punish; pair scorecards with templates that pre-fill them.
  • DORA, SPACE, and the newer DevEx framework (Noda, Storey, Forsgren, Greiler) are the academic backing.
