Developer Experience, Scorecards, and Standards

Developer experience (DevEx) is the felt quality of doing your job as an engineer. It is the difference between "I made the change in 20 minutes" and "I spent two days on yak-shaving." Platform engineering, done well, is the discipline of measuring and improving DevEx at scale.

The DevEx Framework

The 2023 paper by Abi Noda, Margaret-Anne Storey, Nicole Forsgren, and Michaela Greiler proposed three measurable dimensions:

Dimension	What it captures	Improvable by
Flow state	Ability to focus on meaningful work without interruption	Reducing meetings, async culture, blocking-time guarantees
Feedback loops	Speed of getting signal — tests, builds, deploys, reviews	Faster CI, preview environments, fast local dev
Cognitive load	Mental effort required to do the task	Better docs, paved paths, automation

Platform teams have leverage on all three, but disproportionately on feedback loops and cognitive load. Flow state is largely an org-design question, one that platform teams influence but do not own.

Scorecards: Encoding Standards as Data

A scorecard is a set of checks evaluated against every service. Each check returns pass / fail / not-applicable; scores roll up to a maturity level. Common categories:

Category	Example checks
Ownership	Has CODEOWNERS; team registered in catalogue; on-call rotation defined
Documentation	README present; TechDocs builds; runbook linked
Observability	Metrics endpoint exposed; dashboard registered; logs flowing
Reliability	SLO defined; healthcheck implemented; deploys via GitOps
Security	Image scanned; SBOM present; no critical CVEs; secrets not in repo
Quality	Test coverage threshold; lint clean; latest framework version

Implementation

Backstage offers the Tech Insights plugin with a scorecards UI; Port and Cortex treat scorecards as a first-class feature. The check logic is typically a small script (or API call) per criterion that returns pass, fail, or not-applicable. Evaluation usually runs on a schedule, for example nightly, and the results are surfaced in the portal.
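To make the "one small script per criterion" idea concrete, here is a minimal sketch of a check registry in plain Python. It is not the API of Backstage, Port, or Cortex; the `Check` type, the service dictionary fields (`files`, `kind`, `metrics_path`), and the check names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_APPLICABLE = "n/a"


@dataclass(frozen=True)
class Check:
    name: str
    category: str
    evaluate: Callable[[dict], Result]


def has_codeowners(service: dict) -> Result:
    # Assumes the catalogue entry carries a list of top-level repo files.
    return Result.PASS if "CODEOWNERS" in service.get("files", []) else Result.FAIL


def exposes_metrics(service: dict) -> Result:
    # Libraries have no runtime endpoint, so the check does not apply to them.
    if service.get("kind") == "library":
        return Result.NOT_APPLICABLE
    return Result.PASS if service.get("metrics_path") else Result.FAIL


CHECKS = [
    Check("has-codeowners", "Ownership", has_codeowners),
    Check("metrics-endpoint", "Observability", exposes_metrics),
]


def run_scorecard(service: dict, checks: list[Check] = CHECKS) -> dict[str, Result]:
    """Evaluate every check against one service."""
    return {check.name: check.evaluate(service) for check in checks}


def score(results: dict[str, Result]) -> float:
    """Fraction of applicable checks that pass; not-applicable checks are ignored."""
    applicable = [r for r in results.values() if r is not Result.NOT_APPLICABLE]
    if not applicable:
        return 1.0
    return sum(r is Result.PASS for r in applicable) / len(applicable)
```

Excluding not-applicable results from the denominator keeps a library from being penalised for checks that only make sense for deployed services.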

Maturity levels

A common scheme uses four tiers:

Bronze: service exists, ownership clear
Silver: production-ready basics (observability, healthchecks, on-call)
Gold: production-mature (SLOs, runbooks, chaos-tested)
Platinum: exemplary (autoscaled, multi-region, regularly load-tested)
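One way to roll check results up into these tiers is to treat them as cumulative: a service holds a tier only if it also satisfies every tier below it. The sketch below assumes that rule, and the check names mapped to each tier are hypothetical. It also assumes not-applicable checks were already counted as passed upstream.

```python
from typing import Optional

LEVELS = ["Bronze", "Silver", "Gold", "Platinum"]

# Hypothetical mapping of check names to the tier that requires them.
REQUIRED = {
    "Bronze": ["catalogue-entry", "owner-registered"],
    "Silver": ["metrics-endpoint", "healthcheck", "on-call-rotation"],
    "Gold": ["slo-defined", "runbook-linked", "chaos-tested"],
    "Platinum": ["autoscaling", "multi-region", "load-tested"],
}


def maturity_level(passed: set[str]) -> Optional[str]:
    """Return the highest tier whose checks, and all lower tiers' checks, pass."""
    achieved = None
    for level in LEVELS:
        if not set(REQUIRED[level]) <= passed:
            break  # a gap at this tier blocks every higher tier
        achieved = level
    return achieved
```

With this rule, a service that passes all Gold checks but lacks a healthcheck stays at Bronze, which keeps the tier names meaningful.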

The Politics: Carrot, Not Stick

Scorecards fail when they become a stick. "Your service is bronze — fix it by Friday" generates resentment and shortcut behaviour. They succeed when:

The scaffolder template ships services already at silver — most checks pass on day one
Each red check links to a fix that takes minutes, not days
Scores are visible to the team's leadership but not weaponised as performance criteria
Promotion to higher tiers unlocks benefits: better SLAs, eligibility for production traffic, and the ability to apply for budget increases

The model is "make the right thing the easy thing" — the standard is encoded in the template; failing a scorecard means the team diverged from the template, which is usually unintentional and quick to fix.

Three Signals to Start With

Resist measuring twenty things. Start with three that drive disproportionate value:

Ownership. Every service has a team. Without this, nothing else functions.
Observability. Logs, metrics, and a dashboard. Without this, you cannot operate.
Incident readiness. On-call rotation, runbook, healthcheck. Without this, you cannot respond.

Get these to 100% before adding more checks.
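A rough sketch of how coverage for these three signals could be tracked across the catalogue follows. The field names on each service record (`owner`, `logs`, `on_call`, and so on) are assumptions; a real portal would supply its own schema.

```python
# Hypothetical catalogue fields that make up each starting signal.
SIGNALS = {
    "ownership": ["owner"],
    "observability": ["logs", "metrics", "dashboard"],
    "incident_readiness": ["on_call", "runbook", "healthcheck"],
}


def signal_coverage(services: list[dict]) -> dict[str, float]:
    """Percentage of services that satisfy every field of each signal."""
    if not services:
        return {name: 0.0 for name in SIGNALS}
    coverage = {}
    for name, fields in SIGNALS.items():
        ok = sum(all(svc.get(field) for field in fields) for svc in services)
        coverage[name] = 100.0 * ok / len(services)
    return coverage


def ready_for_more_checks(services: list[dict]) -> bool:
    """True only when all three signals are at 100% across the catalogue."""
    return bool(services) and all(
        pct == 100.0 for pct in signal_coverage(services).values()
    )
```

A signal counts for a service only when all of its parts are present, so a service with logs and metrics but no dashboard does not count towards observability.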

Standards as Code

The natural evolution of scorecards is to enforce some standards automatically via policy:

OPA / Gatekeeper / Kyverno — admission-time policy on Kubernetes resources
Repo policy — GitHub branch protection, required reviews, required status checks
Pipeline policy — required steps in CI (SAST, dependency scan, signing)
Supply chain — SLSA levels, in-toto attestations, Sigstore signing

The platform team's job is to author these policies, document them, and provide bypasses for exceptional cases.
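Admission policies themselves are written in each tool's own language (Rego for OPA and Gatekeeper, YAML for Kyverno). The sketch below uses plain Python only to illustrate the logic of a pipeline policy with a bypass mechanism: required steps, plus an exception register whose entries expire. The step names, the service name, and the register format are hypothetical.

```python
from datetime import date

# Hypothetical required CI steps, mirroring the pipeline policy above.
REQUIRED_STEPS = {"sast", "dependency-scan", "sign"}

# Hypothetical exception register: (service, step) -> last day the bypass is valid.
EXCEPTIONS = {("legacy-billing", "sign"): date(2030, 1, 1)}


def violations(
    service: str,
    pipeline_steps: list[str],
    today: date,
    exceptions: dict[tuple[str, str], date] = EXCEPTIONS,
) -> list[str]:
    """Return required steps that are missing and not covered by a live exception."""
    missing = REQUIRED_STEPS - set(pipeline_steps)
    unresolved = []
    for step in sorted(missing):
        expiry = exceptions.get((service, step))
        if expiry is not None and today <= expiry:
            continue  # bypass granted and still valid
        unresolved.append(step)
    return unresolved
```

Giving every bypass an expiry date keeps exceptions from silently becoming permanent: once the date passes, the missing step is reported again.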

"Shift Left" Without Burying Teams

The 2010s "shift left" mantra moved security and compliance earlier in the lifecycle. Done badly, it dumps those concerns on already-overloaded application teams. Done well, the platform automates so much of the standard that compliance is the byproduct of using the paved path:

The Dockerfile in the template uses a distroless base — no CVE conversation needed
The Helm chart in the template requires resource limits — no namespace-quota incident
The CI in the template runs SBOM and signing — supply-chain compliance for free
The Backstage entity ties to the security scorecard — exceptions are visible, not buried
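The first two items in this list can also be verified mechanically, for example as scorecard checks that detect drift from the template. The following is a minimal sketch under stated assumptions: the Deployment manifest is already parsed into a dictionary, the standard requires both CPU and memory limits, and a base image counts as distroless if its name contains "distroless". The Dockerfile parsing is a heuristic and does not resolve a final stage that refers to an earlier stage by alias.

```python
from typing import Optional


def containers_missing_limits(deployment: dict) -> list[str]:
    """Names of containers in a Deployment manifest without CPU and memory limits."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            missing.append(container.get("name", "<unnamed>"))
    return missing


def final_base_image(dockerfile_text: str) -> Optional[str]:
    """Image named by the last FROM instruction, i.e. the stage that ships."""
    base = None
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            # Skip flags such as --platform=linux/amd64.
            args = [p for p in parts[1:] if not p.startswith("--")]
            if args:
                base = args[0]
    return base


def uses_distroless(dockerfile_text: str) -> bool:
    base = final_base_image(dockerfile_text)
    return base is not None and "distroless" in base
```

Only the final stage is inspected because build stages never reach production; a multi-stage build may compile on a full image and still ship a distroless one.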

Listening Loops

The portal and scorecards tell you what's true; the developer survey tells you what's painful. Run both:

Quarterly survey, fewer than 15 questions, with one open-ended "what slows you down most?" item
Adoption metrics from the portal (active users, templates used, scorecards green)
DORA metrics from CI/CD (lead time for changes, deployment frequency, change failure rate, time to restore service)
Office hours and a public roadmap so teams see their feedback shape the platform
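The four DORA metrics in the list above can be derived from deployment records. The sketch below assumes a hypothetical `Deployment` record exported from CI/CD with commit, deploy, and restore timestamps; the aggregation choices (median lead time, mean restore time) are one reasonable option rather than a prescribed definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real CI/CD exports will differ.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def dora_summary(deploys: list[Deployment], window_days: int) -> dict:
    """Summarise the four DORA metrics over a window of deployments."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "median_lead_time": median(lead_times),
        "deploys_per_day": len(deploys) / window_days,
        "change_failure_rate": len(failures) / len(deploys),
        # None when no failed deployment has been restored in the window.
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
    }
```

Tracking these alongside the survey results shows whether a pain point developers report, such as slow deploys, is also visible in the delivery data.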

DevEx is real engineering work — discoverable, measurable, improvable. The platform that takes it seriously is the platform that wins adoption.