Replacing every running instance with the new version at once is the highest-risk deploy you can do — a bug hits 100% of users immediately. Modern strategies trade a little complexity for much smaller blast radius.
Rolling Update
Replace instances a few at a time. Kubernetes does this by default with Deployments:

    spec:
      replicas: 10
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 2        # up to 2 extra new pods during rollout
          maxUnavailable: 1  # at most 1 missing pod at a time

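How it plays out: the controller scales the new ReplicaSet up by at most 2 pods, scales the old one down as new pods become Ready, and repeats until every pod runs the new version. Throughout the rollout, at least replicas - maxUnavailable pods (9 here) stay available and at most replicas + maxSurge pods (12 here) exist.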
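To make the two bounds concrete, here is a minimal sketch of that loop. It is a simplified, round-based model and not the real Deployment controller: the function name `simulateRollingUpdate` and the assumption that every new pod becomes Ready within one round are illustrative only.

```javascript
// Simplified round-based model of a rolling update. This is NOT the real
// Deployment controller; it only checks the two bounds described above.
function simulateRollingUpdate({ replicas, maxSurge, maxUnavailable }) {
  if (maxSurge === 0 && maxUnavailable === 0) {
    throw new Error('maxSurge and maxUnavailable cannot both be 0');
  }
  const ceiling = replicas + maxSurge;     // most pods that may exist
  const floor = replicas - maxUnavailable; // fewest pods that must be Ready
  let oldReady = replicas;
  let newReady = 0;
  let minAvailable = replicas;
  let maxTotal = replicas;
  let rounds = 0;

  while (oldReady > 0 || newReady < replicas) {
    // Start as many new pods as the ceiling allows (they are not Ready yet).
    const starting = Math.min(replicas - newReady, ceiling - (oldReady + newReady));
    maxTotal = Math.max(maxTotal, oldReady + newReady + starting);

    // Stop old pods, but never dip below the availability floor.
    const stopping = Math.min(oldReady, oldReady + newReady - floor);
    oldReady -= stopping;
    minAvailable = Math.min(minAvailable, oldReady + newReady);

    // Assumption: the pods started this round pass their readiness probe.
    newReady += starting;
    rounds += 1;
  }
  return { rounds, minAvailable, maxTotal };
}

console.log(simulateRollingUpdate({ replicas: 10, maxSurge: 2, maxUnavailable: 1 }));
```

For the settings shown earlier, the model never drops below 9 available pods and never exceeds 12 pods in total.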
| Pros | Cons |
|---|---|
| ✅ Cheap: no extra environment | ❌ Mixed-version traffic during rollout |
| ✅ Default in K8s and most PaaS | ❌ Rollback is another rolling update, so it is slow |
Blue/Green
Stand up a complete second environment ("green") with the new version. The current "blue" still serves traffic. When green passes smoke tests, switch the load balancer to point at green. Blue stays running for instant rollback.

                     ┌──── blue (v1)  ── 100% traffic
    load balancer ───┤
                     └──── green (v2) ── 0% (warm)

After cutover:

                     ┌──── blue (v1)  ── 0% (kept for rollback)
    load balancer ───┤
                     └──── green (v2) ── 100% traffic

| Pros | Cons |
|---|---|
| ✅ Instant cutover, instant rollback | ❌ Doubles infrastructure during deploy |
| ✅ No mixed-version traffic | ❌ DB schema must work with both versions |
| ✅ Easy to reason about | ❌ Stateful sessions need care |
Implementation options:
- AWS: target group switch on an ALB; CodeDeploy blue/green
- K8s: two Deployments + a Service whose selector switches
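- DNS switch or weighted DNS routing: avoid it, because resolvers and clients cache records for the TTL (sometimes longer), so both cutover and rollback are slow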
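Whatever the platform, the cutover boils down to one pointer that names the live environment. The sketch below models that with a hypothetical in-memory `BlueGreenRouter`. In a real system the pointer would be an ALB listener rule or a Service selector, and the smoke test would be an actual HTTP check.

```javascript
// Hypothetical in-memory router. A real cutover would update a load
// balancer target or a Service selector instead of a field on an object.
class BlueGreenRouter {
  constructor(environments, active) {
    this.environments = environments; // e.g. { blue: handlerV1, green: handlerV2 }
    this.active = active;             // environment currently serving traffic
    this.previous = null;             // kept warm for rollback
  }

  // Switch traffic only if the target passes its smoke test.
  cutover(target, smokeTest) {
    if (!(target in this.environments)) {
      throw new Error(`unknown environment: ${target}`);
    }
    if (!smokeTest(this.environments[target])) return false;
    this.previous = this.active;
    this.active = target;
    return true;
  }

  // Rollback is the same pointer flip in the other direction.
  rollback() {
    if (this.previous === null) throw new Error('nothing to roll back to');
    [this.active, this.previous] = [this.previous, this.active];
  }

  handle(request) {
    return this.environments[this.active](request);
  }
}

const router = new BlueGreenRouter({ blue: () => 'v1', green: () => 'v2' }, 'blue');
router.cutover('green', (handler) => handler({}) === 'v2');
console.log(router.handle({})); // served by green
router.rollback();
console.log(router.handle({})); // served by blue again
```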
Canary
Send a small slice (1%, 5%, 25%) of traffic to the new version, watch metrics, ramp up if healthy, roll back if not.

    load balancer ──┬──── 95% ── stable (v1)
                    └────  5% ── canary (v2)

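A weighted split like this can be sketched as a per-request weighted choice. The function `pickBackend` and the backend names are hypothetical. Real load balancers and meshes implement their own selection logic, so treat this only as an illustration of the idea.

```javascript
// Weighted per-request selection. `random` is injectable so the behaviour
// can be checked deterministically; it defaults to Math.random.
function pickBackend(weights, random = Math.random) {
  const names = Object.keys(weights);
  const total = names.reduce((sum, name) => sum + weights[name], 0);
  if (total <= 0) throw new Error('weights must sum to a positive number');

  let r = random() * total; // a point somewhere in [0, total)
  for (const name of names) {
    if (r < weights[name]) return name;
    r -= weights[name];
  }
  return names[names.length - 1]; // guard against floating-point drift
}

// Ramping up is just a change of weights: 5 -> 25 -> 50 -> 100.
const weights = { stable: 95, canary: 5 };
console.log(pickBackend(weights));
```

Because each request is chosen independently, a single user may bounce between versions. If that matters, the choice can be keyed on a stable identifier such as a user ID instead of a random number.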
Implementation:
- Service mesh (Istio, Linkerd) for HTTP traffic-splitting
- Argo Rollouts / Flagger on Kubernetes — automated analysis & promotion
- AWS App Mesh, ALB weighted target groups
- Cloudflare & CDN edge routing
Argo Rollouts example:

    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    spec:
      strategy:
        canary:
          steps:
            - setWeight: 5
            - pause: { duration: 5m }
            - setWeight: 25
            - pause: { duration: 10m }
            - setWeight: 50
            - pause: { duration: 10m }
            - setWeight: 100
          analysis:
            templates:
              - templateName: success-rate

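When the referenced analysis template queries Prometheus, a failed measurement (for example, error rate or latency crossing its threshold) aborts the rollout automatically and returns traffic to the stable version.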
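The gating decision itself is simple to reason about. The sketch below is a hypothetical gate loosely modelled on metric-driven analysis and is not Argo Rollouts code. The sample shape (`requests`, `errors`, `p95LatencyMs`) and the default thresholds are assumptions chosen for illustration.

```javascript
// Hypothetical promotion gate: abort once more than `failureLimit`
// measurement windows look unhealthy, otherwise promote.
function evaluateCanary(samples, options = {}) {
  const { minSuccessRate = 0.99, maxP95LatencyMs = 500, failureLimit = 1 } = options;
  let failures = 0;

  for (const sample of samples) {
    // Assumption: a window with no traffic carries no signal and counts as healthy.
    const successRate =
      sample.requests === 0 ? 1 : (sample.requests - sample.errors) / sample.requests;
    const healthy =
      successRate >= minSuccessRate && sample.p95LatencyMs <= maxP95LatencyMs;

    if (!healthy) failures += 1;
    if (failures > failureLimit) return { decision: 'abort', failures };
  }
  return { decision: 'promote', failures };
}

const result = evaluateCanary([
  { requests: 1000, errors: 2, p95LatencyMs: 180 },
  { requests: 1200, errors: 40, p95LatencyMs: 210 }, // error spike
  { requests: 1100, errors: 55, p95LatencyMs: 650 }, // second bad window
]);
console.log(result.decision); // "abort"
```

Tolerating one bad window before aborting trades a slightly slower reaction for fewer false alarms from noisy metrics.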
| Pros | Cons |
|---|---|
| ✅ Catch issues with minimal user impact | ❌ Most operationally complex strategy |
| ✅ Works with rich metrics-driven gating | ❌ Mixed-version still exists; APIs must be compatible |
Shadow / Dark Launch
Send a copy of production traffic to the new version without serving its responses to users. Compare results offline, watch for performance regressions.
Useful for: replacing a critical service, large refactors, or testing performance under real load. Requires care for non-idempotent calls (don't double-charge a credit card).
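A minimal sketch of the mirroring logic follows. Every name here (`handleWithShadow`, `primary`, `candidate`, `report`) is hypothetical, and the calls are synchronous only to keep the example short. A production setup would send the shadow request asynchronously, off the user's request path.

```javascript
// Only methods that are safe to repeat get mirrored, so a shadowed
// request can never charge a card or send an email twice.
const SAFE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

function handleWithShadow(request, primary, candidate, report) {
  const response = primary(request); // only this response reaches the user

  if (SAFE_METHODS.has(request.method)) {
    try {
      const shadowResponse = candidate(request);
      // Naive comparison: JSON.stringify is sensitive to key order.
      const match = JSON.stringify(shadowResponse) === JSON.stringify(response);
      report({ path: request.path, match });
    } catch (err) {
      // A crash in the new version must never affect the user.
      report({ path: request.path, match: false, error: String(err) });
    }
  }
  return response;
}

const mismatches = [];
const response = handleWithShadow(
  { method: 'GET', path: '/price/42' },
  () => ({ price: 100 }),  // current version
  () => ({ price: 101 }),  // new version under test
  (entry) => { if (!entry.match) mismatches.push(entry); }
);
console.log(response, mismatches.length);
```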
Feature Flags: Deploy ≠ Release
Deployment strategies move code safely. Feature flags move features safely.

    if (flags.isEnabled('new-pricing-engine', { userId, plan })) {
      return computeNewPrice(...);
    }
    return computeLegacyPrice(...);

You ship code to 100% of users with the flag off. When you're ready, flip the flag for 1%, 10%, 50%, 100% — same staged rollout idea, but for the user-visible change. Crucially, rollback is a config change, not a deploy.
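One common way to implement the percentage ramp is to hash each user into a stable bucket, so a user who sees the feature at 10% still sees it at 50%. The sketch below assumes a hypothetical flag object with `key`, `enabled`, and `rolloutPercent` fields. It does not reflect any particular vendor's SDK.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units. Any stable hash
// with a reasonable spread would do.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hypothetical flag shape: { key, enabled, rolloutPercent }.
function isEnabled(flag, userId) {
  if (!flag.enabled) return false; // kill switch: off for everyone
  // Including the flag key gives each flag its own bucketing of users.
  const bucket = fnv1a(`${flag.key}:${userId}`) % 100; // 0..99
  return bucket < flag.rolloutPercent;
}

const flag = { key: 'new-pricing-engine', enabled: true, rolloutPercent: 10 };
console.log(isEnabled(flag, 'user-123'));
// Rolling back is a config change: set enabled to false or rolloutPercent to 0.
```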
Feature flag platforms: LaunchDarkly, Flagsmith, Unleash, ConfigCat, Statsig, Split.io, or a homegrown DB-backed system.
Database Migrations
The single hardest part of progressive delivery. The new app version and the old one must both run against the same database during the rollout. Patterns:
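- Expand-then-contract: add the new column as nullable, deploy code that writes both columns, backfill existing rows, deploy code that reads only the new column, deploy code that stops writing the old column, and only then drop the old column.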
- Never break backward compatibility in a single deploy. Split changes across releases.
- Run migrations as a separate pipeline step, not inside the app boot — or use a Kubernetes Job to migrate before rolling out new app pods.
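The expand-then-contract sequence can be sketched from the application's side. The example below assumes a hypothetical rename of `users.fullname` to `users.display_name`, with plain objects standing in for database rows. The release names are labels invented for this illustration.

```javascript
// Release "dual-write": write both columns, still read the old one.
// Release "read-new":   write both columns, read the new one.
// Release "new-only":   write and read only the new column.
function writeName(row, name, release) {
  if (release === 'new-only') return { ...row, display_name: name };
  return { ...row, fullname: name, display_name: name };
}

function readName(row, release) {
  return release === 'dual-write' ? row.fullname : row.display_name;
}

// One-off backfill, run after dual-writing has started: copy the old value
// into rows that were written before the new column existed.
function backfill(rows) {
  return rows.map((row) =>
    row.display_name == null ? { ...row, display_name: row.fullname } : row
  );
}

const legacyRows = [{ id: 1, fullname: 'Ada Lovelace' }];
const migratedRows = backfill(legacyRows);
console.log(readName(migratedRows[0], 'read-new'));

// During the rollout of "read-new", pods of both releases share the table.
const written = writeName({ id: 2 }, 'Grace Hopper', 'read-new');
console.log(readName(written, 'dual-write'), readName(written, 'read-new'));
```

Each adjacent pair of releases can read what the other writes, which is what makes a rolling or canary rollout safe against a shared database.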
Choosing
| Need | Strategy |
|---|---|
| Default for stateless apps in K8s | Rolling update |
| Critical service with instant rollback need | Blue/green |
| Large user base, mature observability | Canary + automated analysis |
| Risky business logic change | Feature flag, slow rollout |
| Validating performance / behaviour | Shadow / dark launch |
What All Strategies Need
- Health checks / readiness probes — the platform must know when an instance is ready
- Backward-compatible APIs: old and new versions, and the clients that call them, coexist during every rollout
- Backward-compatible DB schema — see expand-then-contract
- Strong observability — error rate, latency, business KPIs
- Automated rollback, proven by regular drills
Without these, the fanciest deployment strategy is just a slower way to break things.