6 min read·Lesson 7 of 10

Blue/Green, Canary, and Rolling Deployments

Pick the right deployment strategy for your workload. Compare blue/green, canary, rolling, and feature-flag rollouts with concrete examples.

Replacing every running instance with the new version at once is the highest-risk deploy you can do — a bug hits 100% of users immediately. Modern strategies trade a little complexity for much smaller blast radius.

Rolling Update

Replace instances a few at a time. Kubernetes does this by default with Deployments:

spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2          # up to 2 extra new pods during rollout
      maxUnavailable: 1    # at most 1 missing pod at a time

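How it plays out: the controller scales the new ReplicaSet up and the old one down in small steps, waiting for new pods to become Ready before it removes more old ones. Throughout the rollout, at least (replicas - maxUnavailable) pods stay available and the total pod count never exceeds (replicas + maxSurge). With the values above, that means at least 9 available pods and at most 12 pods in total.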
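To make those bounds concrete, here is a small simulation of the scale-up/scale-down loop. It is a simplified model, not the real Deployment controller: the function name `simulateRollingUpdate` is made up for this example, and it assumes every new pod becomes Ready exactly one step after it is created.

```javascript
// Simplified model of a rolling update (illustrative only, not Kubernetes code).
// Assumption: a pod created in one step is Ready at the start of the next step.
function simulateRollingUpdate({ replicas, maxSurge, maxUnavailable }) {
  if (maxSurge === 0 && maxUnavailable === 0) {
    throw new Error("maxSurge and maxUnavailable cannot both be 0");
  }
  let oldReady = replicas; // Ready pods on the old version
  let newReady = 0;        // Ready pods on the new version
  let newPending = 0;      // new pods created but not Ready yet
  const history = [];

  while (oldReady > 0 || newReady < replicas) {
    // Pods created in the previous step become Ready.
    newReady += newPending;
    newPending = 0;

    // Remove old pods, but keep at least (replicas - maxUnavailable) Ready.
    const minAvailable = replicas - maxUnavailable;
    const removable = Math.max(0, oldReady + newReady - minAvailable);
    oldReady -= Math.min(oldReady, removable);

    // Create new pods, but never exceed (replicas + maxSurge) pods in total.
    const maxTotal = replicas + maxSurge;
    const stillNeeded = replicas - newReady;
    newPending = Math.max(0, Math.min(stillNeeded, maxTotal - (oldReady + newReady)));

    history.push({ oldReady, newReady, newPending });
  }
  return history;
}

// Same numbers as the manifest above.
const steps = simulateRollingUpdate({ replicas: 10, maxSurge: 2, maxUnavailable: 1 });
for (const step of steps) console.log(step);
```

Each printed step keeps `oldReady + newReady` at 9 or more and the sum of all three counters at 12 or less, which is the guarantee the two settings give you.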

✅ Cheap: no extra environment
✅ Default in K8s and most PaaS
❌ Mixed-version traffic during rollout
❌ Rollback is another rolling update, so it is slow

Blue/Green

Stand up a complete second environment ("green") with the new version. The current "blue" still serves traffic. When green passes smoke tests, switch the load balancer to point at green. Blue stays running for instant rollback.

                    ┌──── blue (v1) ── 100% traffic
   load balancer ───┤
                    └──── green (v2) ── 0% (warm)

After cutover:
                    ┌──── blue (v1) ── 0% (kept for rollback)
   load balancer ───┤
                    └──── green (v2) ── 100% traffic

✅ Instant cutover, instant rollback
✅ No mixed-version traffic
✅ Easy to reason about
❌ Doubles infrastructure during deploy
❌ DB schema must work with both versions
❌ Stateful sessions need care

Implementation options:

  • AWS: target group switch on an ALB; CodeDeploy blue/green
  • K8s: two Deployments + a Service whose selector switches
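  • DNS switch or weighted DNS routing: cutover and rollback are slow and uneven because resolvers and clients cache records until the TTL expires; avoid it when you need an instant switch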
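Whichever option you pick, the core mechanism is a single pointer that names the live environment. The sketch below models that pointer in-process; `BlueGreenRouter` and its methods are hypothetical names for illustration, standing in for an ALB target group or a Service selector.

```javascript
// Toy in-process router (illustrative). In a real setup the "active" pointer
// is a load balancer target group or a Kubernetes Service selector.
class BlueGreenRouter {
  constructor(environments, active) {
    this.environments = environments; // e.g. { blue: handlerV1, green: handlerV2 }
    this.active = active;
    this.previous = null;
  }

  // All traffic goes to whichever environment is currently active.
  handle(request) {
    return this.environments[this.active](request);
  }

  // Switch traffic only if the idle environment passes the smoke test.
  cutover(target, smokeTest) {
    if (!(target in this.environments)) {
      throw new Error(`unknown environment: ${target}`);
    }
    if (!smokeTest(this.environments[target])) return false;
    this.previous = this.active;
    this.active = target;
    return true;
  }

  // Rollback is the same pointer flip in reverse; nothing is redeployed.
  rollback() {
    if (this.previous === null) return false;
    [this.active, this.previous] = [this.previous, this.active];
    return true;
  }
}

const router = new BlueGreenRouter({ blue: () => "v1", green: () => "v2" }, "blue");
router.cutover("green", (handler) => handler({ path: "/healthz" }) === "v2");
console.log(router.handle({ path: "/" })); // served by green
router.rollback();
console.log(router.handle({ path: "/" })); // served by blue again
```

Because both environments stay running, cutover and rollback cost the same: one pointer update.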

Canary

Send a small slice (1%, 5%, 25%) of traffic to the new version, watch metrics, ramp up if healthy, roll back if not.

load balancer ──┬──── 95% ── stable (v1)
                └──── 5%  ── canary (v2)
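The split in the diagram boils down to a weighted pick per request. The following is a minimal sketch of that idea, not how any particular load balancer or mesh implements it; `pickBackend` is a hypothetical helper, and the random source is injectable only so the behaviour can be checked deterministically.

```javascript
// Picks a backend for one request according to relative weights.
// `random` must return a number in [0, 1); it defaults to Math.random.
function pickBackend(weights, random = Math.random) {
  const names = Object.keys(weights);
  const total = names.reduce((sum, name) => sum + weights[name], 0);
  let threshold = random() * total;
  for (const name of names) {
    threshold -= weights[name];
    if (threshold < 0) return name;
  }
  // Floating-point edge case: fall back to the last backend.
  return names[names.length - 1];
}

// Route 10,000 simulated requests with the 95/5 split from the diagram.
const counts = { stable: 0, canary: 0 };
for (let i = 0; i < 10000; i++) {
  counts[pickBackend({ stable: 95, canary: 5 })] += 1;
}
console.log(counts); // roughly 95% stable, 5% canary; exact numbers vary per run
```

Ramping up is then just a matter of changing the weights (5, 25, 50, 100) once the canary's metrics look healthy.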

Implementation:

  • Service mesh (Istio, Linkerd) for HTTP traffic-splitting
  • Argo Rollouts / Flagger on Kubernetes — automated analysis & promotion
  • AWS App Mesh, ALB weighted target groups
  • Cloudflare & CDN edge routing

Argo Rollouts example:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: { duration: 5m }
        - setWeight: 25
        - pause: { duration: 10m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
      analysis:
        templates:
          - templateName: success-rate

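Hooked up to Prometheus, the success-rate analysis template runs in the background for the duration of the rollout. If a metric fails its success condition, Argo Rollouts aborts the rollout and shifts traffic back to the stable version. A template only checks the metrics it defines, so gating on latency as well as error rate means adding a latency query to it.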
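The promote-or-abort decision itself is simple once the metrics are in hand. Here is a rough sketch of such a gate; it is not Argo Rollouts code, and the function name, metric fields, and thresholds (`maxErrorRateDelta`, `maxLatencyRatio`) are assumptions chosen for illustration.

```javascript
// Hypothetical canary gate. Thresholds are illustrative, not tool defaults.
// Each metrics object: { requests, errors, p95LatencyMs }.
function evaluateCanary(stable, canary, limits = { maxErrorRateDelta: 0.01, maxLatencyRatio: 1.2 }) {
  const errorRate = (m) => (m.requests === 0 ? 0 : m.errors / m.requests);
  const reasons = [];

  // Compare against the stable version rather than a fixed number, so a
  // platform-wide incident does not get blamed on the canary.
  if (errorRate(canary) - errorRate(stable) > limits.maxErrorRateDelta) {
    reasons.push("error rate");
  }
  if (canary.p95LatencyMs > stable.p95LatencyMs * limits.maxLatencyRatio) {
    reasons.push("latency");
  }
  return { verdict: reasons.length === 0 ? "promote" : "abort", reasons };
}

const stableMetrics = { requests: 10000, errors: 20, p95LatencyMs: 200 };
const canaryMetrics = { requests: 500, errors: 15, p95LatencyMs: 210 };
console.log(evaluateCanary(stableMetrics, canaryMetrics));
```

In this run the canary's error rate is 3% against 0.2% on stable, so the gate returns an abort verdict with "error rate" as the reason.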

✅ Catch issues with minimal user impact
✅ Works with rich metrics-driven gating
❌ Most operationally complex strategy
❌ Mixed-version still exists; APIs must be compatible

Shadow / Dark Launch

Send a copy of production traffic to the new version without serving its responses to users. Compare results offline, watch for performance regressions.

Useful for: replacing a critical service, large refactors, or testing performance under real load. Requires care for non-idempotent calls (don't double-charge a credit card).
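A minimal sketch of the mirroring logic follows, under a few assumptions: `handleWithShadow` and its parameters are hypothetical names, only GET/HEAD/OPTIONS are treated as safe to replay, and the shadow call is made synchronously to keep the example short. A real system would mirror asynchronously so the shadow can never slow down the user's request.

```javascript
// Methods treated as safe to replay; anything else is assumed to have side effects.
const SAFE_METHODS = new Set(["GET", "HEAD", "OPTIONS"]);

// Serves the primary's response and records any difference from the shadow.
// Synchronous for brevity; production mirroring should be fire-and-forget.
function handleWithShadow(request, primary, shadow, mismatches) {
  const primaryResponse = primary(request);

  if (SAFE_METHODS.has(request.method)) {
    try {
      const shadowResponse = shadow(request);
      if (JSON.stringify(shadowResponse) !== JSON.stringify(primaryResponse)) {
        mismatches.push({ request, primaryResponse, shadowResponse });
      }
    } catch (error) {
      // A crashing shadow is a finding to review, never a user-facing failure.
      mismatches.push({ request, primaryResponse, error: String(error) });
    }
  }
  return primaryResponse; // users only ever see the primary's answer
}

const mismatchLog = [];
const legacyPricing = () => ({ total: 100 });
const newPricing = () => ({ total: 101 });
const reply = handleWithShadow({ method: "GET", path: "/price" }, legacyPricing, newPricing, mismatchLog);
console.log(reply, mismatchLog.length);
```

The POST path is skipped entirely here, which is the blunt version of "don't double-charge a credit card"; a finer-grained approach would stub out the side effects in the shadow instead.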

Feature Flags: Deploy ≠ Release

Deployment strategies move code safely. Feature flags move features safely.

if (flags.isEnabled('new-pricing-engine', { userId, plan })) {
  return computeNewPrice(...);
}
return computeLegacyPrice(...);

You ship code to 100% of users with the flag off. When you're ready, flip the flag for 1%, 10%, 50%, 100% — same staged rollout idea, but for the user-visible change. Crucially, rollback is a config change, not a deploy.
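Percentage rollouts usually rely on stable bucketing, so a given user keeps the same answer as the percentage grows. The sketch below shows one common way to do that; it is not the API of any flag platform listed in this lesson, and `bucketFor` and `isEnabledFor` are hypothetical helpers built on a small FNV-1a style hash.

```javascript
// FNV-1a style 32-bit hash over UTF-16 code units: small, dependency-free,
// and stable across processes and restarts.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Maps a (flag, user) pair to a fixed bucket from 0 to 99.
// Including the flag key means different flags pick different user subsets.
function bucketFor(flagKey, userId) {
  return fnv1a(`${flagKey}:${userId}`) % 100;
}

// A user is in the rollout when their bucket falls below the percentage.
function isEnabledFor(flagKey, userId, rolloutPercent) {
  return bucketFor(flagKey, userId) < rolloutPercent;
}

for (const percent of [0, 1, 10, 50, 100]) {
  console.log(percent, isEnabledFor("new-pricing-engine", "user-42", percent));
}
```

Because the bucket never changes, raising the percentage only ever adds users, and setting it back to 0 turns the feature off for everyone without a deploy.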

Feature flag platforms: LaunchDarkly, Flagsmith, Unleash, ConfigCat, Statsig, Split.io, or a homegrown DB-backed system.

Database Migrations

The single hardest part of progressive delivery. The new app version and the old one must both run against the same database during the rollout. Patterns:

  • Expand-then-contract: add the new column as nullable, deploy code that writes to both columns, backfill existing rows, deploy code that reads only the new column, then drop the old column.
  • Never break backward compatibility in a single deploy. Split changes across releases.
  • Run migrations as a separate pipeline step, not inside the app boot — or use a Kubernetes Job to migrate before rolling out new app pods.
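The expand-then-contract sequence can be walked through with plain objects standing in for table rows. This is a sketch under assumed names: the columns `full_name` and `display_name` and all four helper functions are hypothetical, and each function represents work that would ship in its own release or pipeline step.

```javascript
// Rows as they look right after "expand": the nullable display_name column
// exists, but rows written by the old app version have not populated it.
let users = [
  { id: 1, full_name: "Ada Lovelace", display_name: null },
  { id: 2, full_name: "Alan Turing", display_name: null },
];

// Release A: write both columns so old and new readers both stay correct.
function writeName(row, name) {
  return { ...row, full_name: name, display_name: name };
}

// One-off backfill: copy old values into rows the dual write has not touched.
function backfill(rows) {
  return rows.map((row) =>
    row.display_name === null ? { ...row, display_name: row.full_name } : row
  );
}

// Release B: read the new column only. Safe once the backfill has finished.
function readName(row) {
  return row.display_name;
}

// Contract: drop the old column once no running version references it.
function dropOldColumn(rows) {
  return rows.map(({ full_name, ...rest }) => rest);
}

users[0] = writeName(users[0], "Ada King"); // dual write from release A
users = backfill(users);                    // row 2 gets its display_name
users = dropOldColumn(users);
console.log(users.map(readName)); // [ 'Ada King', 'Alan Turing' ]
```

The ordering is the point: reading the new column before the backfill would return null for row 2, and dropping the old column before release B is fully rolled out would break any old pods still running.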

Choosing

Need                                          Strategy
Default for stateless apps in K8s             Rolling update
Critical service with instant rollback need   Blue/green
Large user base, mature observability         Canary + automated analysis
Risky business logic change                   Feature flag, slow rollout
Validating performance / behaviour            Shadow / dark launch

What All Strategies Need

  1. Health checks / readiness probes — the platform must know when an instance is ready
  2. Backward-compatible APIs: old and new versions serve traffic side by side, so existing clients must work against both
  3. Backward-compatible DB schema — see expand-then-contract
  4. Strong observability — error rate, latency, business KPIs
  5. Automated rollback, rehearsed regularly in drills

Without these, the fanciest deployment strategy is just a slower way to break things.

Key Takeaways

  • Rolling: replace pods/instances a few at a time — the default in Kubernetes.
  • Blue/green: run two full environments and switch traffic at the load balancer.
  • Canary: route a small percentage of traffic to the new version and ramp up.
  • Feature flags decouple deploy from release — ship code dark, flip the switch later.
  • Choose by blast radius, rollback speed, and traffic-shaping needs.
