Blue/Green, Canary, and Rolling Deployments

Replacing every running instance with the new version at once is the highest-risk way to deploy: a bug reaches 100% of users immediately. The strategies below trade a little complexity for a much smaller blast radius.

Rolling Update

Replace instances a few at a time. Kubernetes does this by default with Deployments:

spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2          # up to 2 extra new pods during rollout
      maxUnavailable: 1    # at most 1 missing pod at a time

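How it plays out: the controller scales the new ReplicaSet up and the old one down in small steps, waiting for new pods to become Ready before removing more old ones. Throughout the rollout, at least (replicas - maxUnavailable) pods are available and at most (replicas + maxSurge) pods exist in total: 9 and 12 with the settings above.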
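The bounds can be worked out with a few lines of arithmetic. The sketch below is only an illustration, not Kubernetes code: the function names are made up for this example, and the one behaviour borrowed from Kubernetes is that a percentage maxSurge rounds up while a percentage maxUnavailable rounds down.

```javascript
// Resolve an absolute count (2) or a percentage string ("25%") against replicas.
// Multiplying before dividing keeps whole-number results exact.
function resolveCount(value, replicas, roundUp) {
  if (typeof value === 'number') return value;
  const scaled = (parseFloat(value) * replicas) / 100;
  return roundUp ? Math.ceil(scaled) : Math.floor(scaled);
}

// Hypothetical helper: the pod-count envelope a rolling update stays inside.
function rolloutBounds({ replicas, maxSurge, maxUnavailable }) {
  const surge = resolveCount(maxSurge, replicas, true);             // rounds up
  const unavailable = resolveCount(maxUnavailable, replicas, false); // rounds down
  return {
    minAvailable: replicas - unavailable, // floor on available pods
    maxTotal: replicas + surge,           // ceiling on old + new pods
  };
}

console.log(rolloutBounds({ replicas: 10, maxSurge: 2, maxUnavailable: 1 }));
console.log(rolloutBounds({ replicas: 10, maxSurge: '25%', maxUnavailable: '25%' }));
```

A larger maxSurge speeds up the rollout at the cost of temporary extra capacity; a larger maxUnavailable speeds it up at the cost of serving capacity.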

✅ Cheap — no extra environment	❌ Mixed-version traffic during rollout
✅ Default in K8s and most PaaS	❌ Rollback is another rolling update — slow

Blue/Green

Stand up a complete second environment ("green") with the new version. The current "blue" still serves traffic. When green passes smoke tests, switch the load balancer to point at green. Blue stays running for instant rollback.

                    ┌──── blue (v1) ── 100% traffic
   load balancer ───┤
                    └──── green (v2) ── 0% (warm)

After cutover:
                    ┌──── blue (v1) ── 0% (kept for rollback)
   load balancer ───┤
                    └──── green (v2) ── 100% traffic

✅ Instant cutover, instant rollback	❌ Doubles infrastructure during deploy
✅ No mixed-version traffic	❌ DB schema must work with both versions
✅ Easy to reason about	❌ Stateful sessions need care

Implementation options:

AWS: target group switch on an ALB; CodeDeploy blue/green
K8s: two Deployments + a Service whose selector switches
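DNS: repointing a record or shifting weighted records is slow and unpredictable because resolvers and clients cache answers for the TTL; avoid it for cutover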
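Whichever option you pick, the cutover reduces to flipping one pointer after a smoke test passes, and rollback is the same flip in reverse. A minimal in-memory sketch of that idea follows; the BlueGreenRouter class and the handler functions are hypothetical stand-ins for a real load balancer and real environments.

```javascript
// In-memory stand-in for a load balancer that sends all traffic to one environment.
class BlueGreenRouter {
  constructor(environments, active) {
    this.environments = environments; // e.g. { blue: handlerV1, green: handlerV2 }
    this.active = active;
    this.previous = null;
  }

  // Route a request to whichever environment is currently live.
  handle(request) {
    return this.environments[this.active](request);
  }

  // Switch to the candidate only if the smoke test passes against it.
  cutover(candidate, smokeTest) {
    if (!smokeTest(this.environments[candidate])) return false;
    this.previous = this.active;
    this.active = candidate;
    return true;
  }

  // Rollback swaps back to the previous environment, which is still running.
  rollback() {
    if (this.previous === null) return false;
    [this.active, this.previous] = [this.previous, this.active];
    return true;
  }
}

const router = new BlueGreenRouter({ blue: () => 'v1', green: () => 'v2' }, 'blue');
router.cutover('green', (env) => env({ path: '/healthz' }) === 'v2');
console.log(router.handle({ path: '/' })); // now served by green
router.rollback();
console.log(router.handle({ path: '/' })); // served by blue again
```

Because both environments stay running, neither direction of the switch waits on a deploy.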

Canary

Send a small slice (1%, 5%, 25%) of traffic to the new version, watch metrics, ramp up if healthy, roll back if not.

load balancer ──┬──── 95% ── stable (v1)
                └──── 5%  ── canary (v2)
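The split in the diagram can be sketched as a weighted random choice per request. This is an illustration only: pickBackend is a hypothetical helper, and real load balancers and meshes implement weighting in their own ways. The random source is injectable so the behaviour can be tested deterministically.

```javascript
// Pick a backend name according to relative weights, e.g. { stable: 95, canary: 5 }.
function pickBackend(weights, rand = Math.random) {
  const names = Object.keys(weights);
  const total = names.reduce((sum, name) => sum + weights[name], 0);
  let point = rand() * total; // a position somewhere along the combined weight
  for (const name of names) {
    if (point < weights[name]) return name;
    point -= weights[name];
  }
  // Guard against floating-point drift: fall back to the last backend.
  return names[names.length - 1];
}

// Rough check of the split over many simulated requests.
const counts = { stable: 0, canary: 0 };
for (let i = 0; i < 10000; i++) {
  counts[pickBackend({ stable: 95, canary: 5 })] += 1;
}
console.log(counts); // approximately 95% stable, 5% canary
```

A per-request random choice means one user can bounce between versions. If that matters, route on a stable key such as a user ID or session cookie instead.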

Implementation:

Service mesh (Istio, Linkerd) for HTTP traffic-splitting
Argo Rollouts / Flagger on Kubernetes — automated analysis & promotion
AWS App Mesh, ALB weighted target groups
Cloudflare & CDN edge routing

Argo Rollouts example:

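# Fragment: metadata, selector, and pod template omitted for brevity
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5               # 5% of traffic to the canary
        - pause: { duration: 5m }    # hold while metrics accumulate
        - setWeight: 25
        - pause: { duration: 10m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
      analysis:                      # background analysis runs alongside the steps
        templates:
          - templateName: success-rate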

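The success-rate AnalysisTemplate holds the metric queries, for example against Prometheus. If error rate or latency crosses its failure condition, Argo Rollouts aborts the rollout and shifts traffic back to the stable version.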
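The gating logic itself is simple, whatever tool runs it. The sketch below is not how Argo Rollouts is implemented; it is a hypothetical, synchronous loop that walks the same weights and aborts on the first unhealthy reading. The getMetrics callback and the threshold names are assumptions made for this example.

```javascript
// Walk through canary weights, checking metrics after each step.
// getMetrics(weight) is a hypothetical callback returning
// { successRate, p95LatencyMs } for the canary at that weight.
function runCanary(weights, getMetrics, thresholds) {
  for (const weight of weights) {
    const { successRate, p95LatencyMs } = getMetrics(weight);
    const healthy =
      successRate >= thresholds.minSuccessRate &&
      p95LatencyMs <= thresholds.maxP95LatencyMs;
    if (!healthy) {
      // Abort: all traffic goes back to the stable version.
      return { status: 'aborted', failedAt: weight, canaryWeight: 0 };
    }
  }
  return { status: 'promoted', canaryWeight: 100 };
}

// Fake metrics that degrade once the canary takes half the traffic.
const fakeMetrics = (weight) =>
  weight < 50
    ? { successRate: 0.999, p95LatencyMs: 180 }
    : { successRate: 0.97, p95LatencyMs: 400 };

console.log(
  runCanary([5, 25, 50, 100], fakeMetrics, {
    minSuccessRate: 0.99,
    maxP95LatencyMs: 300,
  }),
);
```

The point of the staged weights is that a regression that only appears under load is caught at 50% rather than at 100%.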

✅ Catch issues with minimal user impact	❌ Most operationally complex strategy
✅ Works with rich metrics-driven gating	❌ Mixed-version still exists; APIs must be compatible

Shadow / Dark Launch

Send a copy of production traffic to the new version without serving its responses to users. Compare results offline, watch for performance regressions.

Useful for: replacing a critical service, large refactors, or testing performance under real load. Requires care for non-idempotent calls (don't double-charge a credit card).
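A minimal sketch of the mirroring idea, under stated assumptions: handlers are plain synchronous functions, responses are JSON-serialisable, and only methods that are safe to repeat are mirrored. All names here are hypothetical. A real setup would mirror asynchronously so that shadow latency never reaches users.

```javascript
// Methods that are safe to replay against the shadow without side effects.
const SAFE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

// Serve from primary; replay safe requests to shadow and record any difference.
function handleWithShadow(request, primary, shadow, recordDiff) {
  const response = primary(request);
  if (SAFE_METHODS.has(request.method)) {
    try {
      const shadowResponse = shadow(request);
      if (JSON.stringify(shadowResponse) !== JSON.stringify(response)) {
        recordDiff({ request, expected: response, actual: shadowResponse });
      }
    } catch (error) {
      // A crashing shadow must never affect the user-facing response.
      recordDiff({ request, expected: response, error: String(error) });
    }
  }
  return response; // the shadow's answer is never returned to the user
}

const mismatches = [];
const reply = handleWithShadow(
  { method: 'GET', path: '/price/42' },
  () => ({ price: 10 }),  // current version
  () => ({ price: 12 }),  // new version under test
  (diff) => mismatches.push(diff),
);
console.log(reply, mismatches.length);
```

Skipping unsafe methods is the conservative choice. The alternative is to mirror them too, with side effects such as payments stubbed out in the shadow environment.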

Feature Flags: Separating Deploy from Release

Deployment strategies move code safely. Feature flags move features safely.

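// Wrapped in a function so the early return is valid; `order` stands in for the elided arguments.
function computePrice(order, { userId, plan }) {
  if (flags.isEnabled('new-pricing-engine', { userId, plan })) {
    return computeNewPrice(order);
  }
  return computeLegacyPrice(order);
}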

You ship code to 100% of users with the flag off. When you're ready, flip the flag for 1%, 10%, 50%, 100% — same staged rollout idea, but for the user-visible change. Crucially, rollback is a config change, not a deploy.
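One common way to implement the percentage step is to hash the flag key and user ID into a bucket from 0 to 99 and enable the flag for buckets below the rollout percentage. The sketch below assumes that approach; it is not the API of any of the platforms listed in this article, and the hash is an FNV-1a variant computed over UTF-16 code units.

```javascript
// FNV-1a style 32-bit hash: cheap and stable across processes.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Hypothetical helper: the same user always lands in the same bucket for a flag.
function isEnabled(flagKey, userId, rolloutPercent) {
  const bucket = fnv1a(`${flagKey}:${userId}`) % 100; // 0..99
  return bucket < rolloutPercent;
}

const users = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
for (const percent of [1, 10, 50, 100]) {
  const enabled = users.filter((id) => isEnabled('new-pricing-engine', id, percent)).length;
  console.log(`${percent}% rollout -> ${enabled} of ${users.length} users`);
}
```

Because the bucket is fixed per user, raising the percentage only adds users; nobody who already has the feature loses it mid-rollout. Including the flag key in the hash keeps different flags from always picking the same early users.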

Feature flag platforms: LaunchDarkly, Flagsmith, Unleash, ConfigCat, Statsig, Split.io, or a homegrown DB-backed system.

Database Migrations

The single hardest part of progressive delivery. The new app version and the old one must both run against the same database during the rollout. Patterns:

Expand-then-contract: add the new column as nullable, deploy code that writes both columns, backfill existing rows, deploy code that reads only the new column, stop writing the old column, then drop it.
Never break backward compatibility in a single deploy. Split changes across releases.
Run migrations as a separate pipeline step rather than during application startup, where several replicas booting at once can race each other. On Kubernetes, a Job that completes before the new app pods roll out is one way to do this.
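The application side of expand-then-contract can be sketched with plain objects standing in for database rows. Everything here is hypothetical: the rename of full_name to display_name, the phase names, and the helper functions. It assumes the old column keeps being written until the final release so that any old-version instance still running sees current data.

```javascript
// Hypothetical rename of a full_name column to display_name.
// Each phase corresponds to a separate release of the application.
const PHASES = ['dual-write', 'read-new', 'contract'];

function writeName(row, name, phase) {
  if (phase !== 'contract') row.full_name = name; // keep old readers working
  row.display_name = name;
  return row;
}

function readName(row, phase) {
  if (phase === 'dual-write') {
    // Backfill may be incomplete, so fall back to the old column.
    return row.display_name ?? row.full_name;
  }
  return row.display_name;
}

// One-off backfill run between the dual-write and read-new releases.
function backfill(rows) {
  for (const row of rows) {
    if (row.display_name == null) row.display_name = row.full_name;
  }
}

const legacyRow = { full_name: 'Ada Lovelace' }; // written before the new column existed
console.log(readName(legacyRow, PHASES[0]));     // falls back to the old column
backfill([legacyRow]);
console.log(readName(legacyRow, PHASES[1]));     // new column is now populated
```

At every phase, the previous release can still run against the same rows, which is what makes a mid-rollout rollback safe.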

Choosing

Need	Strategy
Default for stateless apps in K8s	Rolling update
Critical service that needs instant rollback	Blue/green
Large user base, mature observability	Canary + automated analysis
Risky business logic change	Feature flag, slow rollout
Validating performance / behaviour	Shadow / dark launch

What All Strategies Need

Health checks / readiness probes — the platform must know when an instance is ready
Backward-compatible APIs: old and new versions serve the same clients during a rollout
Backward-compatible DB schema — see expand-then-contract
Strong observability — error rate, latency, business KPIs
Automated rollback, exercised regularly in drills so you know it works

Without these, the fanciest deployment strategy is just a slower way to break things.