Load Balancing — System Design Fundamentals | CertQnA

A load balancer is the front door of every scalable web system. It does two jobs: spread traffic across many servers, and route around failures. Get this layer right and most other concerns become tractable.

Where Load Balancers Live

[ Users ]
    │
    ▼
[ DNS / Global LB ]   ← geographic + region failover
    │
    ▼
[ Regional / L7 LB ]   ← TLS terminate, path routing
    │
    ▼
[ Internal L4 LB ]     ← service-to-service
    │
    ▼
[ App instances ]

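Real systems often stack two or three of these layers, and each has different requirements: the global layer handles region failover, the L7 layer handles TLS and request routing, and the internal L4 layer carries service-to-service traffic.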

L4 vs L7

	L4 (Transport)	L7 (Application)
Operates on	TCP / UDP packets	HTTP requests
Decisions based on	IP, port, connection state	Path, host, header, cookie, body
TLS	Pass-through (encrypted)	Terminates, can re-encrypt
Speed	Very fast, low CPU	Slower, much more flexible
Examples	AWS NLB, GCP Network LB, IPVS, HAProxy TCP mode	AWS ALB, GCP HTTP(S) LB, NGINX, Envoy, Traefik

Pick L7 when you need routing intelligence: /api goes one place, /static another; canary routing by header; per-path rate limits. Pick L4 for raw throughput, non-HTTP protocols, or when TLS must pass through unmodified.
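
The routing rules in that paragraph can be sketched as a small decision function. This is a minimal illustration, not how any particular proxy implements routing: the pool names, the `x-canary` header, and the `pick_pool` function are all hypothetical, and header names are assumed to be lowercased already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    path_prefix: str
    pool: str


# Hypothetical routing table: path prefix -> backend pool name.
ROUTES = [
    Route("/api", "api-pool"),
    Route("/static", "static-pool"),
]
DEFAULT_POOL = "web-pool"
CANARY_HEADER = "x-canary"  # assumed header name, lowercased


def pick_pool(path, headers):
    """Choose a backend pool from request attributes only an L7 LB can see."""
    # Canary routing by header takes priority over path rules.
    if headers.get(CANARY_HEADER) == "1":
        return "canary-pool"
    # Longest prefix first, so a more specific route wins over a shorter one.
    for route in sorted(ROUTES, key=lambda r: len(r.path_prefix), reverse=True):
        prefix = route.path_prefix
        # Match whole path segments: "/api" and "/api/x" match, "/apiary" does not.
        if path == prefix or path.startswith(prefix + "/"):
            return route.pool
    return DEFAULT_POOL
```

For example, `pick_pool("/api/users", {})` selects `api-pool`, while `/apiary` falls through to the default pool because the prefix match respects path segment boundaries. An L4 balancer cannot make any of these decisions, since it never sees the path or headers.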

Algorithms

Algorithm	How it picks	When to use
Round-robin	Each backend in turn	Uniform, stateless requests
Weighted round-robin	Heavier backends get more	Heterogeneous instance sizes
Least connections	Backend with fewest active connections	Variable request durations (LLM inference, WebSockets)
Least response time	Lowest measured latency	Backends with uneven or fluctuating performance
Hash (consistent)	Deterministic by key (IP, user)	Cache affinity, session affinity
Random / power-of-two-choices	Pick 2 random, take less loaded	Surprisingly good default at scale
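
The last row of the table is simple enough to sketch in a few lines. This is an illustration under assumptions: `active_connections` is a hypothetical map from backend name to in-flight request count that the balancer is assumed to maintain, and `pick_p2c` is an invented name.

```python
import random


def pick_p2c(active_connections, rng=random):
    """Power of two choices: sample two distinct backends, keep the less loaded.

    active_connections: dict of backend name -> current in-flight requests
    (assumed to be tracked elsewhere by the balancer).
    rng: anything with a sample() method; defaults to the random module.
    """
    backends = list(active_connections)
    if not backends:
        raise ValueError("no backends available")
    if len(backends) == 1:
        return backends[0]
    first, second = rng.sample(backends, 2)
    # Only two counters are compared, no matter how many backends exist.
    if active_connections[first] <= active_connections[second]:
        return first
    return second
```

One property worth noticing: with at least two backends, the single most loaded backend can never be picked, because it always loses the comparison against whatever it is paired with.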
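
"Power of two choices" is the unsung hero: it gets nearly the benefit of least-connections with much less coordination cost, because each decision compares only two backends instead of scanning all of them. Linkerd uses it by default, and Envoy uses it for its least-request policy (Envoy's own default policy is round-robin).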

Health Checks

Without health checks, the load balancer happily sends traffic to dead servers. Two flavours:

Active — periodically poll /healthz; remove unhealthy backends.
Passive — observe real request results; eject backends that fail too often.
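
Either flavour needs a rule for turning individual check results into an in-rotation or out-of-rotation decision. A common approach is to require several consecutive results before flipping state, so one dropped probe does not eject a backend. The sketch below assumes hypothetical names (`BackendHealth`, `fall`, `rise`) and arbitrary default thresholds.

```python
class BackendHealth:
    """Track one backend's state from a stream of check results.

    fall: consecutive failures needed to mark a healthy backend unhealthy.
    rise: consecutive successes needed to mark an unhealthy backend healthy.
    Both names and defaults are illustrative assumptions.
    """

    def __init__(self, fall=3, rise=2):
        self.fall = fall
        self.rise = rise
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, ok):
        """Feed one result (active poll or observed request); return current state."""
        if ok == self.healthy:
            # Result agrees with the current state: any contrary streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if ok else self.fall
            if self._streak >= threshold:
                self.healthy = ok
                self._streak = 0
        return self.healthy
```

With `fall=3`, a backend that fails twice and then succeeds stays in rotation; it is removed only after three failures in a row, and returns only after `rise` successes in a row.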

Get the health check right or it becomes the outage. Common bugs:

Too shallow — checks the process is running but not that DB connectivity works → traffic flows to broken servers.
Too deep — health check itself depends on a downstream that flakes → cascading failure removes all backends at once.
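No staggering — checks fire in lockstep and eject on the first failure, so a brief hiccup in a shared dependency removes every backend at once. Add jitter to check timing and require several consecutive failures before ejecting.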

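A common pattern: a shallow check for liveness and a deep check for readiness, on separate endpoints. A failed liveness check means the process should be restarted; a failed readiness check only takes the instance out of rotation until its dependencies recover.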
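
The split between the two checks can be sketched without any web framework. The paths `/livez` and `/readyz`, the function names, and the idea of passing dependency checks in as callables are assumptions made for this example.

```python
def liveness():
    """Shallow check: if this code runs at all, the process is alive."""
    return 200


def readiness(dependency_checks):
    """Deep check: every dependency probe must succeed.

    dependency_checks: iterable of zero-argument callables that return True
    when the dependency is usable (for example a cheap database ping).
    """
    for check in dependency_checks:
        try:
            if not check():
                return 503
        except Exception:
            # A probe that raises counts as "not ready", never as a crash.
            return 503
    return 200


def handle(path, dependency_checks):
    """Map a health-check path to an HTTP status code."""
    if path == "/livez":
        return liveness()
    if path == "/readyz":
        return readiness(dependency_checks)
    return 404
```

The design point is that a database outage turns `/readyz` into a 503 while `/livez` keeps returning 200, so the instance leaves rotation without being restarted in a loop.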

Sticky Sessions

Sticky sessions (session affinity) bind a user to one backend instance, usually via cookie or hash. Useful for:

In-memory session state.
Long-lived connections (WebSocket, SSE, gRPC streams).
Cache locality.

The cost: when that backend dies, the user's session goes with it; rolling deploys interrupt users; horizontal scaling becomes uneven.
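
When stickiness is done by hash, a consistent hash ring limits that cost: if a backend dies, only the users mapped to it move, and everyone else keeps their backend. The following is a minimal sketch, not a production implementation. `HashRing` and `vnodes` are hypothetical names, and SHA-256 is an arbitrary choice of hash.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, backends, vnodes=100):
        if not backends:
            raise ValueError("at least one backend is required")
        # Each backend is placed at `vnodes` pseudo-random points on the ring.
        self._ring = sorted(
            (self._hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 interpreted as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def pick(self, key):
        """Return the backend owning the first ring point clockwise of the key."""
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._ring[idx % len(self._ring)][1]
```

Building a second ring without one backend and comparing picks shows the property: every key that was not on the removed backend resolves to the same backend as before.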

The right answer for state is usually: push it down to a shared store (Redis, database) and keep your app servers stateless. Use stickiness only when you cannot avoid it (WebSockets) and design for graceful failover.
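
To make the stateless approach concrete, here is a small sketch in which any app server can serve any request because session data lives outside the process. `SharedSessionStore` is an in-memory stand-in for something like Redis, and all class and method names are invented for illustration.

```python
import copy


class SharedSessionStore:
    """In-memory stand-in for a shared store such as Redis (hypothetical)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Deep copy so callers cannot mutate stored state by accident.
        return copy.deepcopy(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = copy.deepcopy(session)


class AppServer:
    """A stateless app instance: it keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        # Read-modify-write is not atomic here; a real store would need
        # an atomic operation or a transaction for concurrent requests.
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return {"served_by": self.name, "cart": session["cart"]}
```

If two `AppServer` instances share one store, a cart started on the first instance is visible on the second, so the load balancer is free to use any algorithm and a dead instance costs the user nothing.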

TLS Termination

L7 load balancers usually terminate TLS — the LB owns the cert, decrypts, then talks plain HTTP (or re-encrypts) to backends. Benefits: cert management in one place; CPU offloaded; ability to inspect/route on path.

For zero-trust or regulated environments, prefer TLS pass-through at an L4 load balancer (for example AWS NLB), or re-encrypt from the L7 load balancer to the backends, so traffic is never plaintext on the wire.

Global Load Balancing

Once you serve users in multiple regions, you need DNS- or anycast-based routing on top of regional LBs:

DNS-based (latency / geo) — Route 53, Cloudflare, Google Cloud DNS return the nearest healthy region. Cheap, but DNS TTLs delay failover (clients often cache for minutes).
Anycast — same IP advertised from many regions; BGP routes users to the nearest. Fast failover; available via Cloudflare, AWS Global Accelerator, GCP Premium Tier.
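
The DNS-based option boils down to a lookup: for a given client location, return the lowest-latency region that is currently healthy. The sketch below uses an invented latency table and invented region names; real providers derive this from their own measurements and health checks.

```python
# Hypothetical latency table: client geography -> region -> typical latency (ms).
REGION_LATENCY_MS = {
    "eu": {"eu-west": 20, "us-east": 90, "ap-south": 150},
    "us": {"us-east": 15, "eu-west": 85, "ap-south": 210},
    "asia": {"ap-south": 30, "eu-west": 140, "us-east": 200},
}


def resolve(client_geo, healthy_regions):
    """Return the nearest healthy region for a client, as a geo DNS policy would."""
    candidates = {
        region: latency
        for region, latency in REGION_LATENCY_MS[client_geo].items()
        if region in healthy_regions
    }
    if not candidates:
        raise LookupError("no healthy region available")
    # Lowest latency among the healthy regions wins.
    return min(candidates, key=candidates.get)
```

When `eu-west` drops out of the healthy set, European clients resolve to `us-east` instead. With DNS, though, a client keeps using the old answer until its cached record expires, which is the failover delay described above.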

Failure Modes to Anticipate

Connection draining — gracefully finish in-flight requests on a backend before removing it during deploy.
Slow start — newly added instances ramp up gradually instead of getting full load instantly.
Outlier ejection — automatically pull a backend that returns errors above a threshold.
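Capacity collapse — when one of N backends dies, the remaining N−1 share its load, so load per backend rises by a factor of N/(N−1). If the survivors were already near their limit, they fail in turn. Always plan capacity assuming one zone is down.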
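
Two of these items reduce to simple arithmetic. The sketch below works through the capacity rule and a linear slow-start ramp. The function names, the linear ramp shape, and the 10% floor are assumptions for illustration; real proxies may use different ramp curves.

```python
def load_after_failure(n, utilization, failed=1):
    """Per-unit utilization after `failed` of `n` equal units go down."""
    if not 0 < failed < n:
        raise ValueError("need at least one failure and one survivor")
    return utilization * n / (n - failed)


def max_safe_utilization(n, failed=1):
    """Highest steady-state utilization that survives losing `failed` units."""
    if not 0 < failed < n:
        raise ValueError("need at least one failure and one survivor")
    return (n - failed) / n


def slow_start_weight(age_s, window_s, full_weight=100.0, floor=0.1):
    """Linear ramp from floor * full_weight up to full_weight over window_s seconds."""
    if age_s >= window_s:
        return full_weight
    # The floor keeps a brand-new instance from receiving zero traffic.
    return full_weight * max(floor, age_s / window_s)
```

With three zones each running at 60%, losing one pushes the survivors to about 90%. Staying safe through a single-zone loss means running each of three zones at no more than about 67%, and each of two zones at no more than 50%.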

Service Mesh: LB Per Service

In a microservices world, you don't have one load balancer; you have one per service-to-service edge. Service meshes (Istio, Linkerd, Consul Connect, Cloud Map + App Mesh) push L7 LB into a sidecar next to each pod. Benefits: uniform mTLS, retries, circuit breakers, observability across all services.
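
Of the sidecar features listed, the circuit breaker is the one whose behaviour is least obvious from its name. The following is a simplified sketch, not the implementation used by any of the meshes above: `CircuitBreaker`, its parameters, and the injectable `clock` are assumptions, and the half-open state is reduced to "allow calls again after a cool-down".

```python
import time


class CircuitBreaker:
    """Stop calling a failing upstream for a while instead of piling on."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._failures = 0   # consecutive failures seen so far
        self._opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if a request may be sent to the upstream right now."""
        if self._opened_at is None:
            return True
        # Open circuit: permit trial requests only after the cool-down.
        return self._clock() - self._opened_at >= self.reset_after_s

    def record(self, ok):
        """Report the outcome of a request that was allowed through."""
        if ok:
            self._failures = 0
            self._opened_at = None
        else:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                # Open (or re-open) the circuit and restart the cool-down.
                self._opened_at = self._clock()
```

A caller checks `allow()` before each request and reports the result with `record()`. Doing this in the sidecar rather than in each service is what gives the uniform behaviour the paragraph describes.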

Cert Mapping

Cert	Load balancing scope
AWS SAA-C03	ALB vs NLB vs GWLB (Gateway Load Balancer); target groups; cross-zone load balancing; sticky sessions
Azure AZ-104 / AZ-305	Azure Load Balancer (L4), Application Gateway (L7), Front Door (global)
GCP ACE / PCA	Global HTTP(S) LB, Network LB, regional LBs, anycast IP
CKA / CKAD	Service types (ClusterIP, NodePort, LoadBalancer), Ingress controllers

Default Architecture

Global DNS / anycast layer routes users to the nearest healthy region.
Regional L7 load balancer terminates TLS and routes by path/host.
Stateless app instances behind the L7, scaled horizontally with health checks.
Internal L4 or service mesh for service-to-service.

This is the spine of nearly every modern web architecture. The next lesson covers what those app instances should put in front of their data: caching.