A load balancer is the front door of every scalable web system. It does two jobs: spread traffic across many servers, and route around failures. Get this layer right and most other concerns become tractable.
Where Load Balancers Live
[ Users ]
│
▼
[ DNS / Global LB ] ← geographic + region failover
│
▼
[ Regional / L7 LB ] ← TLS terminate, path routing
│
▼
[ Internal L4 LB ] ← service-to-service
│
▼
[ App instances ]
Real systems often have two or three layers. Each layer has different requirements.
L4 vs L7
| | L4 (Transport) | L7 (Application) |
|---|---|---|
| Operates on | TCP / UDP packets | HTTP requests |
| Decisions based on | IP, port, connection state | Path, host, header, cookie, body |
| TLS | Pass-through (encrypted) | Terminates, can re-encrypt |
| Speed | Very fast, low CPU | Slower, much more flexible |
| Examples | AWS NLB, GCP Network LB, IPVS, HAProxy TCP mode | AWS ALB, GCP HTTP(S) LB, NGINX, Envoy, Traefik |
Pick L7 when you need routing intelligence: /api goes one place, /static another; canary routing by header; per-path rate limits. Pick L4 for raw throughput, non-HTTP protocols, or when TLS must pass through unmodified.
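As a rough illustration of the "routing intelligence" an L7 balancer provides, the sketch below matches requests on path prefix and an optional header. The pool names, the `x-canary` header, and the `Route` and `pick_pool` helpers are hypothetical and not taken from any particular proxy. Real proxies also normalise header case and match on path segments, which this plain prefix match does not.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """One L7 routing rule: a path prefix plus optional required headers."""
    path_prefix: str
    pool: str
    headers: dict = field(default_factory=dict)  # header name -> required value

    def matches(self, path: str, headers: dict) -> bool:
        if not path.startswith(self.path_prefix):
            return False
        return all(headers.get(k) == v for k, v in self.headers.items())


# Rules are checked in order, so the more specific rules go first.
ROUTES = [
    Route("/api", "api-canary", headers={"x-canary": "1"}),
    Route("/api", "api-stable"),
    Route("/static", "static-assets"),
    Route("/", "web-default"),
]


def pick_pool(path: str, headers: dict) -> str:
    """Return the name of the first backend pool whose rule matches."""
    for route in ROUTES:
        if route.matches(path, headers):
            return route.pool
    raise LookupError(f"no route for {path}")
```

With these rules, `pick_pool("/api/users", {"x-canary": "1"})` returns `"api-canary"`, while the same path without the header falls through to `"api-stable"`. An L4 balancer cannot make this decision because it never sees the path or headers.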
Algorithms
| Algorithm | How it picks | When to use |
|---|---|---|
| Round-robin | Each backend in turn | Uniform, stateless requests |
| Weighted round-robin | Heavier backends get more | Heterogeneous instance sizes |
| Least connections | Backend with fewest active connections | Variable request durations (LLM inference, WebSockets) |
| Least response time | Lowest measured latency | Backends with mixed health |
| Hash (consistent) | Deterministic by key (IP, user) | Cache affinity, session affinity |
| Random / power-of-two-choices | Pick 2 random, take less loaded | Surprisingly good default at scale |
"Power of two choices" is the unsung hero: it gets nearly the benefit of least-connections with much less coordination cost. Modern proxies build on it: Linkerd uses it by default (combined with a latency EWMA), and Envoy's least-request policy uses it when all backends have equal weight.
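A minimal sketch of power-of-two-choices next to plain random selection may make the difference concrete. The `simulate` helper is a deliberately simplified model in which requests never finish, so it measures only how evenly each picker spreads load. All function names here are illustrative.

```python
import random


def pick_random(active: dict, rng: random.Random) -> str:
    """Baseline: pick any backend uniformly at random."""
    return rng.choice(list(active))


def pick_p2c(active: dict, rng: random.Random) -> str:
    """Sample two distinct backends; return the one with fewer active requests."""
    a, b = rng.sample(list(active), 2)
    return a if active[a] <= active[b] else b


def simulate(picker, n_backends: int = 10, n_requests: int = 10_000, seed: int = 0) -> int:
    """Assign requests that never complete and return the load on the busiest backend."""
    rng = random.Random(seed)
    active = {f"b{i}": 0 for i in range(n_backends)}
    for _ in range(n_requests):
        active[picker(active, rng)] += 1
    return max(active.values())
```

Comparing `simulate(pick_p2c)` with `simulate(pick_random)` should show the busiest backend under power-of-two-choices staying much closer to the average of 1,000 requests. The picker only reads two counters per decision, which is why it needs far less shared state than a true least-connections scan.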
Health Checks
Without health checks, the load balancer happily sends traffic to dead servers. Two flavours:
- Active — periodically poll /healthz; remove unhealthy backends.
- Passive — observe real request results; eject backends that fail too often.
Get the health check right or it becomes the outage. Common bugs:
- Too shallow — checks the process is running but not that DB connectivity works → traffic flows to broken servers.
- Too deep — health check itself depends on a downstream that flakes → cascading failure removes all backends at once.
- No staggering — all backends fail health checks simultaneously when a shared dependency hiccups.
A common pattern: shallow check for liveness, deep check for readiness, with separate endpoints.
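The liveness/readiness split and the balancer's side of an active check can be sketched as follows. This is an illustrative model, not any specific product's behaviour: the `fall` and `rise` thresholds, the `BackendHealth` class, and the dependency-probe callables are assumptions made for the example.

```python
from typing import Callable, Dict


def liveness() -> int:
    """Shallow check: the process is up and able to answer at all."""
    return 200


def readiness(dependency_checks: Dict[str, Callable[[], bool]]) -> int:
    """Deep check: 200 only if every required dependency probe succeeds."""
    for probe in dependency_checks.values():
        try:
            if not probe():
                return 503
        except Exception:
            # A probe that raises counts as a failed dependency.
            return 503
    return 200


class BackendHealth:
    """LB-side state for one backend, with hysteresis so a single blip does not flap it."""

    def __init__(self, fall: int = 3, rise: int = 2):
        self.fall = fall    # consecutive failures before removing the backend
        self.rise = rise    # consecutive successes before re-adding it
        self.healthy = True
        self._streak = 0    # consecutive results that disagree with the current state

    def record(self, status_code: int) -> bool:
        """Feed in one health-check result; return whether the backend is in rotation."""
        ok = 200 <= status_code < 300
        if ok == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if ok else self.fall
            if self._streak >= threshold:
                self.healthy = ok
                self._streak = 0
        return self.healthy
```

Requiring several consecutive failures before removal is one way to soften the "too deep" failure mode: a brief hiccup in a shared dependency does not immediately empty the pool.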
Sticky Sessions
Sticky sessions (session affinity) bind a user to one backend instance, usually via cookie or hash. Useful for:
- In-memory session state.
- Long-lived connections (WebSocket, SSE, gRPC streams).
- Cache locality.
The cost: when that backend dies, the user's session goes with it; rolling deploys interrupt users; horizontal scaling becomes uneven.
The right answer for state is usually: push it down to a shared store (Redis, database) and keep your app servers stateless. Use stickiness only when you cannot avoid it (WebSockets) and design for graceful failover.
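Where stickiness is unavoidable, hash-based affinity with a consistent hash ring limits the damage when a backend disappears. The sketch below is a simplified ring. The `HashRing` class, the 100 virtual nodes per backend, and the use of SHA-256 are choices made for illustration, not a description of any specific load balancer.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring: each backend owns many points (virtual nodes)."""

    def __init__(self, backends, vnodes: int = 100):
        self._ring = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the backend owning the first point clockwise from the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The useful property is that removing one backend only remaps the keys that lived on it; every other user keeps landing on the same instance. Users on the failed backend still lose any in-memory state, which is why the shared-store approach remains preferable.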
TLS Termination
L7 load balancers usually terminate TLS — the LB owns the cert, decrypts, then talks plain HTTP (or re-encrypts) to backends. Benefits: cert management in one place; CPU offloaded; ability to inspect/route on path.
For zero-trust or regulated environments, prefer pass-through TLS (NLB) or re-encrypt to backends so traffic is never plaintext on the wire.
Global Load Balancing
Once you serve users in multiple regions, you need DNS- or anycast-based routing on top of regional LBs:
- DNS-based (latency / geo) — Route 53, Cloudflare, Google Cloud DNS return the nearest healthy region. Cheap, but DNS TTLs delay failover (clients often cache for minutes).
- Anycast — same IP advertised from many regions; BGP routes users to the nearest. Fast failover; available via Cloudflare, AWS Global Accelerator, GCP Premium Tier.
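To see why DNS TTLs matter for failover, a back-of-the-envelope estimate helps. The function below is a rough sketch that assumes resolvers and clients honour the record TTL (some cache for longer) and that the health checker needs a fixed number of failed probes to mark a region down. The function and parameter names are hypothetical.

```python
def dns_failover_estimate_s(check_interval_s: float, failures_to_trip: int, ttl_s: float) -> float:
    """Rough worst-case seconds a client may keep resolving to a dead region."""
    detection_s = check_interval_s * failures_to_trip  # time to notice the outage
    return detection_s + ttl_s                         # plus time for cached answers to expire
```

For example, 10-second checks with 3 failures required and a 60-second TTL give an estimate of 90 seconds. Lowering the TTL shortens that window at the cost of more DNS queries.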
Failure Modes to Anticipate
- Connection draining — gracefully finish in-flight requests on a backend before removing it during deploy.
- Slow start — newly added instances ramp up gradually instead of getting full load instantly.
- Outlier ejection — automatically pull a backend that returns errors above a threshold.
- Capacity collapse — when one of N backends dies, the remaining N−1 absorb the extra load. Always plan capacity assuming one zone is down.
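The arithmetic behind capacity collapse and slow start can be sketched in a few lines. These helpers assume identical backends and evenly spread traffic. The linear ramp, the 10% floor, and the 30-second window in `slow_start_weight` are arbitrary illustrative values rather than defaults of any real balancer.

```python
import math


def survivor_load_factor(n: int, failed: int = 1) -> float:
    """How much each survivor's load grows when `failed` of `n` equal backends die."""
    if not 0 <= failed < n:
        raise ValueError("need 0 <= failed < n")
    return n / (n - failed)


def instances_needed(peak_rps: float, rps_per_instance: float, zones: int) -> int:
    """Total instances, spread evenly over zones, so peak still fits with one zone down."""
    if zones < 2:
        raise ValueError("need at least two zones to survive a zone loss")
    per_zone = math.ceil(peak_rps / rps_per_instance / (zones - 1))
    return per_zone * zones


def slow_start_weight(age_s: float, window_s: float = 30.0) -> float:
    """Traffic weight for a new instance: linear ramp from a 10% floor to full weight."""
    return min(1.0, max(0.1, age_s / window_s))
```

With 4 backends, losing one raises each survivor's load by a factor of 4/3; with only 2, it doubles. For a 9,000 rps peak at 1,000 rps per instance across 3 zones, `instances_needed` returns 15, so the 10 instances left after a zone failure still cover the peak.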
Service Mesh: LB Per Service
In a microservices world, you don't have one load balancer; you have one per service-to-service edge. Service meshes (Istio, Linkerd, Consul Connect, Cloud Map + App Mesh) push L7 LB into a sidecar next to each pod. Benefits: uniform mTLS, retries, circuit breakers, observability across all services.
Cert Mapping
| Cert | Load balancing scope |
|---|---|
| AWS SAA-C03 | ALB vs NLB vs GWLB (Gateway Load Balancer); target groups; cross-zone; sticky sessions |
| Azure AZ-104 / AZ-305 | Azure Load Balancer (L4), Application Gateway (L7), Front Door (global) |
| GCP ACE / PCA | Global HTTP(S) LB, Network LB, regional LBs, anycast IP |
| CKA / CKAD | Service types (ClusterIP, NodePort, LoadBalancer), Ingress controllers |
Default Architecture
- Global DNS / anycast layer routes users to the nearest healthy region.
- Regional L7 load balancer terminates TLS and routes by path/host.
- Stateless app instances behind the L7, scaled horizontally with health checks.
- Internal L4 or service mesh for service-to-service.
This is the spine of nearly every modern web architecture. The next lesson covers what those app instances should put in front of their data: caching.