5 min read·Lesson 3 of 10

Load Balancing

How load balancers spread traffic and provide fault tolerance. L4 vs L7, algorithms, sticky sessions, health checks, and global DNS-level balancing.

A load balancer is the front door of every scalable web system. It does two jobs: spread traffic across many servers, and route around failures. Get this layer right and most other concerns become tractable.

Where Load Balancers Live

[ Users ]
    │
    ▼
[ DNS / Global LB ]   ← geographic + region failover
    │
    ▼
[ Regional / L7 LB ]   ← TLS terminate, path routing
    │
    ▼
[ Internal L4 LB ]     ← service-to-service
    │
    ▼
[ App instances ]

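Real systems often stack two or three of these layers, and each has different requirements: the global layer handles geographic routing and region failover, the regional L7 layer terminates TLS and routes by path, and the internal L4 layer carries service-to-service traffic.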

L4 vs L7

| | L4 (Transport) | L7 (Application) |
|---|---|---|
| Operates on | TCP / UDP packets | HTTP requests |
| Decisions based on | IP, port, connection state | Path, host, header, cookie, body |
| TLS | Pass-through (encrypted) | Terminates, can re-encrypt |
| Speed | Very fast, low CPU | Slower, much more flexible |
| Examples | AWS NLB, GCP Network LB, IPVS, HAProxy TCP mode | AWS ALB, GCP HTTP(S) LB, NGINX, Envoy, Traefik |

Pick L7 when you need routing intelligence: /api goes one place, /static another; canary routing by header; per-path rate limits. Pick L4 for raw throughput, non-HTTP protocols, or when TLS must pass through unmodified.
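
As a rough illustration of the routing intelligence described above, here is a minimal Python sketch of how an L7 balancer might map a request to a backend pool. The pool names, the `x-canary` header, and the `pick_pool` function are hypothetical and chosen only for this example; real proxies express the same idea in their own configuration languages.

```python
# Hypothetical prefix -> pool table; first match wins.
ROUTES = [("/api", "api-pool"), ("/static", "static-pool")]
DEFAULT_POOL = "web-pool"


def _matches(path: str, prefix: str) -> bool:
    # Match whole path segments so "/apiary" does not match "/api".
    return path == prefix or path.startswith(prefix + "/")


def pick_pool(path: str, headers: dict) -> str:
    """Choose a backend pool from the request path and headers."""
    # Canary routing by header: an assumed "x-canary: 1" header sends
    # API traffic to a separate canary pool.
    if headers.get("x-canary") == "1" and _matches(path, "/api"):
        return "api-canary-pool"
    for prefix, pool in ROUTES:
        if _matches(path, prefix):
            return pool
    return DEFAULT_POOL
```

An L4 balancer cannot make any of these decisions, because the path and headers are inside the (often encrypted) payload it never parses.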

Algorithms

| Algorithm | How it picks | When to use |
|---|---|---|
| Round-robin | Each backend in turn | Uniform, stateless requests |
| Weighted round-robin | Heavier backends get more | Heterogeneous instance sizes |
| Least connections | Backend with fewest active connections | Variable request durations (LLM, websockets) |
| Least response time | Lowest measured latency | Backends with mixed health |
| Hash (consistent) | Deterministic by key (IP, user) | Cache affinity, session affinity |
| Random / power-of-two-choices | Pick 2 random, take less loaded | Surprisingly good default at scale |
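
The consistent-hash row is the least obvious of these, so here is a minimal sketch of one common way to build it: a hash ring with virtual nodes. The `HashRing` class, the choice of SHA-256, and the default of 100 virtual nodes per backend are assumptions made for illustration, not a description of any specific load balancer.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring; each backend owns `vnodes` points on the ring."""

    def __init__(self, backends, vnodes: int = 100):
        self._ring = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, key: str) -> str:
        if not self._ring:
            raise ValueError("no backends on the ring")
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters for cache and session affinity is that removing one backend only remaps the keys that backend owned; every other key keeps its original backend.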
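
"Power of two choices" is the unsung hero: it gets nearly the benefit of least-connections with much less coordination cost, because each decision compares only two randomly sampled backends instead of tracking the load of all of them. Linkerd balances this way by default, and Envoy's least-request policy uses the same two-choice sampling (Envoy's own default policy is round-robin).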

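A minimal sketch of power-of-two-choices, assuming the balancer keeps a local count of in-flight requests per backend. The `pick_p2c` name and the dictionary shape are hypothetical.

```python
import random


def pick_p2c(in_flight: dict, rng=random) -> str:
    """Sample two distinct backends at random and return the less loaded one.

    `in_flight` maps backend name -> current number of active requests.
    """
    backends = list(in_flight)
    if not backends:
        raise ValueError("no backends available")
    if len(backends) == 1:
        return backends[0]
    a, b = rng.sample(backends, 2)
    return a if in_flight[a] <= in_flight[b] else b
```

Because two distinct backends are always compared, the single most loaded backend is never selected, yet no global sort or shared state is needed.
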
Health Checks

Without health checks, the load balancer happily sends traffic to dead servers. Two flavours:

  • Active — periodically poll /healthz; remove unhealthy backends.
  • Passive — observe real request results; eject backends that fail too often.
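
Either flavour usually feeds a small state machine so that one bad result does not flap a backend in and out of rotation. The sketch below assumes consecutive-result thresholds named `fall` and `rise`; the class name and default values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks one backend; works for active probes or passive observations."""

    fall: int = 3         # consecutive failures before removal (assumed default)
    rise: int = 2         # consecutive successes before re-adding (assumed default)
    healthy: bool = True
    _streak: int = 0      # consecutive results that disagree with current state

    def record(self, ok: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if ok == self.healthy:
            # Result agrees with the current state, so reset the streak.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if ok else self.fall
            if self._streak >= threshold:
                self.healthy = ok
                self._streak = 0
        return self.healthy
```

With these defaults, a backend leaves rotation after three consecutive failures and returns after two consecutive successes.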

Get the health check right or it becomes the outage. Common bugs:

  • Too shallow — checks the process is running but not that DB connectivity works → traffic flows to broken servers.
  • Too deep — health check itself depends on a downstream that flakes → cascading failure removes all backends at once.
  • No staggering — all backends fail health checks simultaneously when a shared dependency hiccups.

A common pattern: shallow check for liveness, deep check for readiness, with separate endpoints.
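
A minimal sketch of that split, written as a plain function so it stays framework-neutral. The `/livez` and `/readyz` paths and the `check_db` callable are assumptions for this example; use whatever endpoint names your platform expects.

```python
from typing import Callable


def handle_health(path: str, check_db: Callable[[], bool]) -> int:
    """Return an HTTP status code for a health-check request."""
    if path == "/livez":
        # Liveness: shallow. Reaching this line shows the process can serve.
        return 200
    if path == "/readyz":
        # Readiness: deep. Verify the dependency needed to handle traffic.
        try:
            return 200 if check_db() else 503
        except Exception:
            # A failing dependency means "not ready", not "restart me".
            return 503
    return 404
```

Keeping the two apart means a database outage takes instances out of rotation without also triggering restarts of processes that are otherwise fine.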

Sticky Sessions

Sticky sessions (session affinity) bind a user to one backend instance, usually via cookie or hash. Useful for:

  • In-memory session state.
  • Long-lived connections (WebSocket, SSE, gRPC streams).
  • Cache locality.

The cost: when that backend dies, the user's session goes with it; rolling deploys interrupt users; horizontal scaling becomes uneven.

The right answer for state is usually: push it down to a shared store (Redis, database) and keep your app servers stateless. Use stickiness only when you cannot avoid it (WebSockets) and design for graceful failover.
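
The sketch below illustrates the stateless approach under simplifying assumptions: `SessionStore` is an in-memory stand-in for a shared store such as Redis, and `AppInstance` and `add_to_cart` are hypothetical names. The point is only that any instance can serve any request once state lives outside the process.

```python
class SessionStore:
    """In-memory stand-in for a shared store (Redis, a database, ...)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id: str) -> dict:
        return dict(self._data.get(session_id, {}))

    def put(self, session_id: str, session: dict) -> None:
        self._data[session_id] = dict(session)


class AppInstance:
    """A stateless app server: it keeps no session data of its own."""

    def __init__(self, name: str, store: SessionStore):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> list:
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        session["cart"] = cart
        self.store.put(session_id, session)
        return cart
```

If instance `a` handles the first request and instance `b` the second, the cart still contains both items, so losing or redeploying `a` costs the user nothing.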

TLS Termination

L7 load balancers usually terminate TLS — the LB owns the cert, decrypts, then talks plain HTTP (or re-encrypts) to backends. Benefits: cert management in one place; CPU offloaded; ability to inspect/route on path.

For zero-trust or regulated environments, prefer pass-through TLS (NLB) or re-encrypt to backends so traffic is never plaintext on the wire.

Global Load Balancing

Once you serve users in multiple regions, you need DNS- or anycast-based routing on top of regional LBs:

  • DNS-based (latency / geo) — Route 53, Cloudflare, Google Cloud DNS return the nearest healthy region. Cheap, but DNS TTLs delay failover (clients often cache for minutes).
  • Anycast — same IP advertised from many regions; BGP routes users to the nearest. Fast failover; available via Cloudflare, AWS Global Accelerator, GCP Premium Tier.
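
To make the DNS trade-off concrete, here is a small sketch of latency-based region selection plus a rough worst-case failover estimate. The function names, region names, and the "detection time plus TTL" formula are simplifying assumptions; real resolvers and clients may cache longer than the TTL.

```python
def pick_region(latency_ms: dict, healthy: set) -> str:
    """Return the lowest-latency region among those currently healthy."""
    candidates = {r: ms for r, ms in latency_ms.items() if r in healthy}
    if not candidates:
        raise LookupError("no healthy region")
    return min(candidates, key=candidates.get)


def dns_worst_case_failover_s(check_interval_s: int, fall: int, ttl_s: int) -> int:
    """Rough upper bound: time to detect the failure plus one full TTL."""
    return check_interval_s * fall + ttl_s
```

For example, with a 10-second check interval, 3 failed checks required, and a 60-second TTL, some clients may keep hitting the dead region for about 90 seconds. Anycast avoids the TTL term because the IP address never changes.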

Failure Modes to Anticipate

  • Connection draining — gracefully finish in-flight requests on a backend before removing it during deploy.
  • Slow start — newly added instances ramp up gradually instead of getting full load instantly.
  • Outlier ejection — automatically pull a backend that returns errors above a threshold.
  • Capacity collapse — when one of N backends dies, the remaining N−1 absorb the extra load. Always plan capacity assuming one zone is down.
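
Two of these items reduce to simple arithmetic, sketched below. The linear ramp, the function names, and the default weight of 100 are assumptions for illustration; real balancers differ in ramp shape and weight scale.

```python
def slow_start_weight(age_s: float, window_s: float, full_weight: int = 100) -> int:
    """Linearly ramp a new backend's weight from 1 up to full_weight."""
    if window_s <= 0 or age_s >= window_s:
        return full_weight
    # Never return 0, so the new backend still receives a trickle of traffic.
    return max(1, int(full_weight * age_s / window_s))


def utilization_after_loss(n: int, lost: int, utilization: float) -> float:
    """Per-backend utilization after `lost` of `n` equal backends fail."""
    if lost >= n:
        raise ValueError("no backends left")
    return utilization * n / (n - lost)
```

With three zones each running at 60% utilization, losing one zone pushes the survivors to about 90%; at 70% they would need 105%, which is the collapse the last bullet warns about.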

Service Mesh: LB Per Service

In a microservices world, you don't have one load balancer; you have one per service-to-service edge. Service meshes (Istio, Linkerd, Consul Connect, Cloud Map + App Mesh) push L7 LB into a sidecar next to each pod. Benefits: uniform mTLS, retries, circuit breakers, observability across all services.

Cert Mapping

| Cert | Load balancing scope |
|---|---|
| AWS SAA-C03 | ALB vs NLB vs GWLB (Gateway Load Balancer); target groups; cross-zone; sticky sessions |
| Azure AZ-104 / AZ-305 | Azure Load Balancer (L4), Application Gateway (L7), Front Door (global) |
| GCP ACE / PCA | Global HTTP(S) LB, Network LB, regional LBs, anycast IP |
| CKA / CKAD | Service types (ClusterIP, NodePort, LoadBalancer), Ingress controllers |

Default Architecture

  1. Global DNS / anycast layer routes users to the nearest healthy region.
  2. Regional L7 load balancer terminates TLS and routes by path/host.
  3. Stateless app instances behind the L7, scaled horizontally with health checks.
  4. Internal L4 or service mesh for service-to-service.

This is the spine of nearly every modern web architecture. The next lesson covers what those app instances should put in front of their data: caching.

Key Takeaways

  • A load balancer distributes traffic across instances and removes unhealthy ones from rotation.
  • L4 balancers route on TCP/UDP — fast and protocol-agnostic; L7 balancers understand HTTP and route on path/header.
  • Round-robin is fine for uniform requests; least-connections handles variable workloads better.
  • Sticky sessions trade horizontal scalability for in-memory session simplicity — usually a bad trade.
  • Global load balancing (DNS, anycast) routes users to the nearest healthy region.
