5 min read·Lesson 9 of 10

CDNs, Rate Limiting, and Resilience

The protective layer of modern systems: CDNs at the edge, rate limits to fend off abuse, and circuit breakers to contain failure.

Performance and resilience are two sides of the same coin: serve fast when healthy, fail safely when not. CDNs handle the first; rate limits and circuit breakers handle the second.

CDNs: The Edge Layer

A Content Delivery Network is a fleet of servers in points of presence (PoPs) around the world that cache content close to users.

User in Tokyo  ──→  CDN PoP in Tokyo (cache hit)  ──→  Response in 5 ms
User in Tokyo  ──→  CDN PoP in Tokyo (miss)       ──→  Origin in Virginia (200 ms)
                                                       ──→ cache and return

The economic and performance impact is enormous: if the CDN serves 95% of requests from the edge in about 5 ms, your origin handles only the remaining 5%, which cuts its capacity cost, and users worldwide get near-local latency.
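
The arithmetic behind that claim can be sketched in a few lines. This is only a back-of-the-envelope model: the 5 ms and 200 ms figures come from the diagram above, the 95% hit ratio is the example's assumption, and `expected_latency_ms` is an illustrative name.

```python
def expected_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency when a fraction `hit_ratio` of requests is served at the edge."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# 95% of requests hit the PoP (5 ms); 5% fall through to the origin (200 ms).
average = expected_latency_ms(0.95, 5.0, 200.0)
print(f"average latency: {average:.2f} ms")
```

With these inputs the average works out to roughly 14.75 ms, against 200 ms if every request went to the origin.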

What CDNs cache

  • Static assets (JS, CSS, images, fonts) — the original use case.
  • Dynamic API responses with appropriate Cache-Control.
  • Streaming video segments.
  • Software downloads and updates.

Cache control headers

Cache-Control: public, max-age=3600, s-maxage=86400, stale-while-revalidate=60
ETag: "abc123"
Vary: Accept-Encoding, Authorization

The relevant directives and headers (stale-if-error does not appear in the example above):
  • max-age — browser cache lifetime.
  • s-maxage — CDN/proxy cache lifetime (often longer).
  • stale-while-revalidate — serve stale, refresh in background.
  • stale-if-error — serve stale on origin failure (huge resilience win).
  • Vary — split cache by these headers.
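
To make the precedence between max-age and s-maxage concrete, here is a minimal sketch of how a shared cache might read the header. It assumes a simplified parser (no quoted values, no unknown-directive handling); `parse_cache_control` and `shared_cache_ttl` are illustrative names, not part of any CDN's API.

```python
from typing import Dict, Union


def parse_cache_control(value: str) -> Dict[str, Union[int, bool]]:
    """Split a Cache-Control header into {directive: seconds or True}."""
    directives: Dict[str, Union[int, bool]] = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives such as "public" carry no argument; store them as flags.
        directives[name] = int(arg) if sep and arg.isdigit() else True
    return directives


def shared_cache_ttl(directives: Dict[str, Union[int, bool]]) -> int:
    """TTL for a shared cache (CDN/proxy): s-maxage takes precedence over max-age."""
    if directives.get("no-store") or directives.get("private"):
        return 0  # shared caches must not store these responses
    return int(directives.get("s-maxage", directives.get("max-age", 0)))


header = "public, max-age=3600, s-maxage=86400, stale-while-revalidate=60"
print(shared_cache_ttl(parse_cache_control(header)))
```

For the example header, the browser keeps the response for an hour while the CDN keeps it for a day.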

Beyond caching

Modern CDNs (Cloudflare, Fastly, CloudFront, Akamai) also offer:

  • DDoS protection at the edge — absorb millions of req/sec before they reach origin.
  • WAF (web application firewall).
  • TLS termination with HTTP/3.
  • Edge compute (Cloudflare Workers, Lambda@Edge, Fastly Compute@Edge) — run code in the PoP.
  • Image optimisation and resizing on the fly.

Rate Limiting

Without limits, a single buggy client or attacker can saturate a service. Rate limits define a contract: at most N requests per identity per window.

Token bucket

Each identity has a bucket of tokens that refills at a fixed rate up to a cap. Each request consumes a token; if empty, reject.

  • Allows short bursts (up to bucket size).
  • Smooth average rate.
  • Used by AWS API Gateway and Stripe, among others.
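
A minimal single-process sketch of the token bucket described above. The class and parameter names are illustrative, and a production limiter would typically keep this state in a shared store and guard it against concurrent access.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second up to `capacity`; each request spends one token."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested without sleeping
        self.tokens = capacity      # start full, which permits an initial burst
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Credit the tokens earned since the last call, never exceeding the cap.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                # bucket empty: the caller should reject the request


bucket = TokenBucket(rate=5.0, capacity=10.0)
print([bucket.allow() for _ in range(12)])  # burst up to capacity, then rejections
```

The capacity sets the largest burst; the refill rate sets the long-run average.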

Leaky bucket

Requests enter a fixed-size queue that drains at a constant rate. Overflow is rejected.

  • Smooths out bursts entirely.
  • Strict steady throughput.
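
The queue-based behaviour can be sketched as follows. This is one possible single-process shape, assuming the caller polls `drain()` periodically; `LeakyBucket`, `offer`, and `drain` are illustrative names.

```python
import time
from collections import deque


class LeakyBucket:
    """Bounded queue whose items are released at a constant `drain_rate` per second."""

    def __init__(self, drain_rate: float, capacity: int, clock=time.monotonic):
        self.interval = 1.0 / drain_rate   # seconds between two releases
        self.capacity = capacity
        self.clock = clock
        self.queue = deque()
        self.next_release = clock()

    def offer(self, item) -> bool:
        if len(self.queue) >= self.capacity:
            return False                   # overflow: reject
        if not self.queue:
            # After an idle period, do not let unused release slots accumulate.
            self.next_release = max(self.next_release, self.clock())
        self.queue.append(item)
        return True

    def drain(self) -> list:
        """Return the queued items whose release time has arrived."""
        now = self.clock()
        released = []
        while self.queue and self.next_release <= now:
            released.append(self.queue.popleft())
            self.next_release += self.interval
        return released
```

Unlike the token bucket, a burst is never passed through: it waits in the queue and leaves at the drain rate.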

Fixed window vs sliding window

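  • Fixed: count per minute, reset at the boundary. Simple, but a client can send up to 2× the limit in a short span that straddles a boundary (a full quota at the end of one window, another at the start of the next).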
  • Sliding: rolling window. Smoother, slightly more state per identity.
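
The boundary effect is easy to reproduce. The sketch below compares a fixed-window counter with a sliding-window log; both classes are illustrative, take the current time as an argument for testability, and track a single identity.

```python
from collections import deque


class FixedWindow:
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.current, self.count = None, 0

    def allow(self, now: float) -> bool:
        bucket = int(now // self.window)
        if bucket != self.current:          # crossed a boundary: reset the counter
            self.current, self.count = bucket, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLog:
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.stamps = deque()               # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Forget requests that have aged out of the rolling window.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False


# Limit of 5 per 60 s; 5 requests just before the boundary and 5 just after.
times = [59.0] * 5 + [61.0] * 5
fixed, sliding = FixedWindow(5, 60.0), SlidingWindowLog(5, 60.0)
print(sum(fixed.allow(t) for t in times), sum(sliding.allow(t) for t in times))
```

The fixed window accepts all ten requests within two seconds, while the sliding log accepts only five. The log stores one timestamp per accepted request, which is the extra state mentioned above.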

Where to limit

Layer         Limit per                  Why
CDN / edge    IP, geographical region    Stop floods before they hit you
API gateway   API key, user, route       Per-customer SLAs
Per-service   Caller, endpoint           Protect specific hot endpoints
Database      Query, connection          Final backstop

Use multiple layers: edge for DDoS, API gateway for per-customer fairness, per-service for endpoint-specific limits.

Returning rate-limit responses

Use HTTP 429 with informative headers:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1737456000
Retry-After: 30

Good clients respect Retry-After; their retries should use exponential backoff with jitter.
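
On the client side, honouring Retry-After might look like the sketch below. Retry-After may carry either a number of seconds or an HTTP date, so both are handled; `retry_after_seconds` is an illustrative helper, and the plain-dict lookup assumes the header name is spelled exactly as shown (real HTTP clients usually expose case-insensitive headers).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait according to Retry-After, or None if absent or unparseable."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():                      # delay-seconds form, e.g. "30"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_after_seconds({"Retry-After": "30"}))
```

When the function returns None, the client can fall back to its own exponential backoff schedule.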

Timeouts

Every external call needs a timeout. No exceptions. A request without a timeout is a thread waiting for a server in another country that may have rebooted hours ago.

  • Pick timeouts based on the dependency's p99 latency, plus a margin.
  • Connect timeout vs read timeout — set both.
  • Total request budget should propagate down the call chain ("you have 200 ms left").
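
One way to propagate a request budget is to pass a deadline object down the call chain and derive each call's timeout from what is left. The sketch below is an assumption about how that could look; `Deadline` and `timeout_for` are illustrative names.

```python
import time


class Deadline:
    """Tracks how much of the overall request budget remains."""

    def __init__(self, budget_s: float, clock=time.monotonic):
        self.clock = clock
        self.expires_at = clock() + budget_s

    def remaining(self) -> float:
        return max(0.0, self.expires_at - self.clock())

    def timeout_for(self, per_call_cap: float) -> float:
        """Timeout for the next downstream call: its own cap or the remaining budget, whichever is smaller."""
        left = self.remaining()
        if left <= 0:
            raise TimeoutError("request budget exhausted")
        return min(per_call_cap, left)


deadline = Deadline(budget_s=0.2)            # "you have 200 ms"
timeout = deadline.timeout_for(per_call_cap=1.0)
print(f"next call may take at most {timeout:.3f} s")
```

Across process boundaries the remaining budget would be sent with the request (for example in a header) so the callee can build its own deadline from it.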

Retries with Backoff

Naive retries amplify failure: 1000 clients, 3 retries each = 3000 extra requests (4000 in total) piling on an already-struggling service. Use exponential backoff with jitter:

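import random
import time

class TransientError(Exception):
    """Placeholder for errors worth retrying (timeouts, 503s); substitute your client's own."""

def call_with_retry(fn, max_attempts=3, base=0.1, cap=2.0):
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
            delay = min(cap, base * (2 ** attempt))
            # Jitter: scale by a random factor in [0.5, 1.5) so clients don't retry in lockstep.
            sleep_for = delay * (0.5 + random.random())
            time.sleep(sleep_for)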

Only retry idempotent operations. Retrying a non-idempotent charge_card can double-charge.

Circuit Breakers

A circuit breaker wraps a downstream call and tracks failures. After a threshold, it "opens" — failing fast for a cooldown period instead of beating on the dependency. Then it lets a trickle through ("half-open") to test recovery.

[CLOSED] — calls flow normally; track failures
   │  failure rate > threshold
   ▼
[OPEN]   — fail fast, don't call dependency
   │  cooldown elapsed
   ▼
[HALF-OPEN] — let a few through; if good → CLOSED; if bad → OPEN

Why it matters: without breakers, a slow downstream service ties up all of your threads and connections, so your healthy service becomes unhealthy because its dependency is. Breakers contain the blast radius.

Tools: Hystrix (legacy, in maintenance mode), Resilience4j (JVM), Polly (.NET), and service meshes or proxies with built-in circuit breaking (Istio, Linkerd, Envoy).
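
The state machine above can be sketched in a few dozen lines. This is a simplified, single-threaded illustration, not how any of the listed libraries implement it: it trips on a count of consecutive failures rather than a failure rate, and it admits one probe call at a time in the half-open state. All names are illustrative.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("failing fast; dependency not called")
            self.state = "half-open"        # cooldown elapsed: let a probe through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed probe, or too many consecutive failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        self.failures = 0                   # any success closes the circuit
        self.state = "closed"
        return result
```

A caller would wrap each downstream call in `breaker.call(...)` and treat `CircuitOpenError` as a signal to use a fallback rather than wait.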

Bulkheads

Like compartments in a ship: isolate resources so a failure in one area can't sink the whole vessel.

  • Separate thread pool / connection pool per downstream.
  • Separate queue per priority class.
  • Separate database for premium customers.

One downstream going slow consumes only its own pool, not all of yours.
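
A bulkhead can be as small as a semaphore per downstream. The sketch below is one possible shape, assuming synchronous calls and rejecting immediately when the compartment is full; `Bulkhead` and `BulkheadFullError` are illustrative names.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a downstream's concurrency slots are all in use."""


class Bulkhead:
    def __init__(self, name: str, max_concurrent: int):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn, *args, **kwargs):
        # Non-blocking acquire: reject at once rather than queue behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name}: no free slots")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


# One compartment per downstream; the sizes here are arbitrary examples.
payments = Bulkhead("payments", max_concurrent=10)
search = Bulkhead("search", max_concurrent=50)
print(search.run(lambda: "results"))
```

If the payments dependency stalls and fills its ten slots, calls through `search` are unaffected.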

Graceful Degradation

When a dependency fails, don't fail the whole feature. Examples:

  • Recommendations service down → show generic top-sellers.
  • Reviews service down → show product page without reviews + a small notice.
  • Profile picture CDN down → show a default avatar.

The mantra: a degraded experience beats an error page.
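
In code, degradation is usually an explicit fallback path. A minimal sketch follows, with hypothetical `fetch_recommendations` and `top_sellers` functions standing in for the real services.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> Tuple[T, bool]:
    """Return (value, degraded). `degraded` is True when the fallback was used."""
    try:
        return primary(), False
    except Exception:
        # The dependency failed: serve the cheaper, generic answer instead of an error.
        return fallback(), True


def fetch_recommendations():                 # hypothetical: the personalised service
    raise ConnectionError("recommendations service down")


def top_sellers():                           # hypothetical: static or cached fallback
    return ["generic item A", "generic item B"]


items, degraded = with_fallback(fetch_recommendations, top_sellers)
print(items, "(degraded)" if degraded else "")
```

Returning the `degraded` flag lets the page show a small notice and lets monitoring count how often the fallback is served.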

Chaos Engineering

You don't know your system is resilient until you break it on purpose. Tools (Chaos Monkey, Gremlin, Litmus, AWS FIS) inject failures in production-like environments — kill instances, slow networks, throttle CPUs — to verify the system survives.

Cert Mapping

Cert             Resilience scope
AWS SAA / SAP    CloudFront, Shield, WAF, API Gateway throttling, Multi-AZ, Multi-Region
Azure AZ-305     Front Door, App Gateway WAF, Traffic Manager, availability zones
GCP PCA          Cloud CDN, Cloud Armor, regional/global LB

The Resilience Checklist

  1. Every external call has a timeout.
  2. Retries use exponential backoff with jitter, only on idempotent ops.
  3. Circuit breakers wrap critical downstreams.
  4. Rate limits exist at the edge and per-service.
  5. Bulkheads isolate pools.
  6. CDNs and caches let the system serve stale-but-usable content on origin failure.
  7. Graceful degradation is designed, not improvised.
  8. Chaos drills run regularly.

The final lesson assembles everything we've covered into three canonical interview-style designs.

Key Takeaways

  • CDNs serve content from points of presence near the user — the largest single performance lever for global apps.
  • Rate limiting protects backends from abuse, runaway clients, and accidental DDoS — but only if applied at the right layer.
  • Token bucket and leaky bucket are the two essential rate-limiting algorithms.
  • Circuit breakers stop cascading failure by failing fast when a downstream is unhealthy.
  • Bulkheads, timeouts, and retries with backoff are non-negotiable in distributed systems.
