Performance and resilience are two sides of the same coin: serve fast when healthy, fail safely when not. CDNs handle the first; rate limits and circuit breakers handle the second.
CDNs: The Edge Layer
A Content Delivery Network is a fleet of servers in points of presence (PoPs) around the world that cache content close to users.
User in Tokyo ──→ CDN PoP in Tokyo (cache hit) ──→ Response in 5 ms
User in Tokyo ──→ CDN PoP in Tokyo (miss) ──→ Origin in Virginia (200 ms) ──→ cache and return
The economic and performance impact is enormous: a CDN that absorbs 95% of traffic at 5 ms latency leaves your origin handling only the remaining 5%, which keeps origin cost manageable, and global users feel local performance.
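As a rough worked example of that arithmetic, the sketch below computes the average latency and the origin load for a given hit ratio. It assumes a miss costs about 200 ms in total, taken from the diagram above; the function names are illustrative only.

```python
def expected_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency across all requests for a given cache hit ratio."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


def origin_requests(total_requests: int, hit_ratio: float) -> int:
    """Number of requests that fall through the CDN and reach the origin."""
    return round(total_requests * (1.0 - hit_ratio))


# Figures from the text: 95% hits at 5 ms, misses assumed to cost 200 ms.
average = expected_latency_ms(0.95, 5.0, 200.0)   # about 14.75 ms
to_origin = origin_requests(1_000_000, 0.95)      # 50000 of 1,000,000 requests
```

Even with every miss paying the full trip to the origin, the average stays far closer to the edge latency than to the origin latency.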
What CDNs cache
- Static assets (JS, CSS, images, fonts) — the original use case.
- Dynamic API responses with appropriate Cache-Control.
- Streaming video segments.
- Software downloads and updates.
Cache control headers
Cache-Control: public, max-age=3600, s-maxage=86400, stale-while-revalidate=60
ETag: "abc123"
Vary: Accept-Encoding, Authorization
- max-age: browser cache lifetime.
- s-maxage: CDN/proxy cache lifetime (often longer).
- stale-while-revalidate: serve stale, refresh in background.
- stale-if-error: serve stale on origin failure (huge resilience win).
- Vary: split cache by these headers.
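To make the split between browser and CDN lifetimes concrete, here is a minimal sketch that parses a Cache-Control value and picks the freshness lifetime each kind of cache would apply. The helper names are hypothetical, and the parser is deliberately simplified: it ignores quoted values and other edge cases that a real HTTP library handles.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[int]]:
    """Split a Cache-Control value into {directive: seconds or None}."""
    directives: Dict[str, Optional[int]] = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        value = value.strip()
        # Flag directives such as "public" carry no value.
        directives[name.strip()] = int(value) if value.isdigit() else None
    return directives


def freshness_lifetime(header: str, shared_cache: bool) -> int:
    """Seconds a response stays fresh; shared caches prefer s-maxage."""
    directives = parse_cache_control(header)
    if shared_cache and directives.get("s-maxage") is not None:
        return directives["s-maxage"]
    return directives.get("max-age") or 0


header = "public, max-age=3600, s-maxage=86400, stale-while-revalidate=60"
browser_ttl = freshness_lifetime(header, shared_cache=False)  # 3600
cdn_ttl = freshness_lifetime(header, shared_cache=True)       # 86400
```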
Beyond caching
Modern CDNs (Cloudflare, Fastly, CloudFront, Akamai) also offer:
- DDoS protection at the edge — absorb millions of req/sec before they reach origin.
- WAF (web application firewall).
- TLS termination with HTTP/3.
- Edge compute (Cloudflare Workers, Lambda@Edge, Fastly Compute@Edge) — run code in the PoP.
- Image optimisation and resizing on the fly.
Rate Limiting
Without limits, a single buggy client or attacker can saturate a service. Rate limits define a contract: at most N requests per identity per window.
Token bucket
Each identity has a bucket of tokens that refills at a fixed rate up to a cap. Each request consumes a token; if empty, reject.
- Allows short bursts (up to bucket size).
- Smooth average rate.
- Used by AWS, Stripe, GitHub.
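A minimal single-process sketch of the token bucket described above. The class name and the injectable clock are assumptions made for illustration; a production limiter would typically keep one bucket per identity in shared storage and add locking.

```python
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock
        self.tokens = capacity    # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Refilling lazily on each call avoids a background timer: the bucket only needs the timestamp of the previous request.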
Leaky bucket
Requests enter a fixed-size queue that drains at a constant rate. Overflow is rejected.
- Smooths out bursts entirely.
- Strict steady throughput.
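The queue form of the leaky bucket can be sketched as follows. This is an illustrative, single-threaded version under assumed names: `offer` enqueues or rejects, and `drain` is expected to be polled by a worker, releasing at most one request per drain interval.

```python
import time
from collections import deque


class LeakyBucket:
    def __init__(self, drain_rate: float, capacity: int, clock=time.monotonic):
        self.interval = 1.0 / drain_rate  # seconds between released requests
        self.capacity = capacity          # queue size; overflow is rejected
        self.clock = clock
        self.queue = deque()
        self.next_release = clock()

    def offer(self, request) -> bool:
        """Enqueue a request, or reject it when the queue is full."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # Idle time is not banked: the first queued request
            # leaves no earlier than now.
            self.next_release = max(self.next_release, self.clock())
        self.queue.append(request)
        return True

    def drain(self) -> list:
        """Release the requests that are due by now, one per interval."""
        now = self.clock()
        released = []
        while self.queue and self.next_release <= now:
            released.append(self.queue.popleft())
            self.next_release += self.interval
        return released
```

Unlike the token bucket, a burst is never passed through at once: it is either queued and released at the drain rate, or rejected.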
Fixed window vs sliding window
- Fixed: count per minute, reset at the boundary. Simple but allows 2× burst around boundaries.
- Sliding: rolling window. Smoother, slightly more state per identity.
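The boundary burst is easy to reproduce. The sketch below compares a fixed-window counter with a sliding-window log; both classes are illustrative, take the current time as an argument to stay deterministic, and track a single identity.

```python
from collections import deque


class FixedWindow:
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.current = None  # index of the window being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        bucket = int(now // self.window)
        if bucket != self.current:  # boundary crossed: counter resets
            self.current, self.count = bucket, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLog:
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.stamps = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()  # forget requests older than the window
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False


# Limit of 5 per 60 s; 5 requests at t=59 s and 5 more at t=61 s.
times = [59.0] * 5 + [61.0] * 5
fixed, sliding = FixedWindow(5, 60.0), SlidingWindowLog(5, 60.0)
fixed_allowed = sum(fixed.allow(t) for t in times)      # 10: double the limit in 2 s
sliding_allowed = sum(sliding.allow(t) for t in times)  # 5
```

The log variant stores one timestamp per accepted request, which is the extra per-identity state the sliding approach costs.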
Where to limit
| Layer | Per | Why |
|---|---|---|
| CDN / edge | IP, geographical region | Stop floods before they hit you |
| API gateway | API key, user, route | Per-customer SLAs |
| Per-service | Per caller, per endpoint | Protect specific hot endpoints |
| Database | Per query, per connection | Final backstop |
Use multiple layers: edge for DDoS, API gateway for per-customer fairness, per-service for endpoint-specific limits.
Returning rate-limit responses
Use HTTP 429 with informative headers:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1737456000
Retry-After: 30
Good clients respect Retry-After; their retries should use exponential backoff with jitter.
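On the client side, honouring Retry-After might look like the sketch below. The function name is hypothetical. It handles both forms the header can take (a number of seconds or an HTTP date) and returns None when the header is missing or unparseable, in which case the caller would fall back to its own backoff. It assumes a plain mapping with the exact header name; many HTTP clients expose case-insensitive header objects instead.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay_seconds(headers: Mapping[str, str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait according to Retry-After, or None if unavailable."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "30"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```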
Timeouts
Every external call needs a timeout. No exceptions. A request without a timeout is a thread waiting for a server in another country that may have rebooted hours ago.
- Pick timeouts based on p99 of the dependency, plus margin.
- Connect timeout vs read timeout — set both.
- Total request budget should propagate down the call chain ("you have 200 ms left").
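One way to propagate a request budget is to carry a deadline object through the call chain and derive each downstream timeout from what is left. The `Deadline` class below is a hypothetical sketch, not a standard library type; the clock is injectable only to keep the example testable.

```python
import time


class Deadline:
    def __init__(self, budget_s: float, clock=time.monotonic):
        self.clock = clock
        self.expires_at = clock() + budget_s

    def remaining(self) -> float:
        """Seconds left in the overall request budget (never negative)."""
        return max(0.0, self.expires_at - self.clock())

    def timeout_for(self, per_call_cap_s: float) -> float:
        """Timeout for the next downstream call: the smaller of the
        per-call cap and whatever remains of the total budget."""
        left = self.remaining()
        if left <= 0:
            raise TimeoutError("request budget exhausted")
        return min(left, per_call_cap_s)
```

Each hop would pass `deadline.timeout_for(...)` to its client's connect and read timeouts and forward the remaining budget to the next service, for example in a request header.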
Retries with Backoff
Naive retries amplify failure: 1000 clients making 3 attempts each means 3000 requests piling on an already-struggling service. Use exponential backoff with jitter:
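import random
import time

class TransientError(Exception):
    """Placeholder for errors worth retrying (timeouts, 503s); map your client's exceptions to it."""

def call_with_retry(fn, max_attempts=3, base=0.1, cap=2.0):
    """Call fn(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(cap, base * (2 ** attempt))  # 0.1 s, 0.2 s, 0.4 s, ... capped at `cap`
            sleep_for = delay * (0.5 + random.random())  # jitter: 0.5x to 1.5x of delay
            time.sleep(sleep_for)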
Only retry idempotent operations. Retrying a non-idempotent charge_card can double-charge.
Circuit Breakers
A circuit breaker wraps a downstream call and tracks failures. After a threshold, it "opens" — failing fast for a cooldown period instead of beating on the dependency. Then it lets a trickle through ("half-open") to test recovery.
[CLOSED] — calls flow normally; track failures
│ failure rate > threshold
▼
[OPEN] — fail fast, don't call dependency
│ cooldown elapsed
▼
[HALF-OPEN] — let a few through; if good → CLOSED; if bad → OPEN
Why it matters: without breakers, a slow downstream service consumes all your threads / connections, so your healthy service becomes unhealthy because its dependency is. Breakers contain blast radius.
Tools: Hystrix (legacy), Resilience4j (JVM), Polly (.NET), service mesh (Istio, Linkerd, Envoy native).
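The three states can be sketched in a few lines. This is a simplified illustration, not how any of the libraries above implement it: it trips on a count of consecutive failures rather than a failure rate, lets a single trial call through in the half-open state, and is not thread-safe. All names are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.state = "closed"
        self.failures = 0      # consecutive failures while closed
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("failing fast; dependency not called")
            self.state = "half-open"  # cooldown elapsed: allow a trial call
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.state = "closed"
        self.failures = 0
```

Callers would typically catch `CircuitOpenError` and serve a fallback instead of waiting on a dependency that is known to be failing.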
Bulkheads
Like compartments in a ship: isolate resources so a failure in one area can't sink the whole vessel.
- Separate thread pool / connection pool per downstream.
- Separate queue per priority class.
- Separate database for premium customers.
One downstream going slow consumes only its own pool, not all of yours.
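A bulkhead can be as small as a semaphore per downstream. The sketch below is one possible shape under assumed names: it caps concurrent calls and rejects immediately when the compartment is full, rather than queueing.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when every slot for this downstream is already in use."""


class Bulkhead:
    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn):
        # Non-blocking acquire: a saturated downstream must not
        # tie up the caller's own threads while it waits.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot for this downstream")
        try:
            return fn()
        finally:
            self._slots.release()
```

Creating one `Bulkhead` per dependency means a slow payments call can exhaust only the payments slots, while calls to other dependencies keep their own.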
Graceful Degradation
When a dependency fails, don't fail the whole feature. Examples:
- Recommendations service down → show generic top-sellers.
- Reviews service down → show product page without reviews + a small notice.
- Profile picture CDN down → show a default avatar.
The mantra: a degraded experience beats an error page.
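In code, designed degradation often comes down to an explicit fallback path. The helper below is a hypothetical sketch; `fetch_recommendations` and `top_sellers` are stand-ins for a real service client and a cached generic list.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T],
                  handled: tuple = (Exception,)) -> Tuple[T, bool]:
    """Return (value, degraded); degraded is True when the fallback was used."""
    try:
        return primary(), False
    except handled:
        return fallback(), True


# Hypothetical stand-ins for illustration only.
def fetch_recommendations() -> list:
    raise ConnectionError("recommendations service down")


def top_sellers() -> list:
    return ["kettle", "toaster"]


items, degraded = with_fallback(fetch_recommendations, top_sellers,
                                handled=(ConnectionError,))
```

Returning the `degraded` flag lets the page show a small notice and lets metrics count how often the fallback is served.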
Chaos Engineering
You don't know your system is resilient until you break it on purpose. Tools (Chaos Monkey, Gremlin, Litmus, AWS FIS) inject failures in production-like environments — kill instances, slow networks, throttle CPUs — to verify the system survives.
Cert Mapping
| Cert | Resilience scope |
|---|---|
| AWS SAA / SAP | CloudFront, Shield, WAF, API Gateway throttling, Multi-AZ, Multi-Region |
| Azure AZ-305 | Front Door, Application Gateway WAF, Traffic Manager, availability zones |
| GCP PCA | Cloud CDN, Cloud Armor, regional/global LB |
The Resilience Checklist
- Every external call has a timeout.
- Retries use exponential backoff with jitter, only on idempotent ops.
- Circuit breakers wrap critical downstreams.
- Rate limits exist at the edge and per-service.
- Bulkheads isolate pools.
- CDNs and caches let the system serve stale-but-correct on origin failure.
- Graceful degradation is designed, not improvised.
- Chaos drills run regularly.
The final lesson assembles everything we've covered into three canonical interview-style designs.