Lesson 8 of 10

Rate Limiting and Caching APIs

How to keep an API fast and protected: HTTP caching, rate limits per identity, quotas, and the response patterns that make clients behave.

Two forces shape every public API: callers who hammer it harder than they should, and infrastructure costs that scale with every request you choose to actually serve. Rate limiting tames the first; caching tames the second.

What to Limit

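| Concern         | Limit by             | Window                  |
|-----------------|----------------------|-------------------------|
| Spam / abuse    | API key, user ID, IP | Per second / minute     |
| Cost / fairness | Account / tenant     | Per minute / hour / day |
| Hot endpoints   | Endpoint × identity  | Per second              |
| Plan tiers      | Subscription level   | Per month               |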

Algorithms

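  • Token bucket: refills at a fixed rate and allows bursts up to the bucket's capacity. A common default; Stripe and AWS API Gateway both use it.
  • Leaky bucket: enforces a strict steady rate, with no bursts.
  • Fixed window: N requests per window, with the counter reset at each boundary. Simple, but a client can send N just before a boundary and N more just after, a 2× burst.
  • Sliding window log / counter: smoother, at the cost of more state. The log keeps a timestamp per request; the counter approximates it by weighting the previous window's count.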
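The token bucket fits in a few lines. This is a minimal single-process sketch; the class name and the injectable `clock` parameter are illustrative, not from any library, and a production limiter would keep this state in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills at `rate` tokens per second and holds at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock: a burst of 3 passes, the 4th is refused,
# and one second later one refilled token is available again.
t = [0.0]
bucket = TokenBucket(rate=1, capacity=3, clock=lambda: t[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
t[0] = 1.0
print(bucket.allow())  # True
```

Capacity sets the largest burst a caller can send; rate sets the sustained throughput. Tuning them separately is what makes this algorithm friendlier to bursty clients than a leaky bucket.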

For implementation, Redis is a common backing store. INCR is atomic on its own; to set a counter and its expiry together, wrap INCR + EXPIRE in a MULTI/EXEC transaction or a Lua script.
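To show what the INCR + EXPIRE pattern does without needing a Redis server, here is a fixed-window counter in which a plain dict stands in for Redis and a lock plays the role of MULTI/EXEC or a Lua script. It is a sketch only; the key format `rl:{identity}:{window}` and the class name are assumptions for illustration.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow `limit` requests per `window` seconds per identity (INCR + EXPIRE style)."""

    def __init__(self, limit: int, window: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # key -> (count, expires_at); a dict standing in for Redis
        self.store: Dict[str, Tuple[int, float]] = {}
        # Stands in for the atomicity MULTI/EXEC or a Lua script provides in Redis.
        self.lock = threading.Lock()

    def hit(self, identity: str) -> Tuple[bool, int]:
        """Count one request and return (allowed, remaining)."""
        now = self.clock()
        window_start = int(now // self.window) * self.window
        key = f"rl:{identity}:{window_start}"  # one counter per identity per window
        with self.lock:
            # Evict expired counters, as Redis does once an EXPIRE elapses.
            # (A linear sweep is fine for a sketch, not for production.)
            for k in [k for k, (_, exp) in self.store.items() if exp <= now]:
                del self.store[k]
            count, expires_at = self.store.get(key, (0, window_start + self.window))
            count += 1                             # INCR
            self.store[key] = (count, expires_at)  # expiry fixed on first hit, like EXPIRE
        return count <= self.limit, max(0, self.limit - count)
```

With `limit=2` and a 60-second window, two requests at t=179 and two more at t=180 are all allowed because they fall in different windows: four requests in one second, which is the boundary burst described for fixed windows.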

Returning a Limit Response

Be explicit, every time:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1737456000
Retry-After: 30
Content-Type: application/json

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Too many requests. Retry after 30 seconds."
  }
}
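A minimal sketch of how a server might assemble the response above. The function names and parameters are illustrative, and `reset_epoch` is assumed to be a Unix timestamp in seconds, matching the X-RateLimit-Reset value shown.

```python
import json
import math
from typing import Dict, Tuple


def rate_limit_headers(limit: int, remaining: int, reset_epoch: int) -> Dict[str, str]:
    """Headers to attach to every response, not only to 429s."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }


def too_many_requests(limit: int, reset_epoch: int,
                      now: float) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for a rejected request."""
    retry_after = max(1, math.ceil(reset_epoch - now))  # whole seconds, at least 1
    headers = rate_limit_headers(limit, 0, reset_epoch)
    headers["Retry-After"] = str(retry_after)
    headers["Content-Type"] = "application/json"
    body = json.dumps({
        "error": {
            "type": "rate_limit_exceeded",
            "message": f"Too many requests. Retry after {retry_after} seconds.",
        }
    })
    return 429, headers, body
```

Called with `now` 30 seconds before `reset_epoch=1737456000`, this yields the headers and body shown above, including `Retry-After: 30`.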

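Send the rate-limit headers on every response, not just 429s, so well-behaved clients can self-throttle and rarely hit a 429 at all. The X-RateLimit-* names are a widespread convention rather than a standard, and some APIs send Reset as seconds remaining instead of an epoch timestamp as above; document which you use.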
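On the client side, self-throttling can be as simple as reading those headers before the next call. A sketch, assuming the header names above and an epoch-seconds X-RateLimit-Reset; the function name is hypothetical, and the caller is expected to sleep for the returned number of seconds.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(status: int, headers: Mapping[str, str],
                    now: Optional[float] = None) -> float:
    """How long a polite client should pause before its next request."""
    now = time.time() if now is None else now
    if status == 429 and "Retry-After" in headers:
        # Retry-After may also be an HTTP date; this sketch handles only delta-seconds.
        return float(headers["Retry-After"])
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is not None and reset is not None and int(remaining) == 0:
        # Budget exhausted: wait for the window to reset instead of provoking a 429.
        return max(0.0, float(reset) - now)
    return 0.0
```

This assumes header names arrive with the exact casing shown; most HTTP client libraries expose a case-insensitive mapping, which works the same way here.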

Quotas vs Rates

  • Rate — short window: requests per second. Protects the system.
  • Quota — long window: requests per day or per month. Aligns with billing.

Use both. A free-tier user may have 60 req/min and 10,000 req/day. A paid user has higher limits or no quota.
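Using both means a request must pass both checks. Below is a sketch built around the free-tier shape above (a per-minute rate plus a per-day quota). The class is hypothetical and counts with calendar-aligned fixed windows for brevity. It checks every limit before counting, so a request rejected by the daily quota does not also use up per-minute budget.

```python
from collections import defaultdict
from typing import Dict, Tuple


class RateAndQuota:
    """Allow a request only if both the per-minute rate and the per-day quota have room."""

    def __init__(self, per_minute: int, per_day: int) -> None:
        self.limits = {60: per_minute, 86_400: per_day}  # window seconds -> max requests
        # (account, window seconds, window index) -> requests counted so far
        self.counts: Dict[Tuple[str, int, int], int] = defaultdict(int)

    def allow(self, account: str, now: float) -> bool:
        keys = [(account, w, int(now // w)) for w in self.limits]
        # Check every window first, so a rejected request consumes nothing.
        if any(self.counts[k] >= self.limits[k[1]] for k in keys):
            return False
        for k in keys:
            self.counts[k] += 1
        return True


free_tier = RateAndQuota(per_minute=60, per_day=10_000)
```

Old window counters are never evicted here; a real implementation would let them expire, for example with the INCR + EXPIRE pattern.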

Where to Enforce

  1. Edge / CDN — block obvious floods at the network edge before they cost you.
  2. API gateway — per-identity, per-route limits.
  3. Service — endpoint-specific limits the gateway can't know.

Idempotency for Mutations

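Rate limits, timeouts, and dropped connections all make clients retry. A retry after a 429 is harmless because the rejected request never ran, but a retry after a timeout may repeat a write that already succeeded. Accept an idempotency key so the server can recognize the repeat: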

POST /payments
Idempotency-Key: 5e3d2f...
Content-Type: application/json

{ "amount": 1000 }

The server records the result under the idempotency key; retries with the same key get the original response instead of repeating the write. Keeping keys for about 24 hours is typical.
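A minimal sketch of that server-side record, assuming an in-memory store and a 24-hour retention window. Two details go beyond the text and are assumptions drawn from common practice: keys are scoped per caller, and reusing a key with a different request body is rejected rather than silently replayed. All names here are hypothetical.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple

TTL_SECONDS = 24 * 3600  # keep keys for about a day


class IdempotencyStore:
    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        # (caller, key) -> (request fingerprint, stored response, stored_at)
        self.records: Dict[Tuple[str, str], Tuple[str, Any, float]] = {}

    def run(self, caller: str, key: str, body: dict,
            handler: Callable[[dict], Any]) -> Any:
        now = self.clock()
        fingerprint = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        record = self.records.get((caller, key))
        if record and now - record[2] < TTL_SECONDS:
            if record[0] != fingerprint:
                raise ValueError("Idempotency-Key reused with a different request body")
            return record[1]          # replay the original response
        response = handler(body)      # first time (or expired key): perform the write
        self.records[(caller, key)] = (fingerprint, response, now)
        return response
```

A production version also has to reserve the key before running the handler (for example with an atomic insert), so that two concurrent retries cannot both execute the write.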

HTTP Caching for APIs

Most APIs underuse HTTP caching. With the right headers, a CDN can answer a large share of read traffic without the request ever reaching your servers.

Cache-Control

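Cache-Control: public, max-age=60, s-maxage=300, stale-while-revalidate=30
  • public: shared caches (proxies, CDNs) may store the response, even for authenticated requests.
  • private: only the user's own (browser) cache may store it.
  • max-age: freshness lifetime in seconds for any cache; browsers use it, and so do shared caches when s-maxage is absent.
  • s-maxage: freshness lifetime in seconds for shared caches (CDNs) only; it overrides max-age there.
  • stale-while-revalidate: for this many seconds after expiry, a cache may serve the stale copy while refreshing it in the background.
  • no-store: never store the response in any cache.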
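To make the timing directives concrete, here is a sketch of how a shared cache (a CDN) might classify a stored response under the header above. It applies s-maxage in place of max-age, as shared caches do, and treats stale-while-revalidate (defined in RFC 5861) as a grace window after expiry. The function names and state labels are illustrative, and the parser ignores quoted directive arguments.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """'public, max-age=60' -> {'public': None, 'max-age': '60'}"""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        directives[name.lower()] = arg or None
    return directives


def shared_cache_state(cache_control: str, age: float) -> str:
    """Return 'uncacheable', 'fresh', 'stale-revalidate', or 'expired'."""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "private" in d:
        return "uncacheable"  # a shared cache must not store this response
    # Shared caches prefer s-maxage; a browser would use max-age instead.
    lifetime = int(d.get("s-maxage") or d.get("max-age") or 0)
    grace = int(d.get("stale-while-revalidate") or 0)
    if age < lifetime:
        return "fresh"
    if age < lifetime + grace:
        return "stale-revalidate"  # serve the stored copy, refresh in the background
    return "expired"               # must revalidate with the origin before serving
```

With the example header, a CDN treats the response as fresh for 300 seconds and may keep serving it for 30 more while it refetches, whereas a browser considers it fresh for only 60 seconds.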

ETag / If-None-Match

GET /orders/100
→ 200 OK
ETag: "v3-7c2a"
{ ... }

GET /orders/100
If-None-Match: "v3-7c2a"
→ 304 Not Modified  (no body — saves bandwidth)

304 responses let you cache safely even when content can change — clients only re-download when the ETag does.
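On the server, the exchange above needs only a validator and a header comparison. This sketch assumes the ETag is derived from a SHA-256 hash of the response body; a version-based tag like the example's "v3-7c2a" would work just as well. The handler shape is hypothetical. If-None-Match uses weak comparison, so a W/ prefix is ignored.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Strong ETag from a content hash; 16 hex characters are plenty for a sketch.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque(tag: str) -> str:
    """Strip whitespace and any weak-validator prefix (W/) from an entity tag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Weak comparison, which If-None-Match requires."""
    if if_none_match.strip() == "*":
        return True  # "*" matches any current representation
    # Simple comma split; fine for tags that contain no commas, like these.
    return any(_opaque(t) == _opaque(etag) for t in if_none_match.split(","))


def conditional_get(body: bytes,
                    if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None and etag_matches(if_none_match, etag):
        return 304, headers, b""  # no body: the client reuses its cached copy
    return 200, headers, body
```

Hashing the body still means building the full response before sending a 304, which saves bandwidth but not server work; a stored version number avoids even that.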

Vary

Tell caches which request headers should split the cache:

Vary: Accept, Accept-Encoding, Authorization

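If authenticated responses are marked public or given an s-maxage, shared caches may store them; without Vary: Authorization, you risk serving Alice's data to Bob from cache. Don't. For per-user data, Cache-Control: private is the safer choice.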
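Under the hood, a cache honors Vary by folding the named request headers into its cache key. A simplified sketch: the function is hypothetical, it ignores the special value `Vary: *`, and real caches store the Vary list per response and normalize header values more carefully.

```python
import hashlib
from typing import Mapping


def cache_key(method: str, url: str, vary: str,
              request_headers: Mapping[str, str]) -> str:
    """Build a cache key from the URL plus every request header the response's Vary names."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    parts = [method.upper(), url]
    for name in sorted(h.strip().lower() for h in vary.split(",") if h.strip()):
        parts.append(f"{name}={lowered.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


alice = {"Accept": "application/json", "Authorization": "Bearer alice"}
bob = {"Accept": "application/json", "Authorization": "Bearer bob"}
```

With `Vary: Accept, Accept-Encoding, Authorization`, Alice and Bob get different keys for `GET /me`. With only `Vary: Accept`, they collide on one entry, which is exactly the leak described above.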

What Not to Cache

  • Authenticated, user-specific data — unless the cache is private and the URL is unique to the user.
  • Responses to requests with side effects (POST, PATCH, DELETE).
  • Real-time data with no tolerance for staleness.

Cache Invalidation

The hardest problem. Patterns:

  • Short TTL. Set seconds-to-minutes; accept staleness.
  • Cache-busting URLs. Put a content version in the URL (for example /products?rev=42) so changed content gets a new URL and stale entries are simply never requested again.
  • Purge API. Most CDNs support targeted invalidation; integrate into your write path for time-critical changes.
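For the cache-busting pattern, one common approach is to derive the version from the content itself, so the URL changes exactly when the data does. A sketch; the `rev` query parameter name is an assumption, and the year-long max-age it pairs with is only safe because new content always gets a new URL.

```python
import hashlib
import json
from typing import Any, Tuple


def versioned_url(path: str, resource: Any) -> Tuple[str, str]:
    """Return (url, cache_control), where the URL embeds a hash of the content."""
    # Canonical JSON so that key order does not change the hash.
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    rev = hashlib.sha256(canonical.encode()).hexdigest()[:12]
    # This URL never serves different content, so caches may keep it for a year.
    return f"{path}?rev={rev}", "public, max-age=31536000, immutable"
```

The catch is that clients still need an uncached or short-TTL entry point, such as a listing response, that tells them the current versioned URL.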

Throttling vs Rejecting

Sometimes 429-then-retry is wasteful. Alternatives:

  • Queue the request server-side and respond when capacity exists (only for short waits).
  • Provide a slower-but-cheaper async endpoint for batch consumers.
  • Offer webhooks instead of polling for change-data-capture use cases.
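The first alternative, briefly queueing instead of rejecting, can be sketched with an asyncio semaphore. A request waits up to a short deadline for a free slot and gets a 429 only if none opens up. The class name, capacity, and wait time are illustrative assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


class ShortQueue:
    """Admit up to `capacity` concurrent requests; others wait briefly, then get 429."""

    def __init__(self, capacity: int, max_wait: float) -> None:
        self.slots = asyncio.Semaphore(capacity)
        self.max_wait = max_wait

    async def handle(self, work: Callable[[], Awaitable[str]]) -> Tuple[int, str]:
        try:
            await asyncio.wait_for(self.slots.acquire(), timeout=self.max_wait)
        except asyncio.TimeoutError:
            return 429, "Too many requests"  # waited too long: reject after all
        try:
            return 200, await work()
        finally:
            self.slots.release()  # free the slot for the next queued request


async def demo() -> Tuple[Tuple[int, str], Tuple[int, str]]:
    queue = ShortQueue(capacity=1, max_wait=0.05)  # create inside the running loop

    async def slow() -> str:
        await asyncio.sleep(0.2)
        return "done"

    async def fast() -> str:
        return "ok"

    first = asyncio.create_task(queue.handle(slow))
    await asyncio.sleep(0.01)          # let the first request take the only slot
    second = await queue.handle(fast)  # waits 0.05 s, then gives up
    return await first, second
```

Running `asyncio.run(demo())` returns `((200, 'done'), (429, 'Too many requests'))`. Keep `max_wait` well below client timeouts; otherwise the queue just turns a fast 429 into a slow one.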

Cert Mapping

| Cert              | Scope                                        |
|-------------------|----------------------------------------------|
| AWS SAA           | API Gateway throttling, usage plans, caching |
| AWS Data Engineer | Caching strategies for high-throughput APIs  |

The next lesson covers a longer-horizon concern: how to evolve an API without breaking the clients you already have.

Key Takeaways

  • Rate limit by identity, not by IP alone — proxies and shared NATs make IP unreliable.
  • Use 429 with Retry-After and explicit rate-limit headers.
  • HTTP caching at the edge can absorb most cacheable read traffic if you set headers correctly.
  • Quotas (per day) complement rates (per second) for billing fairness.
  • Idempotency keys are the safe way to let clients retry mutations.
