Rate Limiting and Caching APIs — API Design: REST and GraphQL | CertQnA

Two forces shape every public API: callers who hammer it harder than they should, and infrastructure costs that scale with every request you choose to actually serve. Rate limiting tames the first; caching tames the second.

What to Limit

Concern	Limit by	Window
Spam / abuse	API key, user ID, IP	Per second / minute
Cost / fairness	Account / tenant	Per minute / hour / day
Hot endpoints	Endpoint × identity	Per second
Plan tiers	Subscription level	Per month

Algorithms

Token bucket — refills at a fixed rate and allows bursts up to capacity. A common default; Stripe and AWS API Gateway both use it.
Leaky bucket — strict steady rate, no bursts.
Fixed window — N per minute, reset at boundary. Simple but allows 2× burst near boundaries.
Sliding window log / counter — smooth, slightly more state.
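
As a rough illustration of the token bucket described above, here is a minimal in-memory sketch. The class name, parameter names, and the injectable `now` argument are assumptions made for this example; a production limiter would keep this state in a shared store rather than in process memory.

```python
import time
from typing import Optional


class TokenBucket:
    """In-memory token bucket: refills continuously, allows bursts up to capacity."""

    def __init__(self, capacity: float, refill_rate: float, now: Optional[float] = None):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full so an idle caller can burst
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        """Spend `cost` tokens and return True if enough are available."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        # Top up for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `capacity=3` and `refill_rate=1.0`, a caller can send three requests at once, then roughly one per second after that.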

For implementation, Redis is a common backing store. INCR and EXPIRE issued as separate commands are not atomic as a pair, so wrap them in a MULTI/EXEC transaction or a Lua script.
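
The counter pattern can be sketched without a Redis server. In the example below a plain dict stands in for Redis, and the class and key layout are hypothetical; the point is only to show the increment-per-window logic and why a fixed window admits a burst at the boundary.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowLimiter:
    """Fixed-window counter; a dict stands in for Redis in this sketch."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # (identity, window index) -> request count. In Redis this would be
        # one key per identity and window, incremented and given a TTL in a
        # single atomic step.
        self.counts: Dict[Tuple[str, int], int] = {}

    def allow(self, identity: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        key = (identity, window)
        count = self.counts.get(key, 0) + 1  # the increment step
        self.counts[key] = count
        # Drop counters from earlier windows (the job a key TTL does in Redis).
        for old in [k for k in self.counts if k[1] < window]:
            del self.counts[old]
        return count <= self.limit
```

With `limit=2` and a 60-second window, two requests at second 59 and two more at second 60 are all accepted: four requests in about one second, which is the 2× boundary burst.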

Returning a Limit Response

Be explicit, every time:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1737456000
Retry-After: 30
Content-Type: application/json

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Too many requests. Retry after 30 seconds."
  }
}

Send the rate-limit headers on every response, not just 429s. Well-behaved clients can then throttle themselves and rarely hit a 429.
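
A small sketch of both sides of that contract follows. The function names are made up for this example, the `X-RateLimit-*` names follow the response shown above, and `Retry-After` is handled only in its delta-seconds form (it may also be an HTTP date).

```python
import math
from typing import Dict


def rate_limit_headers(limit: int, remaining: int, reset_epoch: int, now: float) -> Dict[str, str]:
    """Server side: build the headers; add Retry-After once the budget is spent."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if remaining <= 0:
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers


def client_delay(headers: Dict[str, str], now: float) -> float:
    """Client side: seconds a polite client might wait before its next request."""
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    remaining = int(headers["X-RateLimit-Remaining"])
    seconds_left = max(0.0, int(headers["X-RateLimit-Reset"]) - now)
    # Spread the remaining budget evenly over what is left of the window.
    return seconds_left / remaining if remaining > 0 else seconds_left
```

Even pacing is only one possible client policy; a client could equally well burst until `X-RateLimit-Remaining` runs low and then wait for the reset.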

Quotas vs Rates

Rate — short window: requests per second. Protects the system.
Quota — long window: requests per day or per month. Aligns with billing.

Use both. A free-tier user may have 60 req/min and 10,000 req/day. A paid user has higher limits or no quota.
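
Checking both limits might look like the sketch below. The free-tier figures come from the text above; the paid-tier numbers, the `Plan` type, and the result strings are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    per_minute: int          # rate: protects the system
    per_day: Optional[int]   # quota: None means no daily quota


FREE = Plan(per_minute=60, per_day=10_000)  # figures from the text above
PAID = Plan(per_minute=600, per_day=None)   # hypothetical paid tier


def check(plan: Plan, used_this_minute: int, used_today: int) -> str:
    """Return 'ok', 'rate_limited', or 'quota_exhausted' for the next request."""
    if plan.per_day is not None and used_today >= plan.per_day:
        return "quota_exhausted"  # long window: wait for the daily reset
    if used_this_minute >= plan.per_minute:
        return "rate_limited"     # short window: retry within the minute
    return "ok"
```

Distinguishing the two outcomes lets the error message tell the caller whether to back off for seconds or to upgrade the plan.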

Where to Enforce

Edge / CDN — block obvious floods at the network edge before they cost you.
API gateway — per-identity, per-route limits.
Service — endpoint-specific limits the gateway can't know.

Idempotency for Mutations

Rate limits cause clients to retry. Without idempotency, those retries duplicate writes. Accept an idempotency key:

POST /payments
Idempotency-Key: 5e3d2f...
Content-Type: application/json

{ "amount": 1000 }

The server stores the result under the idempotency key, and a retry with the same key returns the original response instead of repeating the write. Keys are typically retained for about 24 hours.
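
The replay behaviour can be sketched with an in-memory store. The `IdempotencyStore` class and its `run` method are hypothetical names; a real service would persist the record in a database and guard against two concurrent requests carrying the same key, which this sketch does not attempt.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, dict]  # (status code, JSON body)


class IdempotencyStore:
    """Remembers the first response for each key and replays it on retries."""

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl_seconds = ttl_seconds
        self._saved: Dict[str, Tuple[float, Response]] = {}

    def run(self, key: str, handler: Callable[[], Response],
            now: Optional[float] = None) -> Response:
        now = time.time() if now is None else now
        saved = self._saved.get(key)
        if saved is not None and now - saved[0] < self.ttl_seconds:
            return saved[1]      # retry: replay without calling the handler
        response = handler()     # first attempt (or key expired): do the write
        self._saved[key] = (now, response)
        return response
```

A handler wrapped this way runs once per key within the retention window, so a client that retries after a 429 or a timeout does not create a second payment.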

HTTP Caching for APIs

Most APIs underuse HTTP caching. With proper headers, your CDN serves a huge share of read traffic for free.

Cache-Control

Cache-Control: public, max-age=60, s-maxage=300, stale-while-revalidate=30

public — proxies and CDNs may cache.
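private — only the end user's own cache (typically the browser) may store the response; shared caches may not.
max-age — seconds the response stays fresh in any cache.
s-maxage — seconds the response stays fresh in a shared cache (CDN); overrides max-age there.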
stale-while-revalidate — serve stale, refresh in background.
no-store — never cache.
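
To make the interaction of these directives concrete, the sketch below classifies a stored response from the point of view of a shared cache. It is a simplification under stated assumptions: the function names are invented, only the directives listed above are handled, and a response with no explicit lifetime is simply not cached.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[int]]:
    """Parse a Cache-Control value into {directive: seconds or None}."""
    directives: Dict[str, Optional[int]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = int(arg) if arg.isdigit() else None
    return directives


def shared_cache_state(cache_control: str, age: int) -> str:
    """Classify a response that has sat in a shared cache for `age` seconds."""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "private" in d:
        return "bypass"                 # a shared cache must not store it
    ttl = d.get("s-maxage")
    if ttl is None:
        ttl = d.get("max-age")          # s-maxage wins when both are present
    if ttl is None:
        return "bypass"                 # this sketch skips heuristic freshness
    if age < ttl:
        return "fresh"
    swr = d.get("stale-while-revalidate") or 0
    if age < ttl + swr:
        return "stale-while-revalidate"  # serve stale, refresh in background
    return "expired"
```

For the header shown above, a CDN would treat the response as fresh for 300 seconds and may serve it stale for a further 30 while it refetches.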

ETag / If-None-Match

GET /orders/100
→ 200 OK
ETag: "v3-7c2a"
{ ... }

GET /orders/100
If-None-Match: "v3-7c2a"
→ 304 Not Modified  (no body — saves bandwidth)

Conditional requests let you cache safely even when content can change: the client revalidates with the server and re-downloads the body only when the ETag has changed.
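
A server-side handler for this exchange could be sketched as follows. Here the ETag is derived from a hash of the body, which is an assumption; the example above uses a version-style tag, and either works as long as the tag changes whenever the representation does. The function names are hypothetical.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (content-hash strategy)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes,
                    if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so ignore any W/ prefix.
        stripped = [c[2:] if c.startswith("W/") else c for c in candidates]
        if "*" in candidates or etag in stripped:
            return 304, headers, b""  # no body: the client reuses its copy
    return 200, headers, body
```

The server still has to produce the current representation to compute the hash, so this saves bandwidth rather than backend work; a version column or timestamp avoids that cost.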

Vary

Tell caches which request headers should split the cache:

Vary: Accept, Accept-Encoding, Authorization

If a shared cache is allowed to store authenticated responses and Vary omits Authorization, you risk serving Alice's data to Bob from cache. Don't.
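
One way to picture what Vary does is as extra components in the cache key. The sketch below is a simplified model, not how any particular CDN builds its keys, and `cache_key` is a name invented for this example.

```python
from typing import Dict, Tuple


def cache_key(method: str, url: str, request_headers: Dict[str, str], vary: str) -> Tuple:
    """Build a cache key from the method, URL, and each header named in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    # Each varied header contributes its request value (empty if absent).
    varied = tuple((name, lowered.get(name, "")) for name in names)
    return (method.upper(), url, varied)
```

With `Vary: Accept` alone, requests from Alice and Bob for the same URL map to the same key; adding `Authorization` gives each token its own entry.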

What Not to Cache

Authenticated, user-specific data — unless the cache is private and the URL is unique to the user.
Responses to requests that have side effects (POST, PUT, PATCH, DELETE).
Real-time data with no tolerance for staleness.

Cache Invalidation

The hardest problem. Patterns:

Short TTL. Set seconds-to-minutes; accept staleness.
Cache-busting URLs. Put a version in the path, such as /v3/products, and change it whenever the content changes.
Purge API. Most CDNs support targeted invalidation; integrate into your write path for time-critical changes.
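
The short-TTL and purge patterns can be combined, as in this in-memory sketch. The `TTLCache` class is hypothetical and stands in for a CDN or cache server; the `purge` method plays the role of a targeted invalidation call made from the write path.

```python
import time
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    """Entries expire after ttl_seconds; purge() removes one immediately."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._entries[key] = (now, value)

    def get(self, key: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at >= self.ttl_seconds:
            del self._entries[key]  # expired: staleness is bounded by the TTL
            return None
        return value

    def purge(self, key: str) -> None:
        """Call from the write path when a change must be visible at once."""
        self._entries.pop(key, None)
```

The TTL bounds how stale a read can be if a purge is missed, while the purge covers changes that cannot wait for expiry.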

Throttling vs Rejecting

Sometimes 429-then-retry is wasteful. Alternatives:

Queue the request server-side and respond when capacity exists (only for short waits).
Provide a slower-but-cheaper async endpoint for batch consumers.
Offer webhooks instead of polling for change-data-capture use cases.

Cert Mapping

Cert	Scope
AWS SAA	API Gateway throttling, usage plans, caching
AWS Data Engineer	Caching strategies for high-throughput APIs

The next lesson covers a longer-horizon concern: how to evolve an API without breaking the clients you already have.