Two forces shape every public API: callers who hammer it harder than they should, and infrastructure costs that scale with every request you choose to actually serve. Rate limiting tames the first; caching tames the second.
What to Limit
| Concern | Limit by | Window |
|---|---|---|
| Spam / abuse | API key, user ID, IP | Per second / minute |
| Cost / fairness | Account / tenant | Per minute / hour / day |
| Hot endpoints | Endpoint × identity | Per second |
| Plan tiers | Subscription level | Per month |
Algorithms
- Token bucket — refills at a rate, allows bursts up to capacity. The default for most APIs (Stripe, AWS).
- Leaky bucket — strict steady rate, no bursts.
- Fixed window — N per minute, reset at boundary. Simple but allows 2× burst near boundaries.
- Sliding window log / counter — smooth, slightly more state.
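As a rough illustration of the token bucket described above, here is a minimal in-memory sketch. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for this example; a production limiter would keep this state in a shared store rather than in one process.

```python
import time
from typing import Callable


class TokenBucket:
    """In-memory token bucket: refills continuously, allows bursts up to capacity."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock          # injectable so the behaviour can be tested
        self.tokens = capacity      # start full, so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Credit the tokens earned since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock: capacity 3, refilling 1 token per second.
fake_now = [0.0]
bucket = TokenBucket(capacity=3, refill_per_sec=1, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]       # first 3 pass, the 4th is rejected
fake_now[0] += 2.0                               # two seconds later, 2 tokens are back
after_wait = [bucket.allow() for _ in range(3)]  # 2 pass, the 3rd is rejected
```

The capacity sets how large a burst is tolerated, and the refill rate sets the sustained throughput, which is why the two are tuned separately.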
For implementation, Redis is a common backing store. INCR followed by EXPIRE covers a fixed-window counter, but the two commands are atomic as a pair only inside a MULTI/EXEC transaction or a Lua script.
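To make the fixed-window boundary burst concrete, the sketch below mimics the counter-per-window pattern with a plain dictionary standing in for Redis. The class name, the key layout, and the identity `key_123` are hypothetical; in Redis the increment would be an INCR on a per-window key that also carries an expiry.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Fixed-window counter; the dict stands in for Redis keys (an assumption)."""

    def __init__(self, limit: int, window_secs: int) -> None:
        self.limit = limit
        self.window_secs = window_secs
        self.counts = defaultdict(int)   # old windows are never cleaned up in this sketch

    def allow(self, identity: str, now: float) -> bool:
        # One counter per identity per window, like a key "rl:{identity}:{window}".
        window = int(now // self.window_secs)
        key = (identity, window)
        self.counts[key] += 1            # Redis equivalent: INCR, with EXPIRE on the key
        return self.counts[key] <= self.limit


limiter = FixedWindowLimiter(limit=5, window_secs=60)
# Five requests at t=59s (end of window 0), then five at t=60s (start of window 1).
late = [limiter.allow("key_123", 59.0) for _ in range(5)]
early = [limiter.allow("key_123", 60.0) for _ in range(5)]
sixth = limiter.allow("key_123", 60.5)   # only now does the limiter push back
```

All ten requests in `late` and `early` are accepted within about one second even though the limit is 5 per minute, which is the 2× burst that sliding-window and token-bucket limiters avoid.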
Returning a Limit Response
Be explicit, every time:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1737456000
Retry-After: 30
Content-Type: application/json

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Too many requests. Retry after 30 seconds."
  }
}
Send the rate-limit headers on every response, not just 429s. Well-behaved clients can then self-throttle and rarely see a 429 at all.
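One way to produce these headers consistently, and to consume them on the client side, is sketched below. The function names `rate_limit_headers` and `client_delay`, and the even-spacing strategy in the client, are assumptions chosen for illustration rather than a standard.

```python
import math


def rate_limit_headers(limit: int, remaining: int, reset_epoch: int,
                       now: float, allowed: bool) -> dict:
    """Headers for every response; Retry-After is added only when rejecting."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if not allowed:
        # Whole seconds until the window resets, never negative.
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers


def client_delay(headers: dict, now: float) -> float:
    """Seconds a polite client waits so its remaining calls spread over the window."""
    remaining = int(headers["X-RateLimit-Remaining"])
    seconds_left = max(0.0, int(headers["X-RateLimit-Reset"]) - now)
    if remaining <= 0:
        return seconds_left          # nothing left: wait for the reset
    return seconds_left / remaining


# The 429 example above, 30 seconds before the reset time.
limited = rate_limit_headers(1000, 0, 1737456000, now=1737455970.0, allowed=False)
# A normal response with 10 calls left in the same window.
ok = rate_limit_headers(1000, 10, 1737456000, now=1737455970.0, allowed=True)
pause = client_delay(ok, now=1737455970.0)
```

Here `limited["Retry-After"]` is `"30"`, and a client spacing its 10 remaining calls over the last 30 seconds would pause 3 seconds between them.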
Quotas vs Rates
- Rate — short window: requests per second. Protects the system.
- Quota — long window: requests per day or per month. Aligns with billing.
Use both. A free-tier user may have 60 req/min and 10,000 req/day. A paid user has higher limits or no quota.
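A minimal sketch of checking both limits together, using the free-tier numbers above. The `Plan` and `Usage` types and the `check` function are hypothetical, and resetting the counters at the end of each minute or day is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    per_minute: int
    per_day: Optional[int]   # None means the plan has no quota


@dataclass
class Usage:
    minute_count: int = 0    # requests served in the current minute
    day_count: int = 0       # requests served in the current day


def check(plan: Plan, usage: Usage) -> str:
    """Return "ok", "rate_limited", or "quota_exceeded"; count only served requests."""
    if plan.per_day is not None and usage.day_count >= plan.per_day:
        return "quota_exceeded"
    if usage.minute_count >= plan.per_minute:
        return "rate_limited"
    usage.minute_count += 1
    usage.day_count += 1
    return "ok"


FREE = Plan(per_minute=60, per_day=10_000)
PAID = Plan(per_minute=600, per_day=None)   # illustrative numbers, not from any real plan
```

Distinguishing the two outcomes matters for the response: a rate limit clears in seconds, while an exhausted quota usually calls for a different message and a much later retry time.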
Where to Enforce
- Edge / CDN — block obvious floods at the network edge before they cost you.
- API gateway — per-identity, per-route limits.
- Service — endpoint-specific limits the gateway can't know.
Idempotency for Mutations
Rate limits cause clients to retry. Without idempotency, those retries duplicate writes. Accept an idempotency key:
POST /payments
Idempotency-Key: 5e3d2f...
Content-Type: application/json
{ "amount": 1000 }
The server stores the result under the idempotency key, and retries with the same key return the original response instead of repeating the write. Keys are typically retained for 24 hours.
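The replay behaviour can be sketched with an in-memory store. `IdempotencyStore`, its `run` method, and the key `key-abc` are hypothetical names for this example; a real implementation would need an atomic insert in a shared store so that concurrent retries cannot both execute, and would usually also check that the retried request body matches the original.

```python
import time
from typing import Any, Callable


class IdempotencyStore:
    """Remembers the response for each idempotency key for ttl_secs (24h by default)."""

    def __init__(self, ttl_secs: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_secs = ttl_secs
        self.clock = clock
        self._results: dict = {}     # key -> (stored_at, response)

    def run(self, key: str, handler: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._results.get(key)
        if hit is not None and now - hit[0] < self.ttl_secs:
            return hit[1]            # retry: replay the stored response, skip the write
        response = handler()         # first attempt, or the key has expired
        self._results[key] = (now, response)
        return response


charges = []


def create_payment() -> dict:
    charges.append(1000)             # the side effect that must not be duplicated
    return {"status": "created", "amount": 1000}


store = IdempotencyStore()
first = store.run("key-abc", create_payment)
retry = store.run("key-abc", create_payment)   # same key: handler is not called again
```

After both calls, `charges` holds a single entry and `retry` equals `first`, which is the property that makes client retries safe.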
HTTP Caching for APIs
Most APIs underuse HTTP caching. With proper headers, your CDN serves a huge share of read traffic for free.
Cache-Control
Cache-Control: public, max-age=60, s-maxage=300, stale-while-revalidate=30
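- public: any cache, including proxies and CDNs, may store the response.
- private: only the end user's own cache (typically the browser) may store it.
- max-age: freshness lifetime in seconds; applies to every cache unless s-maxage overrides it for shared ones.
- s-maxage: freshness lifetime in seconds for shared caches (CDNs, proxies); takes precedence over max-age there.
- stale-while-revalidate: seconds a cache may keep serving a stale response while it refreshes in the background.
- no-store: never cache.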
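To show how these directives interact, the sketch below computes the freshness lifetime that a shared cache and a browser would each derive from the same header. The function names are assumptions, and the parsing is deliberately simplified: it ignores quoted values and heuristic freshness, and treats a missing lifetime as zero.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg or None
    return directives


def freshness_seconds(value: str, shared: bool) -> Optional[int]:
    """Freshness lifetime for a shared cache (CDN) or a private one (browser).

    Returns None when that kind of cache must not store the response.
    """
    d = parse_cache_control(value)
    if "no-store" in d or (shared and "private" in d):
        return None
    if shared and d.get("s-maxage") is not None:
        return int(d["s-maxage"])    # s-maxage overrides max-age for shared caches
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return 0                         # simplification: no lifetime means always revalidate


header = "public, max-age=60, s-maxage=300, stale-while-revalidate=30"
cdn_ttl = freshness_seconds(header, shared=True)       # uses s-maxage
browser_ttl = freshness_seconds(header, shared=False)  # uses max-age
```

With the example header, the CDN keeps the response fresh for 300 seconds while browsers revalidate after 60, so most repeat traffic is absorbed at the edge.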
ETag / If-None-Match
GET /orders/100
→ 200 OK
ETag: "v3-7c2a"
{ ... }
GET /orders/100
If-None-Match: "v3-7c2a"
→ 304 Not Modified (no body — saves bandwidth)
Conditional requests let clients cache safely even when content can change: the client revalidates with If-None-Match and downloads the body again only when the ETag has changed.
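A server-side sketch of that exchange follows. Deriving the ETag from a hash of the body is an assumption made for this example (the tag shown above looks version-based, which works equally well), and `make_etag` and `conditional_get` are hypothetical names.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """ETag derived from a hash of the body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match: Optional[str]) -> tuple:
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a leading W/ prefix is ignored.
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""     # client copy is current: send no body
    return 200, headers, body


order = b'{"id": 100, "status": "shipped"}'
status1, headers1, payload1 = conditional_get(order, None)               # first fetch
status2, headers2, payload2 = conditional_get(order, headers1["ETag"])   # revalidation
status3, _, payload3 = conditional_get(order, '"v3-7c2a"')               # stale tag
```

The first and third calls return 200 with the full body; the second returns 304 with an empty body because the tag the client sent still matches.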
Vary
Tell caches which request headers should split the cache:
Vary: Accept, Accept-Encoding, Authorization
Without Vary: Authorization, you risk serving Alice's data to Bob from cache. Don't.
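The effect of Vary on a cache can be sketched as a key function: the cache key is the method and URL plus the value of every request header that Vary names. The `cache_key` function and the bearer tokens are hypothetical and only illustrate the collision.

```python
def cache_key(method: str, url: str, request_headers: dict, vary: str) -> tuple:
    """Build a cache key from the URL plus every request header named in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    # A header that is absent from the request contributes an empty value.
    varied = tuple((name, lowered.get(name, "")) for name in names)
    return (method.upper(), url, varied)


alice = {"Accept": "application/json", "Authorization": "Bearer alice-token"}
bob = {"Accept": "application/json", "Authorization": "Bearer bob-token"}

# Vary omits Authorization: both users map to the same cache entry.
collides = (cache_key("GET", "/orders", alice, "Accept")
            == cache_key("GET", "/orders", bob, "Accept"))

# Vary includes Authorization: each user gets a separate entry.
separated = (cache_key("GET", "/orders", alice, "Accept, Authorization")
             != cache_key("GET", "/orders", bob, "Accept, Authorization"))
```

Both `collides` and `separated` are true here. The trade-off is that every distinct header value creates its own cache entry, so varying on Authorization keeps data isolated at the cost of a much lower hit rate.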
What Not to Cache
- Authenticated, user-specific data — unless the cache is private and the URL is unique to the user.
- Responses to requests that change state or trigger side effects (POST, PUT, PATCH, DELETE).
- Real-time data with no tolerance for staleness.
Cache Invalidation
The hardest problem. Patterns:
- Short TTL. Set seconds-to-minutes; accept staleness.
- Cache-busting URLs. Versioned paths such as /v3/products change when the content changes.
- Purge API. Most CDNs support targeted invalidation; integrate into your write path for time-critical changes.
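Two of these patterns, short TTLs and purge-on-write, can be sketched with a small in-process cache. `TtlCache` is a hypothetical stand-in for a CDN or shared cache, and its `purge` method plays the role of a CDN's invalidation call.

```python
import time
from typing import Any, Callable, Optional


class TtlCache:
    """Tiny in-process cache: entries expire after ttl_secs or when purged."""

    def __init__(self, ttl_secs: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_secs = ttl_secs
        self.clock = clock
        self._entries: dict = {}     # key -> (stored_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl_secs:
            del self._entries[key]   # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self.clock(), value)

    def purge(self, key: str) -> None:
        self._entries.pop(key, None) # targeted invalidation, called from the write path


now = [0.0]
cache = TtlCache(ttl_secs=30, clock=lambda: now[0])
cache.set("/products/1", {"price": 10})
now[0] = 29.0
fresh = cache.get("/products/1")     # still within the TTL
now[0] = 31.0
expired = cache.get("/products/1")   # TTL elapsed, so this is a miss
cache.set("/products/1", {"price": 10})
cache.purge("/products/1")           # a write happened: invalidate immediately
purged = cache.get("/products/1")
```

The TTL bounds how stale a read can be without any coordination, while the purge covers changes that cannot wait for expiry.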
Throttling vs Rejecting
Sometimes 429-then-retry is wasteful. Alternatives:
- Queue the request server-side and respond when capacity exists (only for short waits).
- Provide a slower-but-cheaper async endpoint for batch consumers.
- Offer webhooks instead of polling for change-data-capture use cases.
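The first alternative, queueing only when the wait is short, comes down to a small decision rule. The sketch below assumes a token-bucket limiter where each request costs one token; the `admit` function and its parameters are hypothetical.

```python
def admit(tokens: float, refill_per_sec: float, max_wait_secs: float) -> tuple:
    """Decide what to do with a request that costs one token.

    Returns ("serve", 0.0), ("queue", wait_secs), or ("reject", retry_after_secs).
    """
    if tokens >= 1.0:
        return ("serve", 0.0)
    wait = (1.0 - tokens) / refill_per_sec   # time until one whole token exists
    if wait <= max_wait_secs:
        return ("queue", wait)               # short wait: hold the request server-side
    return ("reject", wait)                  # long wait: answer 429 with this Retry-After


plenty = admit(tokens=2.0, refill_per_sec=10, max_wait_secs=0.5)
brief = admit(tokens=0.0, refill_per_sec=10, max_wait_secs=0.5)
slow = admit(tokens=0.0, refill_per_sec=0.5, max_wait_secs=0.5)
```

With a refill of 10 tokens per second the empty bucket needs only 0.1 seconds, so the request is queued; at 0.5 tokens per second the wait is 2 seconds, so rejecting with a retry hint is cheaper than holding the connection open.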
Cert Mapping
| Cert | Scope |
|---|---|
| AWS SAA | API Gateway throttling, usage plans, caching |
| AWS Data Engineer | Caching strategies for high-throughput APIs |
The next lesson covers a longer-horizon concern: how to evolve an API without breaking the clients you already have.