Caching Strategies — System Design Fundamentals | CertQnA

"There are only two hard problems in computer science: cache invalidation and naming things." — Phil Karlton. Caching is everywhere because it works; it is dangerous because the bugs are subtle.

Why Cache

Latency — RAM is ~1000× faster than database disk reads.
Throughput — absorb most traffic before it touches expensive backends.
Cost — a cache read uses a tiny fraction of the compute of a complex SQL query, so every hit is work the database no longer has to do.
Resilience — caches can keep serving stale data when origins fail.

The Stack of Caches

Layer	What it caches	Typical TTL
Browser	Static assets, API GETs (Cache-Control)	seconds → days
CDN edge	Static + cacheable dynamic responses	seconds → days
API gateway / reverse proxy	Common GETs, auth tokens	seconds
Application cache (Redis, Memcached)	Computed objects, hot rows	seconds → minutes
In-process / in-memory	Configuration, hot lookups	seconds
Database buffer pool	Recently-accessed pages	managed by DB

Good designs cache at multiple layers; each layer absorbs part of the traffic, so the next layer sees only the misses from the one before it.
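A back-of-the-envelope sketch of why layering pays off. If each layer sees only the previous layer's misses, the fraction reaching the origin is the product of the per-layer miss rates. The hit rates below are made up for illustration, and the sketch assumes each rate is measured on the traffic that layer actually receives.

```python
from math import prod

def origin_fraction(hit_rates):
    """Fraction of requests that reach the origin after passing through
    cache layers with the given hit rates, each applied to the traffic
    that missed the layer in front of it."""
    return prod(1.0 - h for h in hit_rates)

# Hypothetical hit rates: browser 50%, CDN 80%, Redis 90%.
print(round(origin_fraction([0.5, 0.8, 0.9]), 4))  # 0.01
```

Here only 1% of requests reach the database, and raising any single layer's hit rate shrinks origin load multiplicatively.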

Caching Patterns

Cache-aside (lazy loading)

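def get_user(user_id):
    # cache and db are assumed client objects (e.g. a Redis client and a DB wrapper)
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is not None:
        return user  # hit: no database round trip
    # miss: load from the database, then populate the cache for later reads
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    if user is not None:  # skip absent rows; negative caching is covered under Cache Penetration
        cache.set(key, user, ttl=300)  # 300 s TTL bounds staleness
    return user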

App owns the cache logic.
Misses go to the database.
Entries go stale when the underlying row changes, unless the write path invalidates them.

Most common pattern. Default unless something else clearly fits better.

Write-through

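def update_user(user_id, data):
    # write the DB (system of record) first; cache and db are assumed clients
    db.update("UPDATE users SET ... WHERE id = %s", user_id)
    # then cache the same record; data must be the full user, not a partial patch
    cache.set(f"user:{user_id}", data, ttl=300)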

Every write goes to cache and DB synchronously.
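Cache stays fresh for written keys (as long as every write goes through this path); reads of recently written keys hit until the entry expires or is evicted.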
Slower writes; cache pollution if writes are not read soon after.

Write-behind (write-back)

App writes to cache; cache flushes to DB asynchronously.
Fast writes; risk of data loss on cache failure.
Used for very write-heavy paths where eventual durability is acceptable.
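A minimal in-process sketch of write-behind. It assumes a plain dict stands in for the database and a single background thread does the flushing. Production systems usually batch and coalesce queued writes; this version flushes one write at a time to keep the mechanics visible.

```python
import queue
import threading

class WriteBehindCache:
    """Writes land in memory immediately; a background thread copies them
    to the backing store later. `store` is any dict-like stand-in for the DB."""

    def __init__(self, store):
        self._data = {}
        self._store = store
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._flush_loop, daemon=True)
        self._worker.start()

    def set(self, key, value):
        self._data[key] = value          # fast path: memory only
        self._pending.put((key, value))  # durability happens later

    def get(self, key):
        return self._data.get(key)

    def _flush_loop(self):
        while True:
            key, value = self._pending.get()
            self._store[key] = value     # a crash before this line loses the write
            self._pending.task_done()

    def flush(self):
        """Block until every queued write has reached the store."""
        self._pending.join()


db = {}
cache = WriteBehindCache(db)
cache.set("user:1", {"name": "Ada"})
print(cache.get("user:1"))  # served from memory before the DB has it
cache.flush()
print(db["user:1"])
```

Anything still queued in `_pending` when the process dies never reaches `db`, which is the data-loss risk described above.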

Read-through

Cache (not app) loads from DB on miss.
App reads only through the cache; the database behind it remains the source of truth.
Simpler app code; tighter coupling between cache and DB.
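A sketch of the read-through shape: the application only ever calls `get()`, and the cache calls a loader it was configured with on a miss. `load_from_db` is a hypothetical stand-in for a database query, and TTLs are omitted to keep the focus on who performs the load.

```python
from typing import Any, Callable, Dict

class ReadThroughCache:
    """On a miss, the cache itself (not the caller) invokes the loader."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key not in self._data:
            self._data[key] = self._loader(key)  # the cache, not the app, hits the DB
        return self._data[key]


calls = []

def load_from_db(key):
    calls.append(key)  # record DB round trips for illustration
    return {"id": key}

users = ReadThroughCache(load_from_db)
users.get("user:1")
users.get("user:1")
print(len(calls))  # 1
```

The coupling mentioned above shows up in the constructor: the cache has to know how to load every kind of key it serves.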

Eviction Policies

Caches are bounded. When full, something must go:

LRU (Least Recently Used) — evict the longest-untouched item. Default for most workloads.
LFU (Least Frequently Used) — evict the least-accessed; better for skewed workloads.
FIFO — first in, first out; simple but rarely best.
TTL only — items expire after a fixed time. Use TTL alongside a size-based eviction policy, not instead of one: TTL alone does not bound memory.
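A hand-rolled LRU sketch built on `collections.OrderedDict`, which keeps keys in recency order when each access calls `move_to_end`. For memoizing pure functions, Python's `functools.lru_cache` already does this; the class below only shows the mechanism.

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def __contains__(self, key):
        return key in self._items

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


lru = LRUCache(2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")     # "a" is now more recent than "b"
lru.set("c", 3)  # over capacity: evicts "b"
print("b" in lru, "a" in lru, "c" in lru)  # False True True
```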

Invalidation Strategies

The hardest part. Three options:

TTL-based — set short expiration; accept staleness up to TTL. Simple, works for most read paths.
Write-time invalidation — on update, delete the cache key. Risk: if the delete is lost (crash, network error), the stale entry survives until its TTL expires, or indefinitely if it has no TTL.
Event-driven — DB change feed (CDC) publishes events that invalidate caches. Most reliable, most complex.

A common pragmatic combo: short TTL (e.g. 60s) + write-time delete. The TTL is the safety net for missed invalidations.
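A sketch of the TTL + write-time delete combination. An in-memory `TTLCache` stands in for a remote cache and a dict stands in for the database (both hypothetical). The write path updates the database first, then deletes the key, so the next read repopulates from fresh data.

```python
import time

class TTLCache:
    """Tiny in-memory stand-in for a remote cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # expired entries count as misses
            return None
        return entry[0]

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._data.pop(key, None)


cache = TTLCache()
db = {"user:1": "Ada"}  # hypothetical DB stand-in

def read_user(key, ttl=60):
    value = cache.get(key)
    if value is None:
        value = db[key]
        cache.set(key, value, ttl)  # TTL bounds staleness if a delete is lost
    return value

def update_user(key, value):
    db[key] = value    # 1. update the source of truth
    cache.delete(key)  # 2. invalidate; the next read repopulates


read_user("user:1")  # populates the cache
update_user("user:1", "Ada L.")
print(read_user("user:1"))  # Ada L.
```

Deleting rather than overwriting keeps the write path simple, but a race remains: a reader can fetch the old row, the writer then updates and deletes, and the reader finally caches the old value. The TTL is what bounds how long that stale entry lives.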

Hot Keys

One key gets 50% of traffic — celebrity user, viral product. Symptoms: one cache shard at 100% CPU; everything else idle.

Mitigations:

Replicate hot keys to multiple cache nodes (read replicas).
Add a small in-process LRU in front of the distributed cache for top-N keys.
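Shard with consistent hashing + virtual nodes to spread keys evenly across nodes. This balances many keys but cannot split one: a single hot key still maps to one shard, so pair it with the measures above.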
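A sketch of the small in-process cache in front of the distributed cache. `remote` is any object with a `get(key)` method, a hypothetical stand-in for a Redis or Memcached client. For brevity it evicts in FIFO order rather than LRU, and a short TTL bounds how stale each process's copy can get.

```python
import time

class NearCache:
    """Small in-process cache for hot keys, in front of a shared remote cache."""

    def __init__(self, remote, ttl_seconds=1.0, max_items=100, clock=time.monotonic):
        self._remote = remote
        self._ttl = ttl_seconds
        self._max = max_items
        self._clock = clock
        self._local = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._local.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # local hit: no network hop
        value = self._remote.get(key)    # miss or expired: one remote read
        self._local.pop(key, None)
        if len(self._local) >= self._max:
            # evict the oldest inserted key (FIFO keeps the sketch short)
            self._local.pop(next(iter(self._local)))
        self._local[key] = (value, now + self._ttl)
        return value


class CountingRemote:
    """Fake remote cache that counts how many reads reach it."""

    def __init__(self):
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return f"value-of-{key}"


remote = CountingRemote()
near = NearCache(remote, ttl_seconds=1.0, clock=lambda: 0.0)  # frozen clock for a repeatable demo
for _ in range(10_000):
    near.get("celebrity:42")
print(remote.calls)  # 1
```

Each application process now sends the hot shard at most one read per key per TTL window, at the cost of up to `ttl_seconds` of extra staleness.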

Thundering Herd / Cache Stampede

The hot key expires at exactly noon. 10,000 concurrent requests all miss simultaneously and hammer the database.

Mitigations:

Request coalescing / single-flight — only one request fetches; others wait for the result.
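Probabilistic early expiration — each request refreshes the entry early with a small probability that rises as the TTL approaches, so one request usually refreshes it before the mass expiry.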
Background refresh — refresh hot keys on a schedule before they expire.
Stale-while-revalidate — serve the stale value, refresh in the background.
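A sketch of request coalescing (single-flight) with threads. The first caller for a key runs the loader; callers that arrive while it runs wait on an event and reuse its result or error. Go's `golang.org/x/sync/singleflight` package provides a `Group.Do` with this behavior; the Python class here is a hand-rolled illustration, and `slow_db_read` is a hypothetical stand-in for the expensive query.

```python
import threading
import time

class SingleFlight:
    """Collapse concurrent loads of the same key into one call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> dict holding the shared result

    def do(self, key, fn):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None, "error": None}
                self._inflight[key] = call
        if leader:
            try:
                call["value"] = fn()
            except Exception as exc:  # waiters see the same failure
                call["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]  # later callers start a new flight
                call["done"].set()
        else:
            call["done"].wait()
        if call["error"] is not None:
            raise call["error"]
        return call["value"]


db_calls = 0
count_lock = threading.Lock()

def slow_db_read():
    """Hypothetical expensive query; sleeps so the other requests pile up."""
    global db_calls
    with count_lock:
        db_calls += 1
    time.sleep(0.3)
    return "fresh value"

flight = SingleFlight()
start = threading.Barrier(50)
results = []

def handle_request():
    start.wait()  # release all 50 requests at once, as at expiry time
    results.append(flight.do("user:1", slow_db_read))

threads = [threading.Thread(target=handle_request) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_calls, len(results))
```

The database sees one query instead of 50. In a real handler the leader would also write the result to the cache, so requests arriving after the flight ends hit the cache instead of starting a new flight.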

Cache Penetration

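Every miss hits the DB, which is fine, except when a flood of requests asks for keys that don't exist (e.g. an attacker probing user IDs). Because absent keys are never written to the cache, every one of those requests misses. Negative caching (storing a short-lived "this key is known absent" marker) fixes repeated probes; a Bloom filter of existing keys can pre-check existence cheaply, since it never gives false negatives, only occasional false positives.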
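A sketch of negative caching on top of cache-aside. A private sentinel marks "known absent" so it cannot be confused with a cached value, and misses get a shorter TTL than hits so newly created rows appear quickly. `db_get` is a hypothetical lookup that returns `None` for absent rows, and the TTL values are illustrative.

```python
import time

_MISSING = object()  # sentinel meaning "known absent", distinct from any real value

class NegativeCachingLookup:
    """Cache-aside lookup that also caches misses, with a shorter TTL."""

    def __init__(self, db_get, ttl=300, negative_ttl=30, clock=time.monotonic):
        self._db_get = db_get
        self._ttl = ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._cache = {}  # key -> (value or _MISSING, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return None if entry[0] is _MISSING else entry[0]
        value = self._db_get(key)
        if value is None:
            self._cache[key] = (_MISSING, now + self._negative_ttl)
        else:
            self._cache[key] = (value, now + self._ttl)
        return value


db_lookups = []

def db_get(key):
    db_lookups.append(key)  # record DB round trips for illustration
    return {"user:1": "Ada"}.get(key)

users = NegativeCachingLookup(db_get, clock=lambda: 0.0)  # frozen clock for a repeatable demo
for _ in range(1000):
    users.get("user:999999")  # the same absent ID probed repeatedly
print(len(db_lookups))  # 1
```

Per-key negative caching only helps when the same absent keys repeat; a flood of distinct random IDs still misses every time, which is the case the Bloom filter pre-check covers.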

Consistency Issues

Caches make eventual consistency the default. Be explicit about what staleness is acceptable:

User profile picture: tens of seconds is fine.
Product price: seconds at most for retail; pricing must agree across pages.
Account balance: do not cache; or, if you must, cache only for display, with a version number so stale values can be detected, and always read the database before acting on a balance.

CDN Caching

CDNs (Cloudflare, CloudFront, Fastly, Akamai) push caches to hundreds of edge locations near users. They were originally for static assets but increasingly cache dynamic API responses too.

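Key headers: Cache-Control (freshness rules for browsers and shared caches), Surrogate-Control (CDN-only directives, honored by some CDNs), Vary (which request headers split the cache key), ETag (validator for conditional revalidation). Use the stale-while-revalidate and stale-if-error Cache-Control directives for resilience.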
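An illustrative set of response headers, built in Python for concreteness. The `stale-while-revalidate` and `stale-if-error` directives come from RFC 5861; CDN support for them varies, so check your provider's documentation. The durations and the `etag` value are arbitrary examples, not recommendations.

```python
def cacheable_response_headers(max_age: int, stale_while_revalidate: int,
                               stale_if_error: int, etag: str) -> dict:
    """Headers for a response that shared caches (CDN edges) may store."""
    return {
        # fresh for max_age s; afterwards an edge may serve the stale copy while it
        # refetches (up to stale_while_revalidate s) or while the origin is failing
        # (up to stale_if_error s)
        "Cache-Control": (
            f"public, max-age={max_age}, "
            f"stale-while-revalidate={stale_while_revalidate}, "
            f"stale-if-error={stale_if_error}"
        ),
        "Vary": "Accept-Encoding",  # cache compressed and uncompressed variants separately
        "ETag": f'"{etag}"',         # lets caches revalidate with If-None-Match
    }

headers = cacheable_response_headers(60, 30, 86400, etag="v42")
print(headers["Cache-Control"])
# public, max-age=60, stale-while-revalidate=30, stale-if-error=86400
```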

The cheapest and fastest request is one that the user's nearest CDN edge handles in 5 ms without your origin ever seeing it.

What Not to Cache

Anything where staleness causes incorrect business decisions (balances, inventory at checkout).
Highly personalised data that has no shared keys (cache hit rate would be ~0).
Cheap-to-compute, high-write data — caching may add complexity for little gain.

Cert Mapping

Cert / platform	Caching services
AWS SAA-C03	CloudFront, ElastiCache (Redis / Memcached), DAX for DynamoDB
Azure / GCP equivalents	Azure Cache for Redis, Front Door; Cloud CDN, Memorystore

Default Mindset

Treat the cache as a hint, not the source of truth. Design so the system still works (slower) when the cache is cold or down. Cache code paths that are slow and repetitive; ignore everything else. And remember: every cache you add is one more thing that can serve wrong answers.