Designing Real Systems: URL Shortener, Feed, Chat

This lesson walks through three classic prompts using a consistent framework. The point is not the specific answers but the shape of how a senior engineer reasons about them.

The Framework

Clarify — functional + non-functional requirements; explicit out-of-scope.
Estimate — QPS, storage, bandwidth.
API — sketch the public interface.
High-level design — boxes and arrows.
Data model — what stores, what schemas, what keys.
Scaling and bottlenecks — where it breaks; how to fix.
Trade-offs — what you chose and what you gave up.

Design 1: URL Shortener (TinyURL / bit.ly)

Clarify

POST a long URL → get a short code (e.g. tny.url/abc123).
GET short URL → 301/302 redirect to long URL.
Optionally: custom aliases, expiry, basic analytics.
Out of scope: user accounts, complex analytics dashboard.

Estimate

100M new URLs / month → ~40 writes/sec average
Read:write = 100:1     → 4,000 reads/sec average, ~20,000 peak
Each record ~500 B     → 6B records over 5y = 3 TB
Bandwidth: 20k req/s × 500 B headers = 10 MB/s — trivial
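
The arithmetic above can be reproduced in a few lines. This is a rough sketch: the function name and the 30-day month are assumptions, and the 5× peak factor is simply the ratio implied by the 4,000 → 20,000 figures.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def url_shortener_estimate(new_urls_per_month=100_000_000,
                           read_write_ratio=100,
                           record_bytes=500,
                           years=5,
                           peak_factor=5):
    """Back-of-envelope numbers for the URL shortener (illustrative only)."""
    writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH       # ~39
    reads_per_sec = writes_per_sec * read_write_ratio             # ~3,900
    peak_reads_per_sec = reads_per_sec * peak_factor              # ~19,000
    total_records = new_urls_per_month * 12 * years               # 6 billion
    storage_tb = total_records * record_bytes / 1e12              # 3 TB
    peak_bandwidth_mb = peak_reads_per_sec * record_bytes / 1e6   # ~10 MB/s
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "peak_reads_per_sec": peak_reads_per_sec,
        "total_records": total_records,
        "storage_tb": storage_tb,
        "peak_bandwidth_mb_per_sec": peak_bandwidth_mb,
    }
```

Rounding to one significant figure, as the estimate does, is usually enough: the goal is to tell "one machine" from "a fleet", not to size hardware exactly.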

API

POST /shorten   { "url": "https://...", "alias": "optional" } → { "short": "abc123" }
GET  /:code     → 301 Location: <long URL>
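
A minimal in-memory sketch of the two endpoints, written as plain functions rather than a real web framework. The function names, the dictionary standing in for the KV store, and the 400/404/409 error codes are assumptions for illustration; the source only specifies the happy path.

```python
import secrets
import string
from urllib.parse import urlparse

ALPHABET = string.digits + string.ascii_letters  # 62 symbols
_store = {}  # short code -> long URL; stands in for the KV store


def _new_code(length=7):
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def shorten(url, alias=None):
    """Handle POST /shorten. Returns (status, body)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return 400, {"error": "invalid url"}
    if alias is not None:
        if alias in _store:
            return 409, {"error": "alias already taken"}
        code = alias
    else:
        code = _new_code()
        while code in _store:  # retry on the rare random collision
            code = _new_code()
    _store[code] = url
    return 201, {"short": code}


def redirect(code, status=301):
    """Handle GET /:code. Returns (status, headers)."""
    url = _store.get(code)
    if url is None:
        return 404, {}
    return status, {"Location": url}
```

Passing `status=302` to `redirect` switches to the non-cached redirect discussed under Trade-offs.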

High-level design

[ Users ] → [ CDN ] → [ Global LB ] → [ App tier (stateless) ]
                                              │
                                ┌─────────────┼─────────────┐
                                ▼             ▼             ▼
                          [ Cache (Redis) ] [ KV store ]  [ ID generator ]
                                              ▲
                                              │
                                       [ Analytics queue → warehouse ]

Data model

Primary table: short_code (PK) → long_url, created_at, expires_at.
Pick: DynamoDB or Cassandra at scale; Postgres is fine for under ~1B rows.
Cache hot codes in Redis; CDN caches the redirect response itself for popular links.
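
The read path implied here is the cache-aside pattern: check the cache, fall back to the store, then populate the cache. The sketch below uses a small in-process TTL cache as a stand-in for Redis and a plain dict for the KV store; `TTLCache` and `resolve` are hypothetical names, not part of any library.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for Redis with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)


def resolve(code, cache, kv_store):
    """Cache-aside lookup. Returns (long_url, where_it_came_from)."""
    url = cache.get(code)
    if url is not None:
        return url, "cache"
    url = kv_store.get(code)
    if url is None:
        return None, "miss"
    cache.set(code, url)  # populate so the next read is a hit
    return url, "store"
```

A TTL keeps expired or deleted links from being served indefinitely; the right value depends on how quickly changes to a link must become visible.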

ID generation

Avoid auto-increment IDs across shards (coordination).
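Generate a random number below 62⁷ (about 42 bits) and base62-encode it to a 7-character code (62⁷ ≈ 3.5T values); check for collisions on insert and retry. A full 64-bit ID would need 11 base62 characters.
Or pre-allocate ID ranges per app instance from a counter service (a ticket-server pattern), so instances hand out IDs without coordinating on every write. Snowflake-style IDs (timestamp + worker ID + sequence) are another coordination-free option, but they are 64 bits and produce longer codes.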
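
Both options reduce to "pick an integer, then base62-encode it". The sketch below shows one possible encoding plus the two ways of picking the integer; the alphabet order and function names are assumptions, and collision checking against the store is left to the caller.

```python
import secrets

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
CODE_LENGTH = 7
KEYSPACE = 62 ** CODE_LENGTH  # 3,521,614,606,208 possible codes


def base62_encode(n, length=CODE_LENGTH):
    """Encode n as a fixed-width base62 string, left-padded with '0'."""
    if not 0 <= n < 62 ** length:
        raise ValueError("value does not fit in the requested code length")
    chars = []
    for _ in range(length):
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def random_code():
    """Random option: unguessable, but the caller must handle collisions."""
    return base62_encode(secrets.randbelow(KEYSPACE))


def codes_from_range(start, size):
    """Range option: an instance that was handed [start, start + size)
    by a counter service can mint codes with no further coordination."""
    for n in range(start, start + size):
        yield base62_encode(n)
```

Codes minted from a range are sequential and therefore enumerable; if that matters, the integer can be passed through a keyed permutation before encoding.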

Scaling and bottlenecks

Reads dominate: the CDN absorbs the most popular links, Redis absorbs the rest of the hot set, and only the long tail of cold codes reaches the KV store.
Writes are low; a single sharded store handles them.
Analytics pipeline is async (queue → warehouse) so it never slows redirects.

Trade-offs

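Random IDs vs sequential: random codes cannot be enumerated, which protects private links; sequential IDs give better index and cache locality but let anyone walk the keyspace. Random wins.
301 vs 302 redirects: a 301 is cached aggressively by browsers and CDNs, which is great for performance but means repeat clicks may never reach your servers, so they go uncounted and the target is hard to change. A 302 lets you count every click. Pick based on whether per-click counting matters.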

Design 2: News Feed (Twitter / Instagram-style)

Clarify

Users post short messages; followers see them in their feed, newest first.
Feed must load fast (sub-second).
Out of scope: search, ads, video uploads.

Estimate

300M DAU
Posts/day  = 300M × 0.5 post/user/day = 150M posts/day → ~1,700 writes/sec
Feed loads = 300M × 5 loads/day        = 1.5B loads/day → ~17,000 reads/sec
Median followers per user ≈ 200, but with a heavy tail: celebrities have 100M
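
These figures can be checked, and extended to the write amplification that drives the fan-out discussion, with a short calculation. This is a sketch under stated assumptions: using the 200-follower figure as a typical fan-out understates the true mean because of the heavy tail, and the worker throughput is a made-up number for illustration.

```python
SECONDS_PER_DAY = 86_400


def feed_estimate(dau=300_000_000,
                  posts_per_user_per_day=0.5,
                  loads_per_user_per_day=5,
                  typical_followers=200,
                  celebrity_followers=100_000_000,
                  assumed_fanout_writes_per_sec=100_000):  # hypothetical
    posts_per_sec = dau * posts_per_user_per_day / SECONDS_PER_DAY       # ~1,700
    feed_loads_per_sec = dau * loads_per_user_per_day / SECONDS_PER_DAY  # ~17,000
    # If every post is pushed to every follower's feed:
    push_writes_per_sec = posts_per_sec * typical_followers              # ~350,000
    # Time to push a single celebrity post at the assumed worker throughput:
    celebrity_fanout_seconds = celebrity_followers / assumed_fanout_writes_per_sec
    return {
        "posts_per_sec": posts_per_sec,
        "feed_loads_per_sec": feed_loads_per_sec,
        "push_writes_per_sec": push_writes_per_sec,
        "celebrity_fanout_seconds": celebrity_fanout_seconds,
    }
```

Under these assumptions, pushing one celebrity post takes on the order of a thousand seconds of worker capacity, which is the pathological case the hybrid design avoids.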

API

POST /post   { "text": "..." }
GET  /feed?cursor=...   → list of posts
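
One way the `cursor` parameter could work is to encode the ID of the last post the client has seen, assuming post IDs are time-ordered so that "older" means "smaller ID". The encoding, function names, and response shape below are illustrative assumptions, not a specification.

```python
import base64


def encode_cursor(post_id):
    """Opaque cursor: clients should treat it as a token, not parse it."""
    return base64.urlsafe_b64encode(str(post_id).encode("ascii")).decode("ascii")


def decode_cursor(cursor):
    return int(base64.urlsafe_b64decode(cursor.encode("ascii")).decode("ascii"))


def get_feed(feed_post_ids, cursor=None, limit=20):
    """feed_post_ids: the user's feed, newest first (descending IDs)."""
    if cursor is None:
        remaining = list(feed_post_ids)
    else:
        last_seen = decode_cursor(cursor)
        remaining = [pid for pid in feed_post_ids if pid < last_seen]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    return {
        "posts": page,
        "next_cursor": encode_cursor(page[-1]) if has_more else None,
    }
```

Unlike offset-based paging, a cursor stays stable when new posts arrive at the top of the feed between requests.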

The fan-out question

The central decision: when Alice posts, how do her followers see it?

Option A — Push (fan-out on write)

On post, write into every follower's feed table.

+ Reads are O(1): just read your feed table.
− Writes are O(followers). Celebrity with 100M followers = 100M writes per post.

Option B — Pull (fan-out on read)

Each user pulls latest posts from people they follow at read time.

+ Writes are O(1).
− Reads are expensive — fetch from N timelines and merge. Painful at scale.

Option C — Hybrid (winner)

Push to followers for normal users (median follower count).
Celebrities (above some threshold) — pull. When a follower loads their feed, merge in celebrity timelines on the fly.
Most apps converge on this.
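
The hybrid can be sketched with in-memory structures standing in for the follow graph, the posts store, and the per-user feed cache. All names and the 10,000-follower threshold are assumptions for illustration; real systems tune the threshold empirically.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 10_000  # hypothetical cut-off
FEED_CAP = 1000               # keep only the most recent entries per feed

followers = defaultdict(set)         # author -> follower ids
following = defaultdict(set)         # user -> author ids
posts_by_author = defaultdict(list)  # author -> [(ts, post_id)], oldest first
feed_cache = defaultdict(lambda: deque(maxlen=FEED_CAP))  # user -> newest first


def follow(follower, author):
    followers[author].add(follower)
    following[follower].add(author)


def is_celebrity(author, threshold):
    return len(followers[author]) >= threshold


def publish(author, post_id, ts, threshold=CELEBRITY_THRESHOLD):
    """Store the post; push to follower feeds unless the author is a celebrity.
    Returns the number of feed writes performed."""
    posts_by_author[author].append((ts, post_id))
    if is_celebrity(author, threshold):
        return 0  # followers will pull this at read time
    for follower in followers[author]:
        feed_cache[follower].appendleft((ts, post_id))
    return len(followers[author])


def load_feed(user, limit=20, threshold=CELEBRITY_THRESHOLD):
    """Read the pushed feed, then merge in recent celebrity posts."""
    items = list(feed_cache[user])
    for author in following[user]:
        if is_celebrity(author, threshold):
            items.extend(posts_by_author[author][-limit:])
    items.sort(reverse=True)  # newest first by timestamp
    return [post_id for _, post_id in items[:limit]]
```

This sketch ignores authors who cross the threshold after some of their posts were already pushed; a real system would need to deduplicate in that case.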

High-level design

[ User ] → [ LB ] → [ Post service ] → [ Posts DB ]
                                       └──→ [ Fan-out worker queue ]
                                                  │
                                                  ▼
                                         [ Feed cache per user (Redis) ]

[ User ] → [ LB ] → [ Feed service ]
                       │
                       ├── read user feed cache (fan-out-on-write timelines)
                       └── merge celebrity timelines on read (fan-out-on-read)

Data model

posts table: post_id (PK), author_id, text, created_at. Sharded by author_id.
Per-user feed cache: list of recent post_ids in Redis (capped at ~1000).
follows: follower_id, followee_id; sharded.

Scaling and bottlenecks

Fan-out queue absorbs celebrity post bursts; workers spread load.
Feed cache absorbs nearly all read traffic.
Posts DB is partitioned by author_id; reads of "this author's posts" hit one shard.

Trade-offs

Push gives instant reads; pull makes celebrities cheaper. Hybrid pays both costs in moderation.
Eventual consistency: a follower may see a post a few seconds late. Acceptable.
Reordering at boundaries when merging timelines — sort by timestamp post-merge.
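
When each input timeline is already sorted newest first, a k-way merge produces the combined order without re-sorting everything. A minimal sketch using the standard library's `heapq.merge`; the `(created_at, post_id)` tuple shape and the function name are assumptions.

```python
import heapq
from itertools import islice


def merge_timelines(timelines, limit=20):
    """Merge timelines that are each sorted newest first.

    Each timeline is a list of (created_at, post_id) tuples in descending
    order. The merge is lazy, so only `limit` items are ever materialised.
    """
    merged = heapq.merge(*timelines, reverse=True)
    return list(islice(merged, limit))
```

Ties on `created_at` fall back to comparing `post_id`, which keeps the order deterministic across requests.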

Design 3: Chat (WhatsApp / Slack-style)

Clarify

1:1 and group messages, delivered in order, with read receipts.
Mobile-first; users go offline, come back, expect missed messages.
Sub-second delivery for online users.
Out of scope: voice/video calls, file uploads beyond simple attachments.

Estimate

1B users, 100M concurrent online
50 messages/user/day → 50B messages/day → ~600k writes/sec average
Each message ~500 B → 25 TB/day raw
Need long-term storage: 90 days hot (~2.25 PB at this rate), then cold storage retained indefinitely
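
The same numbers as a calculation, extended to the size of the hot tier and a rough connection-server count. The 100,000 sockets per server is an assumed figure for illustration only; real capacity depends on hardware, kernel tuning, and message rate.

```python
SECONDS_PER_DAY = 86_400


def chat_estimate(users=1_000_000_000,
                  concurrent_online=100_000_000,
                  messages_per_user_per_day=50,
                  message_bytes=500,
                  hot_days=90,
                  assumed_sockets_per_server=100_000):  # hypothetical
    messages_per_day = users * messages_per_user_per_day  # 50 billion
    writes_per_sec = messages_per_day / SECONDS_PER_DAY   # ~580,000
    tb_per_day = messages_per_day * message_bytes / 1e12  # 25 TB
    hot_storage_pb = tb_per_day * hot_days / 1000         # 2.25 PB
    connection_servers = concurrent_online // assumed_sockets_per_server
    return {
        "messages_per_day": messages_per_day,
        "writes_per_sec": writes_per_sec,
        "tb_per_day": tb_per_day,
        "hot_storage_pb": hot_storage_pb,
        "connection_servers": connection_servers,
    }
```

These are raw, unreplicated figures; a replication factor of three would triple the storage numbers.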

API / Protocol

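Persistent WebSocket per online client (or a comparable long-lived connection over HTTP/2 streams or QUIC). Note that HTTP/2 "server push" is not a fit: it pre-sends cacheable resources and has no API for delivering application messages.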
Send: POST /messages or send-frame on socket → message_id, ack.
Receive: server pushes via the socket.
Read receipts: separate event type on the socket.

High-level design

[ Mobile clients ]
       │  WebSocket
       ▼
[ Edge / connection servers ]   ← stateful: hold N online sockets each, sticky
       │
       ▼
[ Message bus (Kafka) ]   ← every message published
       │
       ├── [ Storage workers ] → [ Messages store (wide-column / sharded SQL) ]
       ├── [ Delivery workers ] → push to recipient's connection server (if online)
       └── [ Push notification svc ] → APNs / FCM (if offline)

Data model

Messages stored per conversation, sorted by message_id (snowflake-style time-ordered).
Cassandra / Bigtable / DynamoDB ideal — write-heavy, partition by conversation_id, clustering by message_id.
Per-user "inbox" pointer: last delivered message_id per device.
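
A snowflake-style ID packs a millisecond timestamp, a worker ID, and a per-millisecond sequence into one integer, so IDs sort roughly by creation time without a central counter. The sketch below uses a 10-bit worker and 12-bit sequence layout and an arbitrary custom epoch; those choices and the class name are assumptions, not a description of any particular product.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id, clock_ms=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock_ms = clock_ms
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                now = self._last_ms  # never go backwards if the clock does
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    now += 1  # sequence exhausted: borrow the next millisecond
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self._worker_id << SEQUENCE_BITS)
                    | self._sequence)
```

IDs from a single generator are strictly increasing; IDs from different workers are ordered only to within clock skew, so strict per-conversation ordering still needs one writer (or one sequencer) per conversation.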

Online vs offline delivery

Sender publishes message to bus.
Storage worker durably writes to messages store.
Delivery worker checks recipient's online state:
- Online → push to their connection server's WebSocket.
- Offline → enqueue push notification + mark for replay on reconnect.
On reconnect, client sends "give me everything since message_id X" — server streams missed messages.
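
The flow above can be sketched with in-memory stand-ins: a dict for the messages store, a list per user for the open socket, and a list for the push-notification queue. The class and method names are hypothetical, and the bus is collapsed into a direct method call for brevity.

```python
from collections import defaultdict


class ChatServer:
    def __init__(self):
        self.members = {}                   # conversation_id -> set of user ids
        self.messages = defaultdict(list)   # conversation_id -> [(id, sender, body)]
        self.sockets = {}                   # user_id -> list acting as a socket
        self.push_queue = []                # (user_id, message_id) for APNs / FCM

    def connect(self, user_id):
        self.sockets[user_id] = []
        return self.sockets[user_id]

    def disconnect(self, user_id):
        self.sockets.pop(user_id, None)

    def send(self, conversation_id, message_id, sender, body):
        message = (message_id, sender, body)
        # 1. Durable write first, so a crash cannot lose a delivered message.
        self.messages[conversation_id].append(message)
        # 2. Deliver to each recipient according to online state.
        for user in self.members[conversation_id]:
            if user == sender:
                continue
            socket = self.sockets.get(user)
            if socket is not None:
                socket.append(message)                      # online: push now
            else:
                self.push_queue.append((user, message_id))  # offline: notify
        return message_id

    def replay_since(self, conversation_id, last_message_id):
        """On reconnect: everything newer than the client's last seen ID."""
        return [m for m in self.messages[conversation_id]
                if m[0] > last_message_id]
```

Because replay is driven by the client's last seen `message_id`, the server does not need a per-recipient queue of undelivered messages; the messages store is the queue.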

Group chat

One write to the conversation; fan out to N members.
Mostly small groups (under 256) → push to all; large channels (Slack public, ~thousands) → pull on read or hybrid.

Trade-offs

Sticky WebSocket connections: easier delivery but require careful failover.
Strong ordering per conversation; relaxed across conversations.
Read receipts as separate events — keeps the message hot path lean.
End-to-end encryption (Signal protocol) changes the design — server can't read content but still routes by message_id and conversation_id.

The Pattern Across All Three

Notice the recurring moves:

Stateless app tier, stateful data + message bus (chat's connection servers are the one deliberately stateful exception).
Cache aggressively for read-heavy paths.
Async queues for anything that doesn't have to be in the request path.
Partition data by the natural access key (short_code, author_id, conversation_id).
Hybrid push/pull when one extreme has pathological cases.
CDN at the edge if any read can be cached publicly.
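
Partitioning by the natural access key usually means hashing that key to pick a shard. A minimal sketch, assuming simple modulo placement; the function name is hypothetical, and a stable hash is used because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a partition key (short_code, author_id, conversation_id)
    to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Modulo placement remaps most keys when `num_shards` changes; consistent hashing or a fixed number of virtual partitions is the usual way to limit that movement.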

How to Practise

Pick a real product you use. Imagine you're building it from scratch.
Apply the framework above. Write the QPS math.
Force yourself to name specific trade-offs, not just "use Cassandra".
Compare to public engineering blogs (Twitter, Discord, Slack, WhatsApp, Uber, Pinterest, Airbnb publish frequently).

System design is a craft. The components in this course are the vocabulary; the practice is in assembling them under different constraints, again and again, until the right shape becomes intuition.

Closing

You now have a vocabulary spanning load balancers, caches, SQL and NoSQL, replication and sharding, consistency models, queues and events, CDNs and resilience patterns. Pair it with our DevOps, Cloud, and Data Engineering courses for the operational depth, and with the cert tracks for vendor-specific terminology. The fundamentals here will outlast any specific product.
