System design is the bridge between knowing how to write code and knowing how to build systems that hundreds of people, or millions of users, will depend on. It is the question every senior engineer eventually has to answer: given these constraints, how should the pieces fit together?
Functional vs Non-Functional Requirements
| Type | Asks | Examples |
|---|---|---|
| Functional | What must the system do? | Users can post messages; admins can delete them; prices update hourly |
| Non-functional | How well must it do it? | p99 latency < 200 ms; 99.95% uptime; survive a region outage |
Junior engineers focus on the functional. System design lives in the non-functional. The interesting decisions — sharding, caching, replication, queues — exist because of latency, throughput, availability, and cost requirements, not because of features.
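Non-functional requirements only help if they can be checked. Below is a minimal sketch of how the "p99 latency < 200 ms" example from the table could be tested. It assumes latency samples in milliseconds collected from logs or a load test. It uses the nearest-rank percentile, which is one of several percentile definitions; monitoring tools may interpolate differently. The function names `percentile` and `meets_latency_slo` are hypothetical.

```python
import math
from statistics import mean


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def meets_latency_slo(samples_ms: list[float], p: float = 99.0, limit_ms: float = 200.0) -> bool:
    """True if the p-th percentile latency is strictly below the limit (e.g. p99 < 200 ms)."""
    return percentile(samples_ms, p) < limit_ms


# Illustrative (made-up) data: 98 fast requests and 2 slow ones.
samples = [50.0] * 98 + [500.0] * 2
print(f"mean = {mean(samples):.0f} ms, p99 = {percentile(samples, 99):.0f} ms")
print("meets p99 < 200 ms:", meets_latency_slo(samples))
```

With this data the mean is 59 ms, well under 200 ms, but p99 is 500 ms, so the requirement fails. That is why latency targets are usually stated as percentiles rather than averages.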
The Five Constraints
Every design balances five forces. Improving one usually costs another:
- Latency — time per request.
- Throughput — requests per second the system can sustain.
- Availability — fraction of time the system is up and serving.
- Consistency — guarantees about what data readers see.
- Cost — money and operational complexity.
You cannot simultaneously maximise all five. A bank values consistency over latency; a social feed values availability and latency over strict consistency; a CDN trades freshness (consistency) for global low latency.
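Because availability is a fraction, it becomes concrete when you turn it into a downtime budget. The sketch below does that and also shows one common cost of adding components. If a request has to pass through several components in series, their availabilities multiply. This assumes independent failures, which real systems often violate. The helper names are hypothetical, and the month length is an assumed 30 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 (ignores leap years)
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 (assumes a 30-day month)


def downtime_budget_minutes(availability: float, period_minutes: float) -> float:
    """Minutes of allowed downtime in a period for a target availability in (0, 1]."""
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    return (1 - availability) * period_minutes


def serial_availability(*components: float) -> float:
    """Availability of a request path that needs every component up.
    Assumes failures are independent, which is an optimistic simplification."""
    result = 1.0
    for a in components:
        result *= a
    return result


target = 0.9995  # the "99.95% uptime" example
print(f"{downtime_budget_minutes(target, MINUTES_PER_YEAR):.1f} min/year")
print(f"{downtime_budget_minutes(target, MINUTES_PER_MONTH):.1f} min/month")
print(f"two 99.95% components in series: {serial_availability(target, target):.4%}")
```

A 99.95% target allows roughly 263 minutes (about 4.4 hours) of downtime per year, or about 21.6 minutes per 30-day month. Two such components in series drop to about 99.90%, which is one reason redundancy, and the cost that comes with it, enters the design.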
The Estimation Habit
Designs without numbers are vibes. Build the habit of back-of-envelope estimation:
- 1 day = 86,400 seconds. 100k requests/day ≈ 1.2 req/sec average.
- 1 million requests/day ≈ 12 req/sec average; peak is likely 3–10× higher.
- A modern SSD does ~100k IOPS at sub-millisecond (~100 µs) latency.
- Cross-AZ network is ~1 ms; cross-region is ~50–150 ms.
- An L1 cache hit is ~1 ns; main memory ~100 ns; SSD read ~100 µs; spinning-disk seek ~10 ms; cross-region round trip ~100 ms.
- 1 GB ≈ 10⁹ bytes. 1 KB record × 1 billion records ≈ 1 TB.
If your "design" doesn't tell the reviewer how much data, how many requests, and how fast — you haven't designed it yet.
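The rules of thumb above are plain arithmetic, so it can help to keep them as a few small helpers. This is a minimal sketch: the function names and the default peak factor of 3 (the low end of the 3–10× range) are assumptions, and the latency table simply encodes the order-of-magnitude figures listed above.

```python
SECONDS_PER_DAY = 86_400

# Order-of-magnitude latencies from the list above, in milliseconds.
LATENCY_MS = {
    "l1_cache": 1e-6,       # ~1 ns
    "main_memory": 1e-4,    # ~100 ns
    "ssd_read": 0.1,        # ~100 µs
    "disk_seek": 10.0,      # ~10 ms (spinning disk)
    "cross_az": 1.0,        # ~1 ms
    "cross_region": 100.0,  # ~100 ms
}


def avg_rps(requests_per_day: float) -> float:
    """Average requests per second for a daily request volume."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rps(requests_per_day: float, peak_factor: float = 3.0) -> float:
    """Peak estimate: average rate times an assumed peak factor (3-10x is typical)."""
    return avg_rps(requests_per_day) * peak_factor


def storage_bytes(record_bytes: float, records: float) -> float:
    """Raw storage, before replication, indexes, or compression."""
    return record_bytes * records


def sequential_latency_ms(steps: list[str]) -> float:
    """Latency of steps performed one after another (no parallelism, no queuing)."""
    return sum(LATENCY_MS[step] for step in steps)


print(f"1M req/day: {avg_rps(1_000_000):.1f} avg, {peak_rps(1_000_000, 10):.0f} peak req/s")
print(f"1 KB x 1 billion records: {storage_bytes(1_000, 1_000_000_000) / 1e12:.0f} TB")
print(f"two cross-region calls: {sequential_latency_ms(['cross_region', 'cross_region']):.0f} ms")
```

The last line makes a latency point visible. Two sequential cross-region calls cost about 200 ms of network time alone, which already uses up a "p99 latency < 200 ms" budget like the one in the requirements table, before any disk or compute time is counted.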
Example: Sizing a URL Shortener
Assumptions:
- 100M new URLs / month → ~40 writes/sec average
- Read:write = 100:1 → ~4,000 reads/sec average
- Peak = 5× average → ~20,000 reads/sec peak
- Each record ≈ 500 bytes
- Retention 5 years → 100M × 12 × 5 = 6 billion records
- Storage: 6 billion × 500 B ≈ 3 TB
Now the conversation has shape: 20k reads/sec is well within what a single Redis cluster can serve, and 3 TB fits comfortably in a sharded SQL database or a single DynamoDB table. The design follows from the numbers.
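The sizing above can be kept as a tiny script, so that changing one assumption updates every derived number. This is a sketch: the field names are hypothetical, and a 30-day month is assumed. The bandwidth line is an extra derived figure that is not in the list. It assumes each read returns one full 500-byte record.

```python
from dataclasses import dataclass

SECONDS_PER_MONTH = 30 * 86_400  # assumes a 30-day month


@dataclass(frozen=True)
class ShortenerAssumptions:
    new_urls_per_month: float = 100_000_000
    read_write_ratio: float = 100
    peak_factor: float = 5
    record_bytes: float = 500
    retention_years: float = 5


def size(a: ShortenerAssumptions) -> dict[str, float]:
    """Derive throughput and storage figures from the stated assumptions."""
    writes_per_sec = a.new_urls_per_month / SECONDS_PER_MONTH
    reads_per_sec = writes_per_sec * a.read_write_ratio
    peak_reads_per_sec = reads_per_sec * a.peak_factor
    records = a.new_urls_per_month * 12 * a.retention_years
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "peak_reads_per_sec": peak_reads_per_sec,
        "records": records,
        "storage_tb": records * a.record_bytes / 1e12,
        # Extra derived figure: peak egress if every read returns one full record.
        "peak_read_mb_per_sec": peak_reads_per_sec * a.record_bytes / 1e6,
    }


for name, value in size(ShortenerAssumptions()).items():
    print(f"{name}: {value:,.1f}")
```

With a 30-day month, the script gives about 39 writes/sec, about 3,860 reads/sec, and about 19,300 peak reads/sec. The list above rounds these up. It also gives 6 billion records, 3 TB of storage, and roughly 10 MB/s of peak read bandwidth. Re-running with, say, `ShortenerAssumptions(read_write_ratio=1_000)` shows which parts of the design would be stressed first.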
The Common Building Blocks
Most production systems are assembled from a small palette of components, which the rest of this course covers:
- Load balancers — spread traffic, fail over.
- Web / app servers — stateless, horizontally scalable.
- Caches — Redis, Memcached, CDN edges.
- Databases — SQL (Postgres, MySQL), NoSQL (Dynamo, Cassandra, Mongo, Bigtable).
- Object storage — S3, GCS, Azure Blob.
- Message queues / streams — SQS, Kafka, Pub/Sub, RabbitMQ.
- Search — Elasticsearch, OpenSearch, Algolia.
- Background workers — async processors consuming queues.
- CDN — caches static (and increasingly dynamic) content close to users.
Senior engineering is not memorising every product — it is knowing which class of component each problem calls for, and what the trade-offs are between options inside that class.
The Mental Model: Boxes, Arrows, Numbers
A good system design is three things:
- Boxes — components with clear responsibilities.
- Arrows — protocols, payloads, request/response or async.
- Numbers — QPS, latency budgets, storage, cost.
If you can sketch the boxes and arrows, then put plausible numbers on each arrow, you have done system design. Anything else is decoration.
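One way to make "boxes, arrows, numbers" concrete is to represent a design as data and check it mechanically. The model below is hypothetical and illustrative only. It has boxes with responsibilities, and arrows with a protocol, a sync/async flag, a peak QPS and a latency budget. It flags arrows that still lack numbers, and it sums the latency budgets along a synchronous request path. The component names and numbers in the example are made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Arrow:
    src: str
    dst: str
    protocol: str                  # e.g. "HTTPS", "SQL", "AMQP"
    sync: bool                     # True = request/response, False = async
    peak_qps: float = 0.0          # 0 means "no number yet"
    latency_budget_ms: float = 0.0


@dataclass
class Design:
    boxes: dict[str, str] = field(default_factory=dict)  # name -> responsibility
    arrows: list[Arrow] = field(default_factory=list)

    def box(self, name: str, responsibility: str) -> None:
        self.boxes[name] = responsibility

    def arrow(self, a: Arrow) -> None:
        # An arrow must connect two boxes that already exist.
        if a.src not in self.boxes or a.dst not in self.boxes:
            raise ValueError(f"unknown box in {a.src} -> {a.dst}")
        self.arrows.append(a)

    def missing_numbers(self) -> list[str]:
        # Arrows without a QPS figure or latency budget are not designed yet.
        return [f"{a.src}->{a.dst}" for a in self.arrows
                if a.peak_qps <= 0 or a.latency_budget_ms <= 0]

    def path_latency_ms(self, path: list[str]) -> float:
        # Sum the latency budgets of synchronous hops along a request path.
        total = 0.0
        for src, dst in zip(path, path[1:]):
            hop = next((a for a in self.arrows if a.src == src and a.dst == dst), None)
            if hop is None:
                raise ValueError(f"no arrow {src} -> {dst}")
            if not hop.sync:
                raise ValueError(f"{src} -> {dst} is async, so it is off the request path")
            total += hop.latency_budget_ms
        return total


# Illustrative design with made-up numbers.
d = Design()
d.box("lb", "spread traffic, fail over")
d.box("app", "stateless request handling")
d.box("db", "durable storage")
d.box("queue", "buffer background work")
d.arrow(Arrow("lb", "app", "HTTPS", True, 20_000, 5))
d.arrow(Arrow("app", "db", "SQL", True, 4_000, 20))
d.arrow(Arrow("app", "queue", "AMQP", False))
print("arrows missing numbers:", d.missing_numbers())
print("lb -> app -> db budget:", d.path_latency_ms(["lb", "app", "db"]), "ms")
```

Here the async `app -> queue` arrow is flagged because it has no numbers yet, and the synchronous path `lb -> app -> db` has a 25 ms budget. A real review would also account for retries, fan-out and queuing, which this sketch ignores.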
Where the Certs Fit
| Cert | System design overlap |
|---|---|
| AWS Solutions Architect Associate (SAA-C03) | Heavy — picking services for workloads, HA, DR, cost |
| AWS Solutions Architect Professional | Multi-account, multi-region designs at depth |
| Google Professional Cloud Architect | Case-study based; full system design under constraints |
| Azure Solutions Architect Expert | Architectural decisions across Azure services |
The vendor certs are system design with vendor-specific vocabulary. Master the concepts here and the cert is mostly mapping concept → product name.
System Design in Interviews
Senior interviews almost always include a system design round. The interviewer cares less about whether you pick "the right" answer (often there isn't one) and more about:
- Do you clarify requirements before designing?
- Do you do the math — QPS, storage, bandwidth?
- Do you reason about trade-offs out loud?
- Do you know when to introduce caching, queues, or sharding — and why?
- Can you spot bottlenecks and discuss mitigations?
The final lesson of this course walks through three classic prompts (URL shortener, news feed, chat) using the framework we'll build along the way.
How to Use This Course
Read it linearly the first time. Each lesson is a building block; later ones assume the earlier vocabulary. After finishing, return to specific lessons as references when you face a real design or interview question. The goal is not memorisation — it is building a mental toolbox you can deploy under pressure.