SQL vs NoSQL — System Design Fundamentals | CertQnA

The "SQL vs NoSQL" debate generated more heat than light. Today's reality: relational databases handle far bigger workloads than people remember, and NoSQL has matured into specific tools for specific shapes of problem. Pick by data shape and access pattern, not by fashion.

What SQL Gives You

A relational model — tables, rows, columns, foreign keys.
Joins — composing data across tables in queries.
ACID transactions — atomicity, consistency, isolation, durability.
Strong schema — columns and types declared and enforced.
Mature tooling, decades of operational knowledge, every ORM ever.

Tools: PostgreSQL, MySQL, SQL Server, Oracle, plus cloud-native: Aurora (AWS), Cloud SQL / Spanner (GCP), Azure SQL DB, CockroachDB, YugabyteDB.

What NoSQL Gives You (Family by Family)

Key-value

Hash table at huge scale: get(key), set(key, value).
Sub-ms latency, horizontal scale via consistent hashing.
No queries beyond key lookup.
Examples: Redis, DynamoDB (in basic mode), Memcached, etcd.
Use for: caches, session stores, leaderboards, simple lookups.

Document

JSON-shaped documents in collections.
Flexible schema; nested data first-class.
Secondary indexes; rich queries by field; weak/no joins.
Examples: MongoDB, DynamoDB, Cosmos DB, Firestore, Couchbase.
Use for: catalogues with variable shape, user profiles, content stores.

Wide-column

Tables with billions of rows and millions of (sparse) columns.
Rows partitioned by key; queries use the partition key + clustering key range.
Designed for write-heavy, very large scale.
Examples: Cassandra, ScyllaDB, HBase, Bigtable, DynamoDB (with sort keys).
Use for: time-series, IoT, event logs, large-scale audit trails.

Graph

Nodes and edges with properties.
Optimised for traversals — friend-of-friend, path-finding, recommendations.
Examples: Neo4j, Amazon Neptune, JanusGraph, ArangoDB.
Use for: social graphs, fraud rings, knowledge graphs, recommendations.

Search

Inverted-index stores for full-text and faceted search.
Examples: Elasticsearch, OpenSearch, Algolia, Meilisearch.
Use for: search bars, log analytics, observability.

Time-series

Append-mostly numeric data over time, with downsampling and retention.
Examples: InfluxDB, TimescaleDB (Postgres extension), Prometheus, ClickHouse for analytics.
Use for: metrics, IoT telemetry, financial tick data.

Picking Between Them

If your data is...	Default to
Relational with joins, complex queries, transactions	SQL (Postgres)
Single-key lookups at huge scale	Key-value (Redis, DynamoDB)
Variable-shape JSON documents	Document (Mongo, DynamoDB, Firestore)
Massive write throughput, time/event series	Wide-column (Cassandra, Bigtable)
Heavy traversal of relationships	Graph (Neo4j, Neptune)
Full-text search	Search (Elasticsearch)
Metrics over time	Time-series (Prometheus, Timescale)

The "Default to Postgres" Argument

For most apps you'll start, Postgres is the right default:

Handles millions of rows easily, hundreds of millions with thought.
JSONB columns let you store document-shaped data when you need to.
Full-text search built in; pgvector for embeddings; PostGIS for geo.
Logical replication, partitioning, foreign data wrappers.
Aurora PostgreSQL, Cloud SQL, RDS handle the operations.

Only reach for NoSQL when a specific access pattern, shape, or scale demands it. "We might need to scale" is rarely a good enough reason — premature NoSQL costs you joins and transactional integrity you'll miss.

Modern "NewSQL" — SQL at Scale

Distributed SQL databases now provide horizontal scale with relational semantics:

Google Spanner — global, externally consistent, used inside Google for years.
CockroachDB — Spanner-inspired, open-source, multi-region.
YugabyteDB — Postgres-compatible distributed SQL.
Aurora — managed Postgres / MySQL with separated storage and compute.

The historical "SQL doesn't scale" claim is mostly outdated. The newer trade-offs are cost and write latency, not "can it scale at all".

Polyglot Persistence

Real systems often combine multiple databases:

Postgres for the system of record.
Redis for sessions and hot caches.
Elasticsearch for search.
S3 + Iceberg for analytics.
DynamoDB for one specific high-scale low-latency lookup table.

Each store is chosen for what it does best. The cost: data must stay in sync across them — usually via change data capture (Debezium, Datastream) or event-driven updates.

ACID vs BASE

ACID (relational tradition)	BASE (NoSQL tradition)
Atomic, Consistent, Isolated, Durable	Basically Available, Soft state, Eventually consistent
Strong guarantees per transaction	Looser guarantees, optimised for scale

Modern systems are not strictly one or the other. DynamoDB has ACID transactions; Postgres has logical replication producing eventual consistency at replicas. Don't treat the labels as absolute.

Cost and Operations

Self-managed Postgres on a VM: cheapest at small scale, expensive in operator hours.
Managed Postgres (RDS, Aurora, Cloud SQL): expensive at small scale, cheap in operator hours; the right default for most teams.
DynamoDB / Bigtable / Cosmos: pay per request and per stored GB. Cheap at low traffic, can balloon at high QPS without thought.
Cassandra / self-managed Mongo: powerful, but operating at scale is a real specialisation.

Cert Mapping

Cert	Database scope
AWS SAA-C03	RDS, Aurora, DynamoDB, ElastiCache, Neptune, OpenSearch
Azure AZ-104 / AZ-305	Azure SQL, Cosmos DB (multi-API), Cache for Redis
GCP PCA / PDE	Cloud SQL, Spanner, Bigtable, Firestore, BigQuery

Decision Heuristic

Start with Postgres unless you have a specific reason not to.
If access is "given key, get value" at huge QPS, add a key-value store as a cache or primary.
If data shape is fluid documents, consider a document DB.
If you have time-series / event-stream / massive write fan-in, consider wide-column.
If you have a true graph problem (multi-hop traversals are central), use a graph DB.
For analytics, use a warehouse / lakehouse — not the operational store.

The next two lessons take whichever database you picked and ask: how do we replicate it for availability, and shard it for scale?