If you're going to work with cloud or data anything, databases are the foundation. This first lesson is the conceptual ground floor: what a database is, what problems it solves, and the high-level shape of the landscape.
What Is a Database?
A database is organised, durable storage for data, with the ability to query, modify, and protect that data while many users and processes are using it at the same time. Compared to "just put it in files":
- Querying: ask declarative questions ("all orders over $100 from last month") instead of writing custom code to scan files.
- Concurrency: many readers and writers at once without corrupting each other's work.
- Durability: survive crashes and power loss without losing committed data.
- Integrity: enforce constraints ("an order must reference an existing customer").
- Security: per-user access control, encryption, audit logs.
The software that provides all of this is called a Database Management System (DBMS). Examples: Postgres, MySQL, SQLite, SQL Server, Oracle, MongoDB, Redis, Cassandra, DynamoDB.
Relational Databases
The dominant model. Data lives in tables (rows and columns) with a defined schema — every row has the same columns with the same types. You query and modify them with SQL (Structured Query Language).
| id | name | created_at | |
|---|---|---|---|
| 1 | Alex | alex@example.com | 2024-01-15 |
| 2 | Sam | sam@example.com | 2024-02-03 |
SELECT name, email FROM users WHERE created_at > '2024-01-01';
Why has this model dominated for fifty years? Because tables compose well: you can split data across many tables and combine them with JOINs, the engine optimises queries for you, and the rules (constraints, transactions) are well-understood.
The Vocabulary
| Table | A named collection of rows with the same shape |
| Column | A named, typed field in a table (e.g., email VARCHAR(255)) |
| Row / Record | A single entry in a table |
| Schema | The structure: tables, columns, types, constraints |
| Primary key | Column(s) that uniquely identify a row (often id) |
| Foreign key | A column that references the primary key of another table |
| Index | A separate data structure that speeds up lookups on a column |
| Transaction | A group of statements that succeed or fail together |
| Query | A statement (usually SQL) that asks the DB for data |
OLTP vs OLAP
| OLTP — Operational | OLAP — Analytical | |
|---|---|---|
| Workload | Many small reads/writes per second | Few huge scans across millions of rows |
| Example | "Place an order" / "Update profile" | "Total revenue by region last quarter" |
| Storage | Row-oriented | Column-oriented |
| Engines | Postgres, MySQL, SQL Server | Snowflake, BigQuery, Redshift, ClickHouse, DuckDB |
| Used by | Apps and services | Analysts, dashboards, ML pipelines |
Most products run both: an OLTP database for the live application, plus a data warehouse that periodically receives copies of the operational data for analysis. The pipeline that copies data is called ETL (Extract, Transform, Load) or ELT.
SQL vs NoSQL
NoSQL (originally "not only SQL") is an umbrella for non-relational databases that became popular in the 2000s for use cases the relational model handled poorly:
- Key-value stores — Redis, Memcached, DynamoDB. Lookup by key, very fast.
- Document stores — MongoDB, CouchDB, DocumentDB. Store JSON-like documents with flexible schemas.
- Wide-column stores — Cassandra, HBase, ScyllaDB, Bigtable. Massive scale across many servers.
- Graph databases — Neo4j, Neptune. First-class relationships.
- Search engines — Elasticsearch, OpenSearch. Full-text and analytic search.
The pendulum has swung back somewhat — modern Postgres handles JSON, full-text search, and arrays well. Still, NoSQL has clear wins for very-large-scale and very-flexible-schema use cases.
What Databases Buy You: ACID
The classic guarantees of relational databases:
- Atomicity — a transaction either completely happens or completely doesn't
- Consistency — committed data follows defined rules and constraints
- Isolation — concurrent transactions don't see each other's intermediate state
- Durability — once committed, data survives crashes
These properties are why your bank uses a relational DB for account balances. Many NoSQL stores trade some of these for scale (eventual consistency).
Don't Build Your Own
Storage looks deceptively simple — "just write to a file" — until you need concurrent writers, crash recovery, indexes, joins, and a query optimiser. Decades of engineering have gone into Postgres and MySQL. Use them.
What's Next
The rest of this course goes deep on relational databases: how to design tables (modelling), how to query them (SQL), how to make them fast (indexes), and how to run them in production (cloud services, backups, replication, security). Toward the end we'll widen back out to NoSQL and pick the right tool for each job.