Skip to content
5 min read·Lesson 1 of 10

What Is a Database and Why Use One?

Understand what a database actually is, why we use one instead of files, and the difference between OLTP, OLAP, and data warehouses.

If you're going to work with cloud or data anything, databases are the foundation. This first lesson is the conceptual ground floor: what a database is, what problems it solves, and the high-level shape of the landscape.

What Is a Database?

A database is organised, durable storage for data, with the ability to query, modify, and protect that data while many users and processes are using it at the same time. Compared to "just put it in files":

  • Querying: ask declarative questions ("all orders over $100 from last month") instead of writing custom code to scan files.
  • Concurrency: many readers and writers at once without corrupting each other's work.
  • Durability: survive crashes and power loss without losing committed data.
  • Integrity: enforce constraints ("an order must reference an existing customer").
  • Security: per-user access control, encryption, audit logs.

The software that provides all of this is called a Database Management System (DBMS). Examples: Postgres, MySQL, SQLite, SQL Server, Oracle, MongoDB, Redis, Cassandra, DynamoDB.

Relational Databases

The dominant model. Data lives in tables (rows and columns) with a defined schema — every row has the same columns with the same types. You query and modify them with SQL (Structured Query Language).

idnameemailcreated_at
1Alexalex@example.com2024-01-15
2Samsam@example.com2024-02-03
SELECT name, email FROM users WHERE created_at > '2024-01-01';

Why has this model dominated for fifty years? Because tables compose well: you can split data across many tables and combine them with JOINs, the engine optimises queries for you, and the rules (constraints, transactions) are well-understood.

The Vocabulary

TableA named collection of rows with the same shape
ColumnA named, typed field in a table (e.g., email VARCHAR(255))
Row / RecordA single entry in a table
SchemaThe structure: tables, columns, types, constraints
Primary keyColumn(s) that uniquely identify a row (often id)
Foreign keyA column that references the primary key of another table
IndexA separate data structure that speeds up lookups on a column
TransactionA group of statements that succeed or fail together
QueryA statement (usually SQL) that asks the DB for data

OLTP vs OLAP

OLTP — OperationalOLAP — Analytical
WorkloadMany small reads/writes per secondFew huge scans across millions of rows
Example"Place an order" / "Update profile""Total revenue by region last quarter"
StorageRow-orientedColumn-oriented
EnginesPostgres, MySQL, SQL ServerSnowflake, BigQuery, Redshift, ClickHouse, DuckDB
Used byApps and servicesAnalysts, dashboards, ML pipelines

Most products run both: an OLTP database for the live application, plus a data warehouse that periodically receives copies of the operational data for analysis. The pipeline that copies data is called ETL (Extract, Transform, Load) or ELT.

SQL vs NoSQL

NoSQL (originally "not only SQL") is an umbrella for non-relational databases that became popular in the 2000s for use cases the relational model handled poorly:

  • Key-value stores — Redis, Memcached, DynamoDB. Lookup by key, very fast.
  • Document stores — MongoDB, CouchDB, DocumentDB. Store JSON-like documents with flexible schemas.
  • Wide-column stores — Cassandra, HBase, ScyllaDB, Bigtable. Massive scale across many servers.
  • Graph databases — Neo4j, Neptune. First-class relationships.
  • Search engines — Elasticsearch, OpenSearch. Full-text and analytic search.

The pendulum has swung back somewhat — modern Postgres handles JSON, full-text search, and arrays well. Still, NoSQL has clear wins for very-large-scale and very-flexible-schema use cases.

What Databases Buy You: ACID

The classic guarantees of relational databases:

  • Atomicity — a transaction either completely happens or completely doesn't
  • Consistency — committed data follows defined rules and constraints
  • Isolation — concurrent transactions don't see each other's intermediate state
  • Durability — once committed, data survives crashes

These properties are why your bank uses a relational DB for account balances. Many NoSQL stores trade some of these for scale (eventual consistency).

Don't Build Your Own

Storage looks deceptively simple — "just write to a file" — until you need concurrent writers, crash recovery, indexes, joins, and a query optimiser. Decades of engineering have gone into Postgres and MySQL. Use them.

What's Next

The rest of this course goes deep on relational databases: how to design tables (modelling), how to query them (SQL), how to make them fast (indexes), and how to run them in production (cloud services, backups, replication, security). Toward the end we'll widen back out to NoSQL and pick the right tool for each job.

Key Takeaways

  • A database is organised, queryable, durable storage with concurrency and consistency guarantees.
  • A DBMS (database management system) is the software that runs the database (Postgres, MySQL, SQLite, etc.).
  • Relational databases store rows in tables with strict schemas and use SQL.
  • OLTP serves operational reads/writes; OLAP serves analytics on large data sets.
  • You almost never want to roll your own database; pick a proven engine and learn it well.

Test your knowledge

Try exam-style practice questions to reinforce what you've learned.

Practice Questions →