Before you write a single SQL query, you have to design the tables. Good schema design is the difference between a database that scales gracefully and one that fights you for every feature.
Entities and Relationships
Start with the real-world things your app cares about — these are entities. For a blog: users, posts, comments, tags. Each entity becomes a table; each instance of an entity becomes a row.
Then think about how they relate:
- One-to-many: a user has many posts; a post belongs to one user. The "many" side carries a foreign key.
- Many-to-many: a post has many tags; a tag is on many posts. Use a third join table.
- One-to-one: rare; usually a sign two tables should be one.
A Worked Example
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE posts (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
title VARCHAR(200) NOT NULL,
body TEXT NOT NULL,
published_at TIMESTAMPTZ
);
CREATE TABLE tags (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL
);
CREATE TABLE post_tags (
post_id BIGINT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
tag_id BIGINT NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (post_id, tag_id)
);
Things to notice:
id BIGSERIAL PRIMARY KEY— auto-incrementing surrogate primary key.UNIQUEconstraint on email prevents duplicates.REFERENCES users(id)creates a foreign key.ON DELETE CASCADEremoves child rows when the parent is deleted.NOT NULLon required columns — be intentional about what's allowed to be missing.- The join table
post_tagshas a composite primary key of both columns — preventing the same tag being added twice.
Primary Keys: Surrogate vs Natural
| Approach | Example | Trade-offs |
|---|---|---|
| Auto-increment integer | BIGSERIAL | Small, fast, simple. Reveals row count and order. |
| UUID | uuid_generate_v4() / UUIDv7 | No coordination across servers; opaque; larger (16 bytes). |
| Natural key | Email, ISBN, country code | Avoid for anything that might change. |
Most modern designs use surrogate keys (auto-increment or UUIDv7) for the primary key and add unique constraints on natural identifiers separately. UUIDv7 is time-ordered, giving you the best of both worlds for distributed systems.
Foreign Keys
A foreign key declares "this column must reference an existing row in that table." The DB enforces it for you — you cannot insert a post with a user_id that doesn't exist.
ON DELETE options when the referenced row is deleted:
CASCADE— delete the dependent rows tooRESTRICT/NO ACTION— prevent the parent deleteSET NULL— leave the row, set the FK column to NULLSET DEFAULT— set to a default value
Pick deliberately. Comments under a deleted post: cascade. An author of a deleted comment: probably set null and leave the comment as "anonymous".
Data Types: Pick Narrowly
| Type | Use for |
|---|---|
INT (4 bytes) | Counts, small IDs (max ~2 billion) |
BIGINT (8 bytes) | IDs, anything that may grow large |
NUMERIC(p,s) / DECIMAL | Money — never use FLOAT for money |
VARCHAR(N) / TEXT | Strings; in Postgres they're equivalent — pick TEXT and add CHECK constraints if needed |
BOOLEAN | true / false |
TIMESTAMPTZ | Always store time with timezone (UTC under the hood) |
DATE | Date without time |
JSONB (Postgres) | Flexible attributes; queryable, indexable |
UUID | Universally unique IDs |
Avoid FLOAT/REAL for currency — binary floating point can't represent 0.10 exactly, leading to rounding bugs.
Normalization
Normalization is a set of rules for eliminating redundancy. The first three forms cover almost everything you'll ever need:
1NF — Atomic columns
Each cell holds a single value. Don't store "red,blue,green" in one column — make it three rows in a related table (or a tags array column with explicit semantics).
2NF — No partial dependencies
If you have a composite key, every non-key column must depend on the whole key. Mostly relevant when using composite keys.
3NF — No transitive dependencies
Non-key columns should depend only on the primary key. If you store a customer's country_id and country_name in the orders table, the name depends on the country, not the order — move it to a countries table.
For most apps, "design tables around entities, give each table a primary key, use foreign keys for relationships, don't repeat data" gets you to 3NF without thinking about it.
Denormalization
Sometimes you deliberately duplicate data — a precomputed sum, a cached "post count" on the user table — to avoid expensive queries. This is denormalization.
- Common in analytical / OLAP databases (a single big "fact" table joined to "dimension" tables)
- Common in NoSQL where joins are expensive or unavailable
- Always introduces the risk of data drifting out of sync — be deliberate, document it
Schema Migrations
Schemas evolve. Don't run ALTER TABLE by hand in production. Use a migration tool that records each change as a versioned, reviewable file:
- Flyway, Liquibase — language-neutral, SQL-first
- Alembic (Python / SQLAlchemy)
- Prisma Migrate, Drizzle Kit (Node)
- Active Record migrations (Rails)
Each migration goes through code review and CI. You can roll forward and backward predictably across environments.
A Quick Sanity Check
Before you create the table, ask:
- What are the entities?
- What's the primary key for each?
- What are the relationships and what do
ON DELETErules should be? - What columns must be
NOT NULL? - What needs to be unique beyond the primary key?
- Will I query this column with
WHEREa lot? — that's a future index.
Spend twenty minutes on these questions and you'll save weeks of refactoring later.