Real queries pull data from multiple tables and produce summaries, rankings, and trends. Joins, aggregates, and window functions are how.
JOINs
-- Sample tables
-- users(id, name)
-- posts(id, user_id, title, published_at)
SELECT u.name, p.title
FROM users AS u
INNER JOIN posts AS p ON p.user_id = u.id;
Types of joins:
| Join | Returns |
|---|---|
INNER JOIN | Only rows that match in both tables (the default — INNER is optional) |
LEFT JOIN (LEFT OUTER) | All rows from the left table; NULLs where the right has no match |
RIGHT JOIN | Mirror of LEFT — rarely needed; just swap the tables |
FULL OUTER JOIN | All rows from both, NULLs where the other has no match |
CROSS JOIN | Cartesian product — every row of A combined with every row of B |
LEFT JOIN example: users with their post counts
SELECT u.name, COUNT(p.id) AS post_count
FROM users AS u
LEFT JOIN posts AS p ON p.user_id = u.id
GROUP BY u.id, u.name
ORDER BY post_count DESC;
LEFT JOIN ensures users with no posts still appear in the result, with post_count of 0. COUNT(p.id) counts non-null rows, so users with no posts get 0; COUNT(*) would give 1 (the user row) — a common bug.
Joining more than two tables
SELECT u.name, p.title, t.name AS tag
FROM users u
JOIN posts p ON p.user_id = u.id
JOIN post_tags pt ON pt.post_id = p.id
JOIN tags t ON t.id = pt.tag_id
WHERE t.name = 'sql';
Aggregate Functions
SELECT COUNT(*) FROM users;
SELECT COUNT(DISTINCT country) FROM users;
SELECT SUM(amount), AVG(amount), MIN(amount), MAX(amount) FROM orders;
SELECT STRING_AGG(name, ', ') FROM users; -- Postgres
-- GROUP_CONCAT in MySQL
Without GROUP BY, an aggregate collapses the entire table to one row. With GROUP BY, it produces one row per group.
GROUP BY
SELECT country, COUNT(*) AS user_count
FROM users
GROUP BY country
ORDER BY user_count DESC;
SELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS signups
FROM users
GROUP BY 1
ORDER BY 1;
Rule: every column in SELECT must either be in GROUP BY or wrapped in an aggregate function. Otherwise the database doesn't know which value to pick from the group.
HAVING
WHERE filters rows; HAVING filters groups after aggregation:
SELECT country, COUNT(*) AS user_count
FROM users
WHERE created_at > '2024-01-01' -- filter rows first
GROUP BY country
HAVING COUNT(*) >= 100 -- then filter groups
ORDER BY user_count DESC;
Window Functions
Window functions compute aggregates without collapsing rows. They're the gateway to advanced analytics in plain SQL.
SELECT
name,
country,
score,
RANK() OVER (PARTITION BY country ORDER BY score DESC) AS rank_in_country
FROM players;
The OVER clause defines a "window" of rows the function looks at. PARTITION BY is like a GROUP BY just for that function.
| Function | What it does |
|---|---|
ROW_NUMBER() | Sequential 1, 2, 3, ... |
RANK() | Ties get the same rank; gaps after |
DENSE_RANK() | Ties get the same rank; no gaps |
LAG(col, n) / LEAD(col, n) | Previous / next row's value |
SUM/AVG/COUNT(...) OVER (...) | Running totals, moving averages |
Running total
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
Top N per group
WITH ranked AS (
SELECT
country,
name,
score,
ROW_NUMBER() OVER (PARTITION BY country ORDER BY score DESC) AS rn
FROM players
)
SELECT country, name, score
FROM ranked
WHERE rn <= 3;
"Top 3 players per country" — a question almost every analytics dashboard eventually asks.
Common Table Expressions (CTEs)
WITH introduces a named subquery. Use them to make complex queries readable:
WITH monthly_signups AS (
SELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS signups
FROM users
GROUP BY 1
),
monthly_revenue AS (
SELECT DATE_TRUNC('month', created_at) AS month, SUM(amount) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY 1
)
SELECT s.month, s.signups, COALESCE(r.revenue, 0) AS revenue
FROM monthly_signups s
LEFT JOIN monthly_revenue r USING (month)
ORDER BY s.month;
CTEs read top-down — much clearer than nested subqueries. Modern databases optimise them as well as inline subqueries in nearly all cases.
Subqueries
-- Scalar subquery
SELECT name,
(SELECT COUNT(*) FROM posts p WHERE p.user_id = u.id) AS post_count
FROM users u;
-- IN with subquery
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000);
-- EXISTS — usually faster than IN for big subqueries
SELECT * FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 1000
);
UNION and UNION ALL
SELECT id, 'user' AS kind FROM users
UNION ALL
SELECT id, 'admin' AS kind FROM admins;
UNION deduplicates; UNION ALL keeps duplicates and is faster. Both require the same column list and types in each query.
A Mental Model
When you stare at a tricky query, build it in stages:
- Pick the right tables and join shape
- Filter with
WHERE - Decide grouping
- Add aggregates / window functions
- Sort and paginate
If a query gets unwieldy, refactor into CTEs. A 200-line monster of nested subqueries is almost always a half-dozen 20-line CTEs in disguise.