Skip to content
7 min read·Lesson 4 of 10

Joins, Aggregations, and Window Functions

Combine data from multiple tables with JOINs, summarise it with GROUP BY and aggregate functions, and unlock advanced analytics with window functions.

Real queries pull data from multiple tables and produce summaries, rankings, and trends. Joins, aggregates, and window functions are how.

JOINs

-- Sample tables
-- users(id, name)
-- posts(id, user_id, title, published_at)

SELECT u.name, p.title
FROM users AS u
INNER JOIN posts AS p ON p.user_id = u.id;

Types of joins:

JoinReturns
INNER JOINOnly rows that match in both tables (the default — INNER is optional)
LEFT JOIN (LEFT OUTER)All rows from the left table; NULLs where the right has no match
RIGHT JOINMirror of LEFT — rarely needed; just swap the tables
FULL OUTER JOINAll rows from both, NULLs where the other has no match
CROSS JOINCartesian product — every row of A combined with every row of B

LEFT JOIN example: users with their post counts

SELECT u.name, COUNT(p.id) AS post_count
FROM users AS u
LEFT JOIN posts AS p ON p.user_id = u.id
GROUP BY u.id, u.name
ORDER BY post_count DESC;

LEFT JOIN ensures users with no posts still appear in the result, with post_count of 0. COUNT(p.id) counts non-null rows, so users with no posts get 0; COUNT(*) would give 1 (the user row) — a common bug.

Joining more than two tables

SELECT u.name, p.title, t.name AS tag
FROM users u
JOIN posts p      ON p.user_id = u.id
JOIN post_tags pt ON pt.post_id = p.id
JOIN tags t       ON t.id = pt.tag_id
WHERE t.name = 'sql';

Aggregate Functions

SELECT COUNT(*) FROM users;
SELECT COUNT(DISTINCT country) FROM users;
SELECT SUM(amount), AVG(amount), MIN(amount), MAX(amount) FROM orders;
SELECT STRING_AGG(name, ', ') FROM users;       -- Postgres
-- GROUP_CONCAT in MySQL

Without GROUP BY, an aggregate collapses the entire table to one row. With GROUP BY, it produces one row per group.

GROUP BY

SELECT country, COUNT(*) AS user_count
FROM users
GROUP BY country
ORDER BY user_count DESC;

SELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS signups
FROM users
GROUP BY 1
ORDER BY 1;

Rule: every column in SELECT must either be in GROUP BY or wrapped in an aggregate function. Otherwise the database doesn't know which value to pick from the group.

HAVING

WHERE filters rows; HAVING filters groups after aggregation:

SELECT country, COUNT(*) AS user_count
FROM users
WHERE created_at > '2024-01-01'    -- filter rows first
GROUP BY country
HAVING COUNT(*) >= 100              -- then filter groups
ORDER BY user_count DESC;

Window Functions

Window functions compute aggregates without collapsing rows. They're the gateway to advanced analytics in plain SQL.

SELECT
    name,
    country,
    score,
    RANK() OVER (PARTITION BY country ORDER BY score DESC) AS rank_in_country
FROM players;

The OVER clause defines a "window" of rows the function looks at. PARTITION BY is like a GROUP BY just for that function.

FunctionWhat it does
ROW_NUMBER()Sequential 1, 2, 3, ...
RANK()Ties get the same rank; gaps after
DENSE_RANK()Ties get the same rank; no gaps
LAG(col, n) / LEAD(col, n)Previous / next row's value
SUM/AVG/COUNT(...) OVER (...)Running totals, moving averages

Running total

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

Top N per group

WITH ranked AS (
    SELECT
        country,
        name,
        score,
        ROW_NUMBER() OVER (PARTITION BY country ORDER BY score DESC) AS rn
    FROM players
)
SELECT country, name, score
FROM ranked
WHERE rn <= 3;

"Top 3 players per country" — a question almost every analytics dashboard eventually asks.

Common Table Expressions (CTEs)

WITH introduces a named subquery. Use them to make complex queries readable:

WITH monthly_signups AS (
    SELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS signups
    FROM users
    GROUP BY 1
),
monthly_revenue AS (
    SELECT DATE_TRUNC('month', created_at) AS month, SUM(amount) AS revenue
    FROM orders
    WHERE status = 'paid'
    GROUP BY 1
)
SELECT s.month, s.signups, COALESCE(r.revenue, 0) AS revenue
FROM monthly_signups s
LEFT JOIN monthly_revenue r USING (month)
ORDER BY s.month;

CTEs read top-down — much clearer than nested subqueries. Modern databases optimise them as well as inline subqueries in nearly all cases.

Subqueries

-- Scalar subquery
SELECT name,
       (SELECT COUNT(*) FROM posts p WHERE p.user_id = u.id) AS post_count
FROM users u;

-- IN with subquery
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000);

-- EXISTS — usually faster than IN for big subqueries
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 1000
);

UNION and UNION ALL

SELECT id, 'user' AS kind FROM users
UNION ALL
SELECT id, 'admin' AS kind FROM admins;

UNION deduplicates; UNION ALL keeps duplicates and is faster. Both require the same column list and types in each query.

A Mental Model

When you stare at a tricky query, build it in stages:

  1. Pick the right tables and join shape
  2. Filter with WHERE
  3. Decide grouping
  4. Add aggregates / window functions
  5. Sort and paginate

If a query gets unwieldy, refactor into CTEs. A 200-line monster of nested subqueries is almost always a half-dozen 20-line CTEs in disguise.

Key Takeaways

  • INNER JOIN returns matched rows from both tables; LEFT JOIN keeps all rows from the left.
  • GROUP BY collapses rows into groups so aggregate functions (COUNT, SUM, AVG) can run.
  • HAVING filters groups; WHERE filters rows before grouping.
  • Window functions compute running totals, ranks, and moving averages without collapsing rows.
  • CTEs (WITH clauses) make complex queries readable.

Test your knowledge

Try exam-style practice questions to reinforce what you've learned.

Practice Questions →