Joins, Aggregations, and Window Functions — SQL and Databases Fundamentals | CertQnA

Real queries pull data from multiple tables and produce summaries, rankings, and trends. Joins, aggregates, and window functions are how.

JOINs

-- Sample tables
-- users(id, name)
-- posts(id, user_id, title, published_at)

SELECT u.name, p.title
FROM users AS u
INNER JOIN posts AS p ON p.user_id = u.id;

Types of joins:

Join	Returns
`INNER JOIN`	Only rows that match in both tables (the default — `INNER` is optional)
`LEFT JOIN` (`LEFT OUTER`)	All rows from the left table; NULLs where the right has no match
`RIGHT JOIN`	Mirror of LEFT — rarely needed; just swap the tables
`FULL OUTER JOIN`	All rows from both, NULLs where the other has no match
`CROSS JOIN`	Cartesian product — every row of A combined with every row of B

LEFT JOIN example: users with their post counts

SELECT u.name, COUNT(p.id) AS post_count
FROM users AS u
LEFT JOIN posts AS p ON p.user_id = u.id
GROUP BY u.id, u.name
ORDER BY post_count DESC;

LEFT JOIN ensures users with no posts still appear in the result, with post_count of 0. COUNT(p.id) counts non-null rows, so users with no posts get 0; COUNT(*) would give 1 (the user row) — a common bug.

Joining more than two tables

SELECT u.name, p.title, t.name AS tag
FROM users u
JOIN posts p      ON p.user_id = u.id
JOIN post_tags pt ON pt.post_id = p.id
JOIN tags t       ON t.id = pt.tag_id
WHERE t.name = 'sql';

Aggregate Functions

SELECT COUNT(*) FROM users;
SELECT COUNT(DISTINCT country) FROM users;
SELECT SUM(amount), AVG(amount), MIN(amount), MAX(amount) FROM orders;
SELECT STRING_AGG(name, ', ') FROM users;       -- Postgres
-- GROUP_CONCAT in MySQL

Without GROUP BY, an aggregate collapses the entire table to one row. With GROUP BY, it produces one row per group.

GROUP BY

SELECT country, COUNT(*) AS user_count
FROM users
GROUP BY country
ORDER BY user_count DESC;

SELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS signups
FROM users
GROUP BY 1
ORDER BY 1;

Rule: every column in SELECT must either be in GROUP BY or wrapped in an aggregate function. Otherwise the database doesn't know which value to pick from the group.

HAVING

WHERE filters rows; HAVING filters groups after aggregation:

SELECT country, COUNT(*) AS user_count
FROM users
WHERE created_at > '2024-01-01'    -- filter rows first
GROUP BY country
HAVING COUNT(*) >= 100              -- then filter groups
ORDER BY user_count DESC;

Window Functions

Window functions compute aggregates without collapsing rows. They're the gateway to advanced analytics in plain SQL.

SELECT
    name,
    country,
    score,
    RANK() OVER (PARTITION BY country ORDER BY score DESC) AS rank_in_country
FROM players;

The OVER clause defines a "window" of rows the function looks at. PARTITION BY is like a GROUP BY just for that function.

Function	What it does
`ROW_NUMBER()`	Sequential 1, 2, 3, ...
`RANK()`	Ties get the same rank; gaps after
`DENSE_RANK()`	Ties get the same rank; no gaps
`LAG(col, n)` / `LEAD(col, n)`	Previous / next row's value
`SUM/AVG/COUNT(...) OVER (...)`	Running totals, moving averages

Running total

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

Top N per group

WITH ranked AS (
    SELECT
        country,
        name,
        score,
        ROW_NUMBER() OVER (PARTITION BY country ORDER BY score DESC) AS rn
    FROM players
)
SELECT country, name, score
FROM ranked
WHERE rn <= 3;

"Top 3 players per country" — a question almost every analytics dashboard eventually asks.

Common Table Expressions (CTEs)

WITH introduces a named subquery. Use them to make complex queries readable:

WITH monthly_signups AS (
    SELECT DATE_TRUNC('month', created_at) AS month, COUNT(*) AS signups
    FROM users
    GROUP BY 1
),
monthly_revenue AS (
    SELECT DATE_TRUNC('month', created_at) AS month, SUM(amount) AS revenue
    FROM orders
    WHERE status = 'paid'
    GROUP BY 1
)
SELECT s.month, s.signups, COALESCE(r.revenue, 0) AS revenue
FROM monthly_signups s
LEFT JOIN monthly_revenue r USING (month)
ORDER BY s.month;

CTEs read top-down — much clearer than nested subqueries. Modern databases optimise them as well as inline subqueries in nearly all cases.

Subqueries

-- Scalar subquery
SELECT name,
       (SELECT COUNT(*) FROM posts p WHERE p.user_id = u.id) AS post_count
FROM users u;

-- IN with subquery
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000);

-- EXISTS — usually faster than IN for big subqueries
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 1000
);

UNION and UNION ALL

SELECT id, 'user' AS kind FROM users
UNION ALL
SELECT id, 'admin' AS kind FROM admins;

UNION deduplicates; UNION ALL keeps duplicates and is faster. Both require the same column list and types in each query.

A Mental Model

When you stare at a tricky query, build it in stages:

Pick the right tables and join shape
Filter with WHERE
Decide grouping
Add aggregates / window functions
Sort and paginate

If a query gets unwieldy, refactor into CTEs. A 200-line monster of nested subqueries is almost always a half-dozen 20-line CTEs in disguise.