Skip to content
6 min read·Lesson 6 of 10

Indexes and Query Performance

Understand how indexes work, when they help, when they hurt, and how to read EXPLAIN plans to find slow queries.

An index is a data structure the database keeps alongside a table to make lookups fast. Without indexes, every WHERE clause is a sequential scan of the whole table; with the right one, it's microseconds regardless of table size.

How a B-Tree Index Works

The default index type in nearly every relational database is the B-tree (balanced tree). Imagine a sorted phone book:

  1. The root and inner nodes point to ranges
  2. The leaves hold the indexed values plus pointers (row ids) to the actual table rows
  3. Looking up a value is O(log n) — a few page reads even for billions of rows

B-trees support equality and range queries: =, <, <=, >, >=, BETWEEN, prefix LIKE 'foo%', and ORDER BY on the indexed column.

Creating Indexes

CREATE INDEX idx_users_email ON users (email);

-- Unique index — also enforces uniqueness
CREATE UNIQUE INDEX idx_users_email_uq ON users (email);

-- Composite index
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);

-- Partial index — only a subset of rows
CREATE INDEX idx_orders_active ON orders (created_at)
WHERE status = 'active';

-- Expression index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- Postgres: build without locking writers
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);

Primary keys and unique constraints automatically create indexes — you don't need to add them again.

Composite Indexes: The Leftmost Rule

An index on (a, b, c) can be used by queries that filter on:

  • a
  • a AND b
  • a AND b AND c

It cannot be used efficiently for queries on b alone or c alone — the leftmost column must appear. Order columns by:

  1. Equality predicates first (the columns most queries match exactly)
  2. Then the range/sort column
  3. Most selective columns generally earlier

Example: (user_id, created_at) serves "all orders for a user, newest first" beautifully.

Covering Indexes

An index that includes every column the query needs can answer the query without touching the table at all — an "index-only scan":

-- Postgres
CREATE INDEX idx_orders_cover
ON orders (user_id, created_at)
INCLUDE (amount, status);

Trade-off: the index is bigger and slower to write. Worth it for hot queries on huge tables.

EXPLAIN and EXPLAIN ANALYZE

The query planner picks how to execute your query. EXPLAIN shows the plan; EXPLAIN ANALYZE runs the query and reports actual timings:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;

You're looking for:

NodeMeaning
Index ScanUsed an index — usually good
Index Only ScanAnswered from the index alone — best case
Bitmap Index ScanCombining multiple indexes — fine for large result sets
Seq ScanReading the whole table — bad for big tables, fine for small ones
Nested LoopFor each row in A, look up matches in B — great for small A + indexed B
Hash JoinBuild a hash from one side, probe with the other — good for medium-sized joins
Merge JoinBoth sides sorted, walked together — good for big sorted joins

Pay attention to "rows" estimated vs actual. A 10x mismatch means statistics are stale — run ANALYZE (Postgres) or ANALYZE TABLE (MySQL).

When Indexes Don't Help

  • Tiny tables — sequential scan beats index lookup overhead
  • Queries returning a large fraction of the table — sequential scan is better
  • Functions on the column: WHERE LOWER(email) = '...' ignores an index on email. Create an expression index or normalize the data.
  • Type mismatches: comparing varchar to int may skip indexes. Cast carefully.
  • Leading wildcard LIKE: '%foo' is unindexable; 'foo%' is fine.
  • NOT, !=, and OR across columns often defeat indexes. Refactor to UNION or partial indexes.

Cost of Indexes

Every INSERT, UPDATE (of indexed columns), and DELETE must update every index on the table. So:

  • Don't create indexes "just in case." Remove unused ones.
  • Write-heavy tables (logs, events) suffer the most from over-indexing.
  • Most databases expose unused-index views (pg_stat_user_indexes in Postgres, sys.dm_db_index_usage_stats in SQL Server) — use them to prune.

Other Index Types (Postgres)

TypeUse for
B-treeDefault — equality and range
HashEquality only — rarely worth it over B-tree
GINFull-text search, JSONB, array containment
GiSTGeometric, ranges, full-text
BRINHuge tables sorted by insert order (e.g. time-series logs) — tiny on disk

A Practical Tuning Workflow

  1. Identify the slow query — log slow queries, watch APM dashboards.
  2. Run EXPLAIN ANALYZE with realistic parameters.
  3. Look for big sequential scans, mis-estimated row counts, missing index hits.
  4. Add the smallest index that helps. Re-run EXPLAIN ANALYZE.
  5. Verify production after deploy — slow query log should drop.
  6. Periodically review indexes; drop any unused ones.

Other Performance Levers

  • Connection pooling (PgBouncer, ProxySQL) — connections are expensive; reuse them.
  • Caching (Redis) — for hot read-only queries, skip the DB entirely.
  • Materialized views — store the result of an expensive query and refresh on a schedule.
  • Partitioning — split a huge table by date or tenant; the planner only touches relevant partitions.
  • Hardware — more RAM lets the DB cache more pages; SSDs cut latency 100x.

Indexes are the first lever you reach for, but they're not the last. The fastest query is the one you don't run.

Key Takeaways

  • Indexes are sorted lookup structures (usually B-trees) that turn full scans into fast lookups.
  • Composite indexes work left-to-right — order columns by selectivity and query shape.
  • Every index speeds reads but slows writes and uses disk; index deliberately.
  • EXPLAIN ANALYZE reveals whether the planner uses your index and where time is spent.
  • A few well-chosen indexes outperform many redundant ones.

Test your knowledge

Try exam-style practice questions to reinforce what you've learned.

Practice Questions →