An index is a data structure the database keeps alongside a table to make lookups fast. Without indexes, every WHERE clause is a sequential scan of the whole table; with the right one, it's microseconds regardless of table size.
How a B-Tree Index Works
The default index type in nearly every relational database is the B-tree (balanced tree). Imagine a sorted phone book:
- The root and inner nodes point to ranges
- The leaves hold the indexed values plus pointers (row ids) to the actual table rows
- Looking up a value is O(log n) — a few page reads even for billions of rows
B-trees support equality and range queries: =, <, <=, >, >=, BETWEEN, prefix LIKE 'foo%', and ORDER BY on the indexed column.
Creating Indexes
CREATE INDEX idx_users_email ON users (email);
-- Unique index — also enforces uniqueness
CREATE UNIQUE INDEX idx_users_email_uq ON users (email);
-- Composite index
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);
-- Partial index — only a subset of rows
CREATE INDEX idx_orders_active ON orders (created_at)
WHERE status = 'active';
-- Expression index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Postgres: build without locking writers
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
Primary keys and unique constraints automatically create indexes — you don't need to add them again.
Composite Indexes: The Leftmost Rule
An index on (a, b, c) can be used by queries that filter on:
aa AND ba AND b AND c
It cannot be used efficiently for queries on b alone or c alone — the leftmost column must appear. Order columns by:
- Equality predicates first (the columns most queries match exactly)
- Then the range/sort column
- Most selective columns generally earlier
Example: (user_id, created_at) serves "all orders for a user, newest first" beautifully.
Covering Indexes
An index that includes every column the query needs can answer the query without touching the table at all — an "index-only scan":
-- Postgres
CREATE INDEX idx_orders_cover
ON orders (user_id, created_at)
INCLUDE (amount, status);
Trade-off: the index is bigger and slower to write. Worth it for hot queries on huge tables.
EXPLAIN and EXPLAIN ANALYZE
The query planner picks how to execute your query. EXPLAIN shows the plan; EXPLAIN ANALYZE runs the query and reports actual timings:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;
You're looking for:
| Node | Meaning |
|---|---|
| Index Scan | Used an index — usually good |
| Index Only Scan | Answered from the index alone — best case |
| Bitmap Index Scan | Combining multiple indexes — fine for large result sets |
| Seq Scan | Reading the whole table — bad for big tables, fine for small ones |
| Nested Loop | For each row in A, look up matches in B — great for small A + indexed B |
| Hash Join | Build a hash from one side, probe with the other — good for medium-sized joins |
| Merge Join | Both sides sorted, walked together — good for big sorted joins |
Pay attention to "rows" estimated vs actual. A 10x mismatch means statistics are stale — run ANALYZE (Postgres) or ANALYZE TABLE (MySQL).
When Indexes Don't Help
- Tiny tables — sequential scan beats index lookup overhead
- Queries returning a large fraction of the table — sequential scan is better
- Functions on the column:
WHERE LOWER(email) = '...'ignores an index onemail. Create an expression index or normalize the data. - Type mismatches: comparing
varchartointmay skip indexes. Cast carefully. - Leading wildcard
LIKE:'%foo'is unindexable;'foo%'is fine. NOT,!=, andORacross columns often defeat indexes. Refactor toUNIONor partial indexes.
Cost of Indexes
Every INSERT, UPDATE (of indexed columns), and DELETE must update every index on the table. So:
- Don't create indexes "just in case." Remove unused ones.
- Write-heavy tables (logs, events) suffer the most from over-indexing.
- Most databases expose unused-index views (
pg_stat_user_indexesin Postgres,sys.dm_db_index_usage_statsin SQL Server) — use them to prune.
Other Index Types (Postgres)
| Type | Use for |
|---|---|
| B-tree | Default — equality and range |
| Hash | Equality only — rarely worth it over B-tree |
| GIN | Full-text search, JSONB, array containment |
| GiST | Geometric, ranges, full-text |
| BRIN | Huge tables sorted by insert order (e.g. time-series logs) — tiny on disk |
A Practical Tuning Workflow
- Identify the slow query — log slow queries, watch APM dashboards.
- Run
EXPLAIN ANALYZEwith realistic parameters. - Look for big sequential scans, mis-estimated row counts, missing index hits.
- Add the smallest index that helps. Re-run
EXPLAIN ANALYZE. - Verify production after deploy — slow query log should drop.
- Periodically review indexes; drop any unused ones.
Other Performance Levers
- Connection pooling (PgBouncer, ProxySQL) — connections are expensive; reuse them.
- Caching (Redis) — for hot read-only queries, skip the DB entirely.
- Materialized views — store the result of an expensive query and refresh on a schedule.
- Partitioning — split a huge table by date or tenant; the planner only touches relevant partitions.
- Hardware — more RAM lets the DB cache more pages; SSDs cut latency 100x.
Indexes are the first lever you reach for, but they're not the last. The fastest query is the one you don't run.