Indexes and Query Performance — SQL and Databases Fundamentals | CertQnA

An index is a data structure the database keeps alongside a table to make lookups fast. Without indexes, every WHERE clause is a sequential scan of the whole table; with the right one, it's microseconds regardless of table size.

How a B-Tree Index Works

The default index type in nearly every relational database is the B-tree (balanced tree). Imagine a sorted phone book:

The root and inner nodes point to ranges
The leaves hold the indexed values plus pointers (row ids) to the actual table rows
Looking up a value is O(log n) — a few page reads even for billions of rows

B-trees support equality and range queries: =, <, <=, >, >=, BETWEEN, prefix LIKE 'foo%', and ORDER BY on the indexed column.

Creating Indexes

CREATE INDEX idx_users_email ON users (email);

-- Unique index — also enforces uniqueness
CREATE UNIQUE INDEX idx_users_email_uq ON users (email);

-- Composite index
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);

-- Partial index — only a subset of rows
CREATE INDEX idx_orders_active ON orders (created_at)
WHERE status = 'active';

-- Expression index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- Postgres: build without locking writers
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);

Primary keys and unique constraints automatically create indexes — you don't need to add them again.

Composite Indexes: The Leftmost Rule

An index on (a, b, c) can be used by queries that filter on:

a
a AND b
a AND b AND c

It cannot be used efficiently for queries on b alone or c alone — the leftmost column must appear. Order columns by:

Equality predicates first (the columns most queries match exactly)
Then the range/sort column
Most selective columns generally earlier

Example: (user_id, created_at) serves "all orders for a user, newest first" beautifully.

Covering Indexes

An index that includes every column the query needs can answer the query without touching the table at all — an "index-only scan":

-- Postgres
CREATE INDEX idx_orders_cover
ON orders (user_id, created_at)
INCLUDE (amount, status);

Trade-off: the index is bigger and slower to write. Worth it for hot queries on huge tables.

EXPLAIN and EXPLAIN ANALYZE

The query planner picks how to execute your query. EXPLAIN shows the plan; EXPLAIN ANALYZE runs the query and reports actual timings:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;

You're looking for:

Node	Meaning
Index Scan	Used an index — usually good
Index Only Scan	Answered from the index alone — best case
Bitmap Index Scan	Combining multiple indexes — fine for large result sets
Seq Scan	Reading the whole table — bad for big tables, fine for small ones
Nested Loop	For each row in A, look up matches in B — great for small A + indexed B
Hash Join	Build a hash from one side, probe with the other — good for medium-sized joins
Merge Join	Both sides sorted, walked together — good for big sorted joins

Pay attention to "rows" estimated vs actual. A 10x mismatch means statistics are stale — run ANALYZE (Postgres) or ANALYZE TABLE (MySQL).

When Indexes Don't Help

Tiny tables — sequential scan beats index lookup overhead
Queries returning a large fraction of the table — sequential scan is better
Functions on the column: WHERE LOWER(email) = '...' ignores an index on email. Create an expression index or normalize the data.
Type mismatches: comparing varchar to int may skip indexes. Cast carefully.
Leading wildcard LIKE: '%foo' is unindexable; 'foo%' is fine.
NOT, !=, and OR across columns often defeat indexes. Refactor to UNION or partial indexes.

Cost of Indexes

Every INSERT, UPDATE (of indexed columns), and DELETE must update every index on the table. So:

Don't create indexes "just in case." Remove unused ones.
Write-heavy tables (logs, events) suffer the most from over-indexing.
Most databases expose unused-index views (pg_stat_user_indexes in Postgres, sys.dm_db_index_usage_stats in SQL Server) — use them to prune.

Other Index Types (Postgres)

Type	Use for
B-tree	Default — equality and range
Hash	Equality only — rarely worth it over B-tree
GIN	Full-text search, JSONB, array containment
GiST	Geometric, ranges, full-text
BRIN	Huge tables sorted by insert order (e.g. time-series logs) — tiny on disk

A Practical Tuning Workflow

Identify the slow query — log slow queries, watch APM dashboards.
Run EXPLAIN ANALYZE with realistic parameters.
Look for big sequential scans, mis-estimated row counts, missing index hits.
Add the smallest index that helps. Re-run EXPLAIN ANALYZE.
Verify production after deploy — slow query log should drop.
Periodically review indexes; drop any unused ones.

Other Performance Levers

Connection pooling (PgBouncer, ProxySQL) — connections are expensive; reuse them.
Caching (Redis) — for hot read-only queries, skip the DB entirely.
Materialized views — store the result of an expensive query and refresh on a schedule.
Partitioning — split a huge table by date or tenant; the planner only touches relevant partitions.
Hardware — more RAM lets the DB cache more pages; SSDs cut latency 100x.

Indexes are the first lever you reach for, but they're not the last. The fastest query is the one you don't run.