5 min read·Lesson 4 of 10

Filtering, Sorting, and Pagination

List endpoints scale or fall over depending on a few small decisions: pagination strategy, filter syntax, and sort conventions.

The most common production outage you will trigger as an API designer is the unbounded list endpoint. The fix is simple, but you must build pagination from the start — retrofitting it on a public API breaks every client.

Why Pagination Matters

An endpoint that returns "all orders" works fine when the demo customer has 12. It melts down when one real customer has 4 million. The database query is slow, the network payload is huge, the client cannot render the result, and in practice the request times out before any of that completes.

  • Always default to a small page size (25–50).
  • Cap maximum page size (100–1000).
  • Document both clearly.
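The three rules above can be sketched as a small helper that resolves the page size for one request. This is a minimal sketch: the names `resolve_limit`, `DEFAULT_LIMIT`, and `MAX_LIMIT` and the values 25 and 100 are assumptions chosen for illustration, not part of any particular framework.

```python
from typing import Optional

DEFAULT_LIMIT = 25   # assumed default page size
MAX_LIMIT = 100      # assumed hard cap on page size


def resolve_limit(raw: Optional[str]) -> int:
    """Return the page size to use for a request's raw `limit` value."""
    if raw is None or raw == "":
        # Caller did not ask for a size: fall back to the small default.
        return DEFAULT_LIMIT
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"limit must be an integer, got {raw!r}")
    if value < 1:
        raise ValueError("limit must be at least 1")
    # Oversized requests are clamped to the cap instead of rejected.
    return min(value, MAX_LIMIT)
```

Clamping silently is one choice; rejecting oversized values with a 400 response is equally defensible. Either way, the behaviour belongs in the documentation.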

Offset Pagination

GET /orders?limit=25&offset=50
  • + Trivial to implement.
  • + Caller can jump to any page directly.
  • − Slow for large offsets: the database typically has to read and discard every skipped row.
  • − Items inserted or deleted between pages cause duplicates or gaps.

Use it for small datasets and admin tools. Avoid for scrolling timelines and big tables.
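To make the duplicates problem concrete, here is a small sketch using an in-memory SQLite table. The `orders` table with only an `id` column and the helper `offset_page` are assumptions made up for this demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 6)])


def offset_page(limit: int, offset: int) -> list:
    """Return one page of order IDs, newest first, using LIMIT/OFFSET."""
    rows = conn.execute(
        "SELECT id FROM orders ORDER BY id DESC LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return [row[0] for row in rows]


page1 = offset_page(limit=2, offset=0)               # [5, 4]

# A new order arrives while the client is still reading page 1.
conn.execute("INSERT INTO orders (id) VALUES (6)")

page2 = offset_page(limit=2, offset=2)               # [4, 3]
```

Order 4 appears on both pages because the insert shifted every row down by one position. A delete between the two requests would cause the opposite effect: a row skipped entirely.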

Cursor (Keyset) Pagination

Encode "where I am" as an opaque cursor that the server can resume from. Typically the cursor is the last seen sort key plus the last seen ID.

GET /orders?limit=25
→ {
    "data": [ ... ],
    "next_cursor": "eyJpZCI6ICJvcmRfMTAwIn0="
  }

GET /orders?limit=25&cursor=eyJpZCI6ICJvcmRfMTAwIn0=
→ next page
  • + Fast at any depth — query becomes WHERE (created_at, id) < (?, ?) ORDER BY created_at DESC, id DESC LIMIT 25.
  • + Stable under inserts and deletes.
  • − Cannot jump to "page 47" — only forward (and sometimes backward).

This is the right default for most modern APIs.
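The following sketch shows one way the server side might look, using SQLite for illustration. The table layout, the helper names (`encode_cursor`, `decode_cursor`, `fetch_page`), and the choice of base64-encoded JSON for the cursor are all assumptions. It also assumes an SQLite version with row-value comparison support (3.15 or later).

```python
import base64
import json
import sqlite3


def encode_cursor(created_at: str, order_id: int) -> str:
    """Pack the last seen sort key and ID into an opaque string."""
    payload = json.dumps({"created_at": created_at, "id": order_id})
    return base64.urlsafe_b64encode(payload.encode()).decode()


def decode_cursor(cursor: str) -> tuple:
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["created_at"], data["id"]


def fetch_page(conn, limit, cursor=None):
    """Return one page, newest first, resuming after `cursor` if given."""
    if cursor is None:
        where, params = "", []
    else:
        where = "WHERE (created_at, id) < (?, ?) "
        params = list(decode_cursor(cursor))
    # Fetch one extra row to learn whether another page exists.
    rows = conn.execute(
        "SELECT id, created_at FROM orders " + where +
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        params + [limit + 1],
    ).fetchall()
    has_more = len(rows) > limit
    rows = rows[:limit]
    next_cursor = encode_cursor(rows[-1][1], rows[-1][0]) if has_more else None
    return {
        "data": [{"id": r[0], "created_at": r[1]} for r in rows],
        "next_cursor": next_cursor,
        "has_more": has_more,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2025-01-01"), (2, "2025-01-01"), (3, "2025-01-02"),
     (4, "2025-01-02"), (5, "2025-01-03")],
)

page1 = fetch_page(conn, limit=2)
# A new order arrives between requests; later pages are unaffected.
conn.execute("INSERT INTO orders VALUES (6, '2025-01-04')")
page2 = fetch_page(conn, limit=2, cursor=page1["next_cursor"])
page3 = fetch_page(conn, limit=2, cursor=page2["next_cursor"])
```

Note that base64 only hides the cursor's structure; it does not protect it. If clients must not be able to forge or edit cursors, the server would need to sign or encrypt them.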

Page Tokens (Google-Style)

Google APIs use a similar pattern with named token fields:

GET /orders?pageSize=25&pageToken=ABC...
→ { "items": [...], "nextPageToken": "DEF..." }

Equivalent to cursors but with a different naming convention. Pick one and stick with it.

Total Counts: Be Honest

Clients often want a total. The database may not give you one cheaply on a billion-row table. Options:

  • Don't return it. Acceptable for infinite-scroll UIs.
  • Return an approximate count from cached statistics.
  • Return an exact count only on filtered queries small enough to count fast.

Document which you do; never silently switch.
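One way to implement the third option is a bounded count: count matching rows only up to a cap, and report no total once the cap is exceeded. The sketch below uses SQLite; the `orders` table, the `status` filter, the helper `bounded_total`, and the cap value are assumptions for illustration.

```python
import sqlite3
from typing import Optional

COUNT_CAP = 1000  # assumed threshold above which no exact total is returned


def bounded_total(conn, status: str, cap: int = COUNT_CAP) -> Optional[int]:
    """Exact count of matching rows, or None when more than `cap` rows match."""
    # The inner LIMIT stops the scan after cap + 1 rows, so the cost of
    # counting is bounded regardless of table size.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM (SELECT 1 FROM orders WHERE status = ? LIMIT ?)",
        (status, cap + 1),
    ).fetchone()
    return n if n <= cap else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("open",)] * 10 + [("closed",)] * 3,
)

open_total = bounded_total(conn, "open", cap=5)      # too many to count: None
closed_total = bounded_total(conn, "closed", cap=5)  # small enough: 3
```

Returning `None` (or omitting the field) whenever the cap is exceeded is a rule clients can rely on, which keeps the behaviour documented rather than silent.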

Sorting

Use a simple, stable convention:

GET /orders?sort=created_at,desc
GET /orders?sort=-created_at         (JSON:API style: leading - = desc)
GET /orders?sort=status,-created_at  (multiple fields: status asc, then created_at desc)
  • Whitelist sortable fields — never let callers sort on arbitrary columns (no index, full scan).
  • Always tie-break with the primary key for stable ordering across pages.
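Both rules can be enforced in one parsing step. The sketch below handles the leading-minus form from the examples above and always appends the primary key as a tie-breaker. The whitelist contents and the name `parse_sort` are assumptions.

```python
SORTABLE = {"id", "created_at", "status", "amount"}  # assumed whitelist
PRIMARY_KEY = "id"


def parse_sort(raw: str) -> str:
    """Turn a value like 'status,-created_at' into a safe ORDER BY clause."""
    terms = []
    seen = set()
    direction = "ASC"
    for part in raw.split(","):
        part = part.strip()
        # A leading minus means descending; anything else is ascending.
        if part.startswith("-"):
            field, direction = part[1:], "DESC"
        else:
            field, direction = part, "ASC"
        if field not in SORTABLE:
            raise ValueError(f"cannot sort by {field!r}")
        if field not in seen:
            seen.add(field)
            terms.append(f"{field} {direction}")
    # Tie-break on the primary key, reusing the last term's direction,
    # so rows with equal sort keys keep a stable order across pages.
    if PRIMARY_KEY not in seen:
        terms.append(f"{PRIMARY_KEY} {direction}")
    return ", ".join(terms)
```

Because every field name is checked against the whitelist before it reaches the clause, the resulting string is safe to interpolate into SQL even though column names cannot be bound as parameters.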

Filtering

Most APIs converge on simple equality filters in query strings, with bracketed operators for ranges:

GET /orders?status=open
GET /orders?status=open&customer_id=cus_42
GET /orders?created_at[gte]=2025-01-01&created_at[lt]=2025-02-01

For richer filtering, three approaches scale:

  1. Bracket operators: created_at[gte]=.... Readable.
  2. Predefined filter object on POST: POST /orders/search with a JSON body. Bypasses URL length limits and lets you express complex queries.
  3. RSQL / FIQL: ?filter=status==open;amount=gt=1000. Formally specified (FIQL was published as an IETF draft) but less common.

Whatever you pick, document it once and apply it everywhere.
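As an illustration of the bracket-operator approach, the sketch below turns a query string into a parameterised WHERE clause. The per-field operator whitelist, the list of reserved parameter names, and the function name `parse_filters` are assumptions.

```python
import re
from urllib.parse import parse_qsl

# Assumed whitelist: which operators each field accepts.
FILTERABLE = {
    "status": {"eq"},
    "customer_id": {"eq"},
    "created_at": {"gte", "gt", "lte", "lt"},
}
SQL_OPS = {"eq": "=", "gte": ">=", "gt": ">", "lte": "<=", "lt": "<"}
# Parameters that belong to pagination, sorting, and so on, not filtering.
RESERVED = {"sort", "limit", "cursor", "fields", "expand"}
KEY_RE = re.compile(r"(\w+)(?:\[(\w+)\])?")


def parse_filters(query_string: str) -> tuple:
    """Return (where_clause, params) for the filter parameters in a query string."""
    clauses, params = [], []
    for key, value in parse_qsl(query_string):
        match = KEY_RE.fullmatch(key)
        if match is None:
            raise ValueError(f"malformed parameter {key!r}")
        field, op = match.group(1), match.group(2) or "eq"
        if field in RESERVED:
            continue
        if op not in FILTERABLE.get(field, ()):
            raise ValueError(f"unsupported filter {key!r}")
        # Field and operator come from the whitelist; the value is bound.
        clauses.append(f"{field} {SQL_OPS[op]} ?")
        params.append(value)
    return " AND ".join(clauses), params


where, params = parse_filters(
    "status=open&created_at[gte]=2025-01-01&created_at[lt]=2025-02-01&limit=50"
)
```

Rejecting unknown fields and operators, rather than ignoring them, means a client with a typo in a filter name gets an error instead of an unfiltered result.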

Field Selection (Sparse Fieldsets)

Let clients ask for only the fields they need to reduce payload size:

GET /orders?fields=id,status,amount

Useful for large objects on slow networks. GraphQL gives this for free; in REST you opt in.
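A minimal sketch of how a handler might apply the `fields` parameter to one record before serialising it. The set of selectable fields and the name `select_fields` are assumptions.

```python
from typing import Optional

SELECTABLE = {"id", "status", "amount", "created_at", "customer_id"}  # assumed


def select_fields(record: dict, raw_fields: Optional[str]) -> dict:
    """Return only the requested fields, or the whole record if none were named."""
    if not raw_fields:
        return dict(record)
    requested = [f.strip() for f in raw_fields.split(",") if f.strip()]
    unknown = [f for f in requested if f not in SELECTABLE]
    if unknown:
        raise ValueError(f"unknown fields: {', '.join(unknown)}")
    return {f: record[f] for f in requested if f in record}


order = {
    "id": "ord_100",
    "status": "open",
    "amount": 4200,
    "created_at": "2025-01-15",
    "customer_id": "cus_42",
}
slim = select_fields(order, "id,status,amount")
```

Filtering after loading the full record only saves bandwidth. To also save database work, the same validated list could drive the SELECT column list.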

Embedding / Expanding Relations

Avoid forcing N+1 client requests by allowing inline expansion of related resources:

GET /orders/100?expand=customer,line_items
→ {
    "id": "ord_100",
    "customer": { ... },
    "line_items": [ ... ]
  }

Whitelist what can be expanded; cap nesting depth.
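The whitelist and depth cap might be enforced like this. The dotted-path notation for nested expansions (for example `line_items.product`), the whitelist contents, and the name `parse_expand` are assumptions; the lesson itself only shows top-level expansion.

```python
from typing import Optional

# Assumed whitelist of expandable relation paths.
EXPANDABLE = {"customer", "line_items", "line_items.product"}
MAX_DEPTH = 2  # assumed cap on nesting depth


def parse_expand(raw: Optional[str]) -> list:
    """Validate an `expand` value and return the relation paths to load."""
    if not raw:
        return []
    paths = []
    for path in raw.split(","):
        path = path.strip()
        # Depth is the number of dot-separated segments in the path.
        if path.count(".") + 1 > MAX_DEPTH:
            raise ValueError(f"expansion {path!r} is nested too deeply")
        if path not in EXPANDABLE:
            raise ValueError(f"cannot expand {path!r}")
        if path not in paths:
            paths.append(path)
    return paths
```

With the validated list in hand, the handler can load each relation in one batched query per path instead of one query per row.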

Putting It Together

GET /orders?
  status=open&
  created_at[gte]=2025-01-01&
  sort=-created_at&
  limit=50&
  cursor=...&
  fields=id,status,amount&
  expand=customer

→ {
    "data": [ ... ],
    "next_cursor": "...",
    "has_more": true
  }

Five small conventions and your list endpoints scale from prototype to billion-row table without breaking changes.
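From the client's side, consuming such an endpoint is a short loop. The sketch below builds the request URL from the example above and follows `next_cursor` until `has_more` is false. The host `api.example.com`, the helper names, and the canned `FAKE_PAGES` data are assumptions; `fake_fetch` stands in for a real HTTP call.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base_url: str, cursor: Optional[str] = None) -> str:
    """Build the combined list request, adding the cursor when resuming."""
    params = {
        "status": "open",
        "created_at[gte]": "2025-01-01",
        "sort": "-created_at",
        "limit": 50,
        "fields": "id,status,amount",
        "expand": "customer",
    }
    if cursor is not None:
        params["cursor"] = cursor
    # urlencode percent-encodes the brackets and commas.
    return f"{base_url}?{urlencode(params)}"


def iter_orders(fetch_page: Callable[[str], dict], base_url: str) -> Iterator[dict]:
    """Yield every order, requesting further pages only as they are needed."""
    cursor = None
    while True:
        page = fetch_page(build_url(base_url, cursor))
        yield from page["data"]
        if not page.get("has_more"):
            return
        cursor = page["next_cursor"]


# Canned responses keyed by cursor, standing in for a real server.
FAKE_PAGES = {
    None: {"data": [{"id": "ord_3"}, {"id": "ord_2"}],
           "next_cursor": "c1", "has_more": True},
    "c1": {"data": [{"id": "ord_1"}], "next_cursor": None, "has_more": False},
}


def fake_fetch(url: str) -> dict:
    cursor = parse_qs(urlsplit(url).query).get("cursor", [None])[0]
    return FAKE_PAGES[cursor]


order_ids = [o["id"] for o in iter_orders(fake_fetch, "https://api.example.com/orders")]
```

The client treats the cursor as an opaque string and never inspects it, which leaves the server free to change the encoding later without breaking anyone.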

Cert Mapping

Cert                 Pagination scope
AWS SAA              API Gateway pagination; DynamoDB pagination tokens
AWS Data Engineer    Pagination in batch ingestion APIs

The next lesson moves to GraphQL, which approaches these problems differently.

Key Takeaways

  • Always paginate list endpoints — never return unbounded collections.
  • Cursor pagination beats offset pagination for large or changing datasets.
  • Standardise filter, sort, and field-selection query parameters.
  • Document defaults and limits explicitly.
  • Provide total counts only when the database can produce them cheaply.
