Filtering, Sorting, and Pagination (API Design: REST and GraphQL)

The most common production outage you will trigger as an API designer is the unbounded list endpoint. The fix is simple, but you must build pagination from the start — retrofitting it on a public API breaks every client.

Why Pagination Matters

An endpoint that returns "all orders" works fine when the demo customer has 12. It melts down when one real customer has 4 million: the database query is slow, the network payload is huge, the client could not render the result anyway, and the request usually times out before a response is ever sent.

Always default to a small page size (25–50).
Cap maximum page size (100–1000).
Document both clearly.
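
The default-and-cap rule can be sketched as a small helper that turns the raw limit query parameter into a safe page size. The names and the exact numbers (25 and 100) are assumptions picked from the ranges above, not values any particular API mandates.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 25   # assumed default, inside the 25-50 range
MAX_PAGE_SIZE = 100      # assumed cap, inside the 100-1000 range


def resolve_page_size(raw: Optional[str]) -> int:
    """Turn the raw `limit` query parameter into a safe page size."""
    if raw is None or raw == "":
        return DEFAULT_PAGE_SIZE
    try:
        requested = int(raw)
    except ValueError:
        raise ValueError(f"limit must be an integer, got {raw!r}") from None
    if requested < 1:
        raise ValueError("limit must be at least 1")
    # Clamp oversized requests to the cap instead of failing them.
    return min(requested, MAX_PAGE_SIZE)
```

Clamping silently is one choice; rejecting an oversized limit with a 400 is equally defensible, as long as the documentation says which one applies.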

Offset Pagination

GET /orders?limit=25&offset=50

+ Trivial to implement.
+ Caller can jump to any page directly.
− Slow for large offsets — the database often has to scan and skip.
− Items inserted or deleted between pages cause duplicates or gaps.

Use it for small datasets and admin tools. Avoid for scrolling timelines and big tables.
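
To make the gap problem concrete, here is a minimal sketch using an in-memory SQLite table. The table layout and the fetch_page_by_offset helper are hypothetical; only the LIMIT/OFFSET pattern comes from the text above.

```python
import sqlite3


def fetch_page_by_offset(conn: sqlite3.Connection, limit: int, offset: int) -> list:
    """Return one page of order ids using LIMIT/OFFSET."""
    rows = conn.execute(
        "SELECT id FROM orders ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(i, "open") for i in range(1, 11)],
)

first_page = fetch_page_by_offset(conn, limit=3, offset=0)   # ids 1, 2, 3

# A row on the first page is deleted before the client asks for page two.
conn.execute("DELETE FROM orders WHERE id = 2")

second_page = fetch_page_by_offset(conn, limit=3, offset=3)  # ids 5, 6, 7
# Order 4 moved up into the first three positions, so no page ever returns it.
```

An insert between requests has the mirror-image effect: rows shift down and the client sees a duplicate.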

Cursor (Keyset) Pagination

Encode "where I am" as an opaque cursor that the server can resume from. Typically the cursor is the last seen sort key plus the last seen ID.

GET /orders?limit=25
→ {
    "data": [ ... ],
    "next_cursor": "eyJpZCI6ICJvcmRfMTAwIn0="
  }

GET /orders?limit=25&cursor=eyJpZCI6ICJvcmRfMTAwIn0=
→ next page

+ Fast at any depth — query becomes WHERE (created_at, id) < (?, ?) ORDER BY created_at DESC, id DESC LIMIT 25.
+ Stable under inserts and deletes.
− Cannot jump to "page 47" — only forward (and sometimes backward).

This is the right default for most modern APIs.
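
A minimal keyset sketch, again against in-memory SQLite. The cursor here is base64-encoded JSON holding the last seen created_at and id; that encoding, the table layout, and the helper names are assumptions for illustration. A production cursor would usually be signed or encrypted so clients cannot tamper with it. The WHERE clause is the expanded form of the row-value comparison (created_at, id) < (?, ?), written out so it also runs on engines without row-value support.

```python
import base64
import json
import sqlite3
from typing import Optional


def encode_cursor(created_at: str, order_id: str) -> str:
    payload = json.dumps({"created_at": created_at, "id": order_id})
    return base64.urlsafe_b64encode(payload.encode()).decode()


def decode_cursor(cursor: str) -> tuple:
    payload = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return payload["created_at"], payload["id"]


def fetch_page(conn: sqlite3.Connection, limit: int, cursor: Optional[str] = None) -> dict:
    if cursor is None:
        rows = conn.execute(
            "SELECT id, created_at FROM orders "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        created_at, order_id = decode_cursor(cursor)
        rows = conn.execute(
            "SELECT id, created_at FROM orders "
            "WHERE created_at < ? OR (created_at = ? AND id < ?) "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (created_at, created_at, order_id, limit),
        ).fetchall()
    # A full page may have more rows behind it; a short page is the last one.
    next_cursor = encode_cursor(rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return {"data": [row[0] for row in rows], "next_cursor": next_cursor}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("ord_1", "2025-01-01"),
        ("ord_2", "2025-01-02"),
        ("ord_3", "2025-01-02"),  # same timestamp as ord_2: the id breaks the tie
        ("ord_4", "2025-01-03"),
        ("ord_5", "2025-01-04"),
    ],
)

page1 = fetch_page(conn, limit=2)
# A new order arriving between requests does not disturb later pages.
conn.execute("INSERT INTO orders VALUES ('ord_6', '2025-01-05')")
page2 = fetch_page(conn, limit=2, cursor=page1["next_cursor"])
page3 = fetch_page(conn, limit=2, cursor=page2["next_cursor"])
```

With an index on (created_at, id), each page is a short index range scan regardless of how deep the client has paged.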

Page Tokens (Google-Style)

Google APIs use a similar pattern with named token fields:

GET /orders?pageSize=25&pageToken=ABC...
→ { "items": [...], "nextPageToken": "DEF..." }

Equivalent to cursors, just with a different naming convention. Pick one and stick with it.

Total Counts: Be Honest

Clients often want a total. The database may not give you one cheaply on a billion-row table. Options:

Don't return it. Acceptable for infinite-scroll UIs.
Return an approximate count from cached statistics.
Return an exact count only on filtered queries small enough to count fast.

Document which you do; never silently switch.
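
One possible way to keep exact counts cheap is to cap the count itself: count at most cap + 1 matching rows and report whether the number is exact. This is a sketch of one technique, not something the options above prescribe; the bounded_count helper, the cap value, and the total_is_exact field are all hypothetical.

```python
import sqlite3


def bounded_count(conn: sqlite3.Connection, status: str, cap: int = 1000) -> dict:
    """Count matching orders, but never scan more than cap + 1 of them."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM "
        "(SELECT 1 FROM orders WHERE status = ? LIMIT ?) AS capped",
        (status, cap + 1),
    ).fetchone()
    if n > cap:
        # More rows exist than we were willing to count: report a lower bound.
        return {"total": cap, "total_is_exact": False}
    return {"total": n, "total_is_exact": True}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("open",)] * 5 + [("closed",)] * 3,
)

small = bounded_count(conn, "open", cap=10)   # fewer matches than the cap
large = bounded_count(conn, "open", cap=3)    # more matches than the cap
```

Returning an explicit exactness flag keeps the behavior documented in the response itself, so clients never have to guess which kind of total they received.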

Sorting

Use a simple, stable convention:

GET /orders?sort=created_at,desc
GET /orders?sort=-created_at         (JSON:API style: leading - = desc)
GET /orders?sort=status,-created_at  (multiple keys: status asc, then created_at desc)

Whitelist sortable fields — never let callers sort on arbitrary columns (no index, full scan).
Always tie-break with the primary key for stable ordering across pages.
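
Both rules can be sketched in one parser for the leading - convention. The set of sortable fields and the build_order_by helper are hypothetical. Field names reach the SQL string only after passing the whitelist, which is what makes the string interpolation safe here.

```python
from typing import Optional

SORTABLE_FIELDS = {"created_at", "status", "amount"}  # hypothetical whitelist
PRIMARY_KEY = "id"


def build_order_by(sort_param: Optional[str]) -> str:
    """Translate e.g. 'status,-created_at' into a safe ORDER BY clause."""
    terms = []
    seen = set()
    for raw in (sort_param or "").split(","):
        raw = raw.strip()
        if not raw:
            continue
        descending = raw.startswith("-")
        field = raw[1:] if descending else raw
        if field not in SORTABLE_FIELDS:
            raise ValueError(f"cannot sort on {field!r}")
        if field in seen:
            continue  # ignore repeated keys
        seen.add(field)
        terms.append(f"{field} {'DESC' if descending else 'ASC'}")
    # Tie-break on the primary key, following the last key's direction,
    # so rows with equal sort values keep a stable order across pages.
    direction = terms[-1].split()[1] if terms else "ASC"
    terms.append(f"{PRIMARY_KEY} {direction}")
    return "ORDER BY " + ", ".join(terms)
```

For example, build_order_by("-created_at") yields "ORDER BY created_at DESC, id DESC", the same ordering the keyset query relies on.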

Filtering

Most APIs converge on simple equality and range filters in query strings:

GET /orders?status=open
GET /orders?status=open&customer_id=cus_42
GET /orders?created_at[gte]=2025-01-01&created_at[lt]=2025-02-01

For richer filtering, three approaches scale:

Bracket operators: created_at[gte]=.... Readable.
Predefined filter object on POST: POST /orders/search with a JSON body. Bypasses URL length limits and lets you express complex queries.
RSQL / FIQL: ?filter=status==open;amount=gt=1000. Standardised but less common.

Whatever you pick, document it once and apply it everywhere.
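
As an illustration of the bracket-operator style, the sketch below parses a query string into a parameterized WHERE clause. The per-field operator whitelist and the build_where helper are assumptions; values are always bound as parameters, never interpolated into the SQL text.

```python
import re
from urllib.parse import parse_qsl

# Hypothetical whitelist: which operators each field accepts.
FILTERABLE = {
    "status": {"eq"},
    "customer_id": {"eq"},
    "created_at": {"eq", "gte", "gt", "lte", "lt"},
}
SQL_OPS = {"eq": "=", "gte": ">=", "gt": ">", "lte": "<=", "lt": "<"}
KEY_PATTERN = re.compile(r"^(\w+)(?:\[(\w+)\])?$")


def build_where(query_string: str) -> tuple:
    """Return (where_clause, params) for the filter keys in a query string."""
    clauses, params = [], []
    for key, value in parse_qsl(query_string):
        match = KEY_PATTERN.match(key)
        if not match:
            continue
        field, op = match.group(1), match.group(2) or "eq"
        if field not in FILTERABLE:
            continue  # not a filter key (limit, cursor, sort, ...)
        if op not in FILTERABLE[field]:
            raise ValueError(f"operator {op!r} is not allowed on {field!r}")
        clauses.append(f"{field} {SQL_OPS[op]} ?")
        params.append(value)
    where = " AND ".join(clauses)
    return (f"WHERE {where}" if where else ""), params


clause, params = build_where(
    "status=open&created_at[gte]=2025-01-01&created_at[lt]=2025-02-01&limit=50"
)
```

Here unknown keys are skipped so that pagination and sort parameters can share the query string; a stricter API might reject unrecognized keys instead.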

Field Selection (Sparse Fieldsets)

Let clients ask for only the fields they need to reduce payload size:

GET /orders?fields=id,status,amount

Useful for large objects on slow networks. GraphQL gives this for free; in REST you opt in.
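
On the server side, opting in can be as small as a projection step applied before serialization. A minimal sketch, assuming resources are plain dicts; the allowed field set and the select_fields helper are hypothetical.

```python
from typing import Optional

ALLOWED_FIELDS = {"id", "status", "amount", "created_at", "customer_id"}  # hypothetical


def select_fields(resource: dict, fields_param: Optional[str]) -> dict:
    """Keep only the fields named in a comma-separated `fields` parameter."""
    if not fields_param:
        return dict(resource)  # no parameter: return the full representation
    requested = [name.strip() for name in fields_param.split(",") if name.strip()]
    unknown = [name for name in requested if name not in ALLOWED_FIELDS]
    if unknown:
        raise ValueError(f"unknown fields: {', '.join(unknown)}")
    return {name: resource[name] for name in requested if name in resource}


order = {
    "id": "ord_100",
    "status": "open",
    "amount": 4200,
    "created_at": "2025-01-15",
    "customer_id": "cus_42",
}
slim = select_fields(order, "id,status,amount")
```

Pushing the same field list down into the SELECT clause saves database work as well as bandwidth, at the cost of a little more plumbing.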

Embedding / Expanding Relations

Avoid forcing N+1 client requests by allowing inline expansion of related resources:

GET /orders/100?expand=customer,line_items
→ {
    "id": "ord_100",
    "customer": { ... },
    "line_items": [ ... ]
  }

Whitelist what can be expanded; cap nesting depth.
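
The whitelist and depth cap might look like the following sketch. Dotted paths for nested expansion (line_items.product), the allowed set, and the depth limit of 2 are assumptions for illustration.

```python
from typing import Optional

EXPANDABLE = {"customer", "line_items", "line_items.product"}  # hypothetical
MAX_EXPAND_DEPTH = 2  # hypothetical cap on nesting


def parse_expand(expand_param: Optional[str]) -> list:
    """Validate a comma-separated `expand` parameter and return its paths."""
    paths = []
    for raw in (expand_param or "").split(","):
        path = raw.strip()
        if not path:
            continue
        depth = path.count(".") + 1
        if depth > MAX_EXPAND_DEPTH:
            raise ValueError(f"expand path {path!r} is nested too deeply")
        if path not in EXPANDABLE:
            raise ValueError(f"cannot expand {path!r}")
        if path not in paths:
            paths.append(path)  # drop duplicates, keep request order
    return paths
```

Checking depth before membership gives callers a more specific error when they over-nest, even for relations that do not exist.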

Putting It Together

GET /orders?
  status=open&
  created_at[gte]=2025-01-01&
  sort=-created_at&
  limit=50&
  cursor=...&
  fields=id,status,amount&
  expand=customer

→ {
    "data": [ ... ],
    "next_cursor": "...",
    "has_more": true
  }

Five small conventions and your list endpoints scale from prototype to billion-row table without breaking changes.
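
The has_more flag in the response above is commonly computed by fetching one row more than the page size and checking whether it came back. A minimal sketch, where fetch stands in for whatever query function the endpoint uses (a hypothetical callable that returns up to n rows); the raw id is used as the cursor here only to keep the example short.

```python
from typing import Callable, Sequence


def build_page(fetch: Callable[[int], Sequence[dict]], limit: int) -> dict:
    """Assemble a list response, using one extra row to detect more data."""
    rows = list(fetch(limit + 1))
    has_more = len(rows) > limit
    data = rows[:limit]  # never return the extra probe row
    # In practice this would be an encoded cursor, not a bare id.
    next_cursor = data[-1]["id"] if has_more else None
    return {"data": data, "next_cursor": next_cursor, "has_more": has_more}


orders = [{"id": f"ord_{i}"} for i in range(1, 8)]  # seven fake orders

partial = build_page(lambda n: orders[:n], limit=5)
complete = build_page(lambda n: orders[:n], limit=10)
```

This avoids both a separate COUNT query and the empty trailing page a client would otherwise have to request to learn it has reached the end.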

Cert Mapping

Cert	Pagination scope
AWS SAA	API Gateway pagination; DynamoDB pagination tokens
AWS Data Engineer	Pagination in batch ingestion APIs

The next lesson moves to GraphQL, which approaches these problems differently.