Pagination Patterns: Fetching Large Datasets Without Killing Your Database


Your API returns a list of orders. You add ?page=1&limit=20. It works. Users navigate to page 500. The query takes 30 seconds. Your database is doing OFFSET 9980 LIMIT 20 - scanning and discarding 9,980 rows to return 20. At page 1000, it scans 19,980 rows. The deeper the page, the slower the query.

This is the offset pagination problem. And it is why most high-performance APIs use cursor-based pagination instead.

Offset pagination: simple but broken at scale

Offset pagination uses LIMIT and OFFSET to skip to a page:

SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 9980;

How it works: Skip the first 9,980 rows, return the next 20.

The problem: The database must scan and discard all 9,980 rows before returning the 20 you want. Even with an index, this is O(offset) work. Page 1 is fast. Page 500 is slow. Page 10,000 is very slow.

The second problem: If rows are inserted or deleted between page requests, pages shift. A row inserted on page 1 pushes everything down. When the user navigates to page 2, they see a row they already saw on page 1 (or miss a row entirely).

When offset pagination is acceptable:

  • Small datasets (under 10,000 rows)
  • Admin interfaces where users rarely go past page 10
  • When you need random access to any page number (jump to page 47)
graph LR
subgraph offset["Offset Pagination - O(offset) cost"]
  Q1["Page 1
OFFSET 0
Scan 20 rows
Fast"]
  Q2["Page 50
OFFSET 980
Scan 1000 rows
Slower"]
  Q3["Page 500
OFFSET 9980
Scan 10000 rows
Slow"]
  Q4["Page 5000
OFFSET 99980
Scan 100000 rows
Very slow"]
  Q1 --> Q2 --> Q3 --> Q4
end

style Q1 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style Q2 fill:#FAEEDA,stroke:#854F0B,color:#633806
style Q3 fill:#FCEBEB,stroke:#A32D2D,color:#791F1F
style Q4 fill:#FCEBEB,stroke:#A32D2D,color:#791F1F

Cursor-based pagination: fast at any depth

Instead of “skip N rows,” cursor pagination uses a pointer to the last item seen. The next page starts after that item.

How it works:

  1. First request: GET /orders?limit=20
  2. Server returns 20 orders and a cursor: {"orders": [...], "next_cursor": "eyJpZCI6MTIzfQ=="}
  3. Next request: GET /orders?limit=20&cursor=eyJpZCI6MTIzfQ==
  4. Server decodes the cursor, queries from that point: WHERE id < 123 ORDER BY id DESC LIMIT 20

The cursor encodes the position in the result set. The query uses an indexed column to jump directly to that position. No scanning and discarding.

Performance: O(1) regardless of depth. Page 1 and page 10,000 take the same time.

The tradeoff: No random access. You cannot jump to page 500. You can only go forward (and sometimes backward). No “showing results 9,981-10,000 of 50,000.”

When cursor pagination is the right choice:

  • Infinite scroll feeds (Twitter, Instagram, Facebook)
  • Large datasets where deep pagination is needed
  • Real-time data where rows are frequently inserted/deleted

Cursor design

The cursor should be:

  • Opaque to the client - Base64-encode it so clients do not try to parse or construct cursors
  • Stable - The same cursor should always return the same next page
  • Efficient - The cursor should map to an indexed column

Common cursor implementations:

  • ID-based: cursor = base64({"id": 12345}). Query: WHERE id < 12345 ORDER BY id DESC
  • Timestamp + ID: cursor = base64({"created_at": "2024-01-15T10:30:00Z", "id": 12345}). Handles ties in timestamp.
  • Opaque token: Store the cursor state server-side (in Redis), return a token. More flexible but requires server-side storage.
graph TB
subgraph cursor["Cursor Pagination - O(1) cost"]
  P1["Page 1
GET /orders?limit=20
Returns: orders + cursor=abc"]
  P2["Page 2
GET /orders?cursor=abc
WHERE id < 12345
Direct index lookup"]
  P3["Page 100
GET /orders?cursor=xyz
WHERE id < 9876
Same cost as page 2"]
  P1 --> P2 --> P3
end

style P1 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style P2 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style P3 fill:#E1F5EE,stroke:#0F6E56,color:#085041

Keyset pagination: the SQL implementation of cursor pagination

Keyset pagination is cursor pagination implemented directly in SQL using the WHERE clause instead of OFFSET.

-- First page
SELECT * FROM orders ORDER BY created_at DESC, id DESC LIMIT 20;

-- Next page (cursor = last row's created_at and id)
SELECT * FROM orders
WHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;

The (created_at, id) < (...) condition uses a composite index on (created_at, id) to jump directly to the right position. No scanning.

Why include id in the cursor? Timestamps are not unique. Multiple rows can have the same created_at. Including id as a tiebreaker ensures a stable, unique cursor.

Seek method for bidirectional pagination

For UIs that need both “next page” and “previous page”:

-- Next page
SELECT * FROM orders
WHERE (created_at, id) < (cursor_created_at, cursor_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Previous page
SELECT * FROM orders
WHERE (created_at, id) > (cursor_created_at, cursor_id)
ORDER BY created_at ASC, id ASC
LIMIT 20;
-- Then reverse the results in application code

Where it breaks or gets interesting

Sorting by non-unique columns

If you sort by price and multiple products have the same price, the cursor is ambiguous. Always include a unique column (like id) as a tiebreaker in the sort and cursor.

Filtering with cursor pagination

Cursor pagination works well with filters that do not change between pages. But if the filter changes (user adds a filter after seeing page 1), the cursor from page 1 is invalid for the new filter. The client must start from page 1 with the new filter.

Total count with cursor pagination

Cursor pagination does not naturally provide a total count (“showing 1-20 of 50,000 results”). Counting all matching rows is expensive. Options:

  • Do not show total count (infinite scroll does not need it)
  • Show an approximate count (use EXPLAIN or table statistics)
  • Cache the count separately with a short TTL
  • Show “showing 1-20 of many results” without an exact number

Relay-style pagination (GraphQL)

GraphQL’s Relay specification defines a standard cursor pagination format:

{
  orders(first: 20, after: "cursor123") {
    edges {
      node { id, total }
      cursor
    }
    pageInfo {
      hasNextPage
      hasPreviousPage
      startCursor
      endCursor
    }
  }
}

This is the standard for GraphQL APIs and is supported by many GraphQL client libraries.

Real-world systems

Twitter - Cursor-based pagination for timelines. max_id parameter (show tweets with ID less than max_id). Prevents duplicate tweets when new tweets are inserted.

GitHub - Cursor-based pagination for REST API. Link header with next, prev, first, last URLs. GraphQL API uses Relay-style cursor pagination.

Stripe - Cursor-based pagination. starting_after and ending_before parameters (object IDs as cursors). Returns has_more boolean.

Elasticsearch - search_after parameter for deep pagination. Uses the sort values of the last result as the cursor. Much faster than from/size for deep pages.

Facebook - Cursor-based pagination for the Graph API. Opaque cursors that encode position in the result set.

How to apply it in practice

Choosing a pagination strategy

Use caseStrategyWhy
Admin table with page numbersOffsetNeed random page access
Infinite scroll feedCursorFast, handles inserts
Large dataset APIKeysetO(1) at any depth
GraphQL APIRelay cursorStandard, library support
Search resultsOffset (limited depth)Need page numbers, limit depth

API design for cursor pagination

GET /orders?limit=20
Response:
{
  "data": [...],
  "pagination": {
    "next_cursor": "eyJpZCI6MTIzfQ==",
    "has_more": true
  }
}

GET /orders?limit=20&cursor=eyJpZCI6MTIzfQ==

Always include has_more so clients know when to stop. Return null for next_cursor on the last page.

Limiting offset pagination depth

If you must use offset pagination, limit the maximum offset:

GET /orders?page=10001 -> 400 Bad Request
"Maximum page is 1000. Use cursor pagination for deeper results."

This prevents the slow deep-page queries while still supporting the common case.

FAQ

Q: Can you implement cursor pagination with a non-sequential ID?

Yes. UUIDs are not sequential, but you can use a timestamp + UUID cursor. Sort by (created_at DESC, id DESC) and use both values in the cursor. The created_at provides the ordering, the id provides uniqueness for tiebreaking. Alternatively, use a sequential ID (auto-increment or ULID) as the primary sort key.

Q: How do you handle cursor pagination when rows can be deleted?

If a row is deleted and its ID was used as a cursor, the next page query still works correctly - it just returns rows with IDs less than the deleted row’s ID. The deleted row is simply not in the results. This is one of the advantages of cursor pagination over offset pagination: deletions do not cause rows to be skipped or duplicated.

Q: What is the difference between cursor pagination and keyset pagination?

They are the same concept with different names. Cursor pagination is the API design pattern (opaque cursor returned to the client). Keyset pagination is the SQL implementation (using WHERE clause with the last row’s key values). A cursor-based API is typically implemented using keyset pagination in the database layer.

Interview questions

Q1: Your API has an endpoint that returns a list of 10 million products. Users can filter by category and sort by price. Design the pagination.

Strong answer: Use keyset pagination. The cursor encodes the last product’s price and ID (for tiebreaking). Query: WHERE (price, id) > (cursor_price, cursor_id) AND category = 'electronics' ORDER BY price ASC, id ASC LIMIT 20. Create a composite index on (category, price, id) to support this query efficiently. The index allows the database to jump directly to the right position without scanning. For the API: return an opaque cursor (base64-encoded price + id). Do not expose the cursor internals to clients. Return has_more: true/false. Do not return a total count (counting 10 million rows with a filter is expensive). If the UI needs a count, use an approximate count from table statistics or cache it separately.

Q2: You are building an infinite scroll feed for a social network. New posts are constantly being added. How do you handle pagination so users do not see duplicate posts?

Strong answer: Use cursor-based pagination with a stable cursor. The cursor is the ID of the last post seen. The query is WHERE id < cursor_id ORDER BY id DESC LIMIT 20. New posts have higher IDs, so they appear before the cursor and are not included in subsequent pages. The user’s feed is a consistent snapshot from the moment they started scrolling. If the user wants to see new posts, they refresh (which resets the cursor). This is how Twitter’s timeline works with max_id. The alternative - offset pagination - would cause duplicates: if 5 new posts are added while the user is on page 1, page 2 would include the last 5 posts from page 1 again.

Q3: Explain why OFFSET 100000 LIMIT 20 is slow and how keyset pagination fixes it.

Strong answer: OFFSET 100000 tells the database to skip the first 100,000 rows. Even with an index, the database must traverse 100,000 index entries to find the starting position. It cannot jump directly to row 100,001 because it does not know where that is without counting. This is O(offset) work. Keyset pagination replaces the offset with a WHERE condition: WHERE id < 12345 ORDER BY id DESC LIMIT 20. The database uses the index on id to jump directly to the row with id = 12345 in O(log N) time, then reads the next 20 rows sequentially. The cost is O(log N + page_size), which is constant regardless of how deep in the result set you are. The key requirement: the cursor column must be indexed and the query must use that index.