How do you design caching and pagination for high-traffic APIs?
Answer
I design caching and pagination around stable keys and validator-driven freshness. For lists I prefer cursor pagination for correctness at scale; I reserve offset pagination for small, static sets. I implement RFC 7232 conditional requests with ETag and Last-Modified so clients and CDNs use 304 responses aggressively. I shape keys by filter parameters, vary on auth, and adopt explicit CDN cache invalidation (event-driven purge, time-to-live plus revalidation). Consistency comes from idempotent cursors and write-through or fan-out invalidation.
Long Answer
High-traffic collections live and die by caching and pagination. The goal is to reduce origin load and tail latency while preserving correctness as data changes under read pressure. I combine robust pagination semantics, RFC 7232 conditional requests, and deliberate CDN cache invalidation that aligns with business rules.
1) Choose the right pagination model
Cursor pagination (a.k.a. keyset pagination) scales better: each page is defined by a stable cursor derived from a unique, indexed ordering (for example (updated_at, id)). It avoids gaps and duplicates during concurrent writes and stays fast at depth because range scans replace OFFSET skips. I include next and prev cursors and document that a cursor remains stable within a short time window.
Offset pagination is simple but expensive at scale due to OFFSET N scans and shifting rows. I use it only for small, slow-moving datasets or when business requirements demand absolute position semantics. When forced, I cap limit, index sort columns, and expose a maximum offset.
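As a minimal sketch of the keyset approach, assuming a SQLite articles(id, updated_at) table (the schema and function name are illustrative, not from the original): each page filters with a row-value comparison against the previous page's last sort keys instead of skipping rows.

```python
import sqlite3

def fetch_page(conn, limit=20, cursor=None):
    """Keyset pagination over (updated_at, id) DESC.

    `cursor` is the (updated_at, id) pair of the last row on the
    previous page; None fetches the first page.
    """
    if cursor is None:
        rows = conn.execute(
            "SELECT updated_at, id FROM articles "
            "ORDER BY updated_at DESC, id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        # Row-value comparison: rows strictly "after" the cursor in
        # the DESC sort order, served by an index range scan.
        rows = conn.execute(
            "SELECT updated_at, id FROM articles "
            "WHERE (updated_at, id) < (?, ?) "
            "ORDER BY updated_at DESC, id DESC LIMIT ?",
            (*cursor, limit),
        ).fetchall()
    # A short page means we reached the end; no next cursor.
    next_cursor = rows[-1] if len(rows) == limit else None
    return rows, next_cursor
```

Because the WHERE clause pins the boundary, concurrent inserts ahead of the cursor cannot push rows onto a later page, which is exactly the duplicate/gap property offset pagination lacks.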
2) Canonical sorting and deterministic cursors
To prevent duplicate or missing items, I require a total ordering (for example ORDER BY updated_at DESC, id DESC). Cursors encode the last item’s sort keys, signed to prevent tampering. I design cursors to be idempotent: the same cursor plus filters always returns the same boundary behavior until the next write that crosses that boundary. For backfills I allow reverse traversal with a before cursor.
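A tamper-evident cursor can be as small as the sort keys plus a truncated HMAC. This is a sketch; SECRET and the 16-byte signature length are illustrative choices, and production keys would come from a secrets manager.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # illustrative; use a managed key in production

def encode_cursor(sort_keys: dict) -> str:
    """Serialize the last item's sort keys and sign them."""
    payload = json.dumps(sort_keys, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()[:16]
    return base64.urlsafe_b64encode(sig + payload).decode()

def decode_cursor(token: str) -> dict:
    """Verify the signature before trusting any field in the cursor."""
    raw = base64.urlsafe_b64decode(token.encode())
    sig, payload = raw[:16], raw[16:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered cursor")
    return json.loads(payload)
```

Signing keeps clients from forging boundary values (for example, to probe rows their filters should exclude), while the payload itself stays a plain, deterministic encoding of the sort keys.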
3) Shape cache keys deliberately
Cache keys include path, normalized query parameters (filters, sort, limit), and an authorization dimension. I avoid user-unique fragments unless the response truly depends on the user; otherwise I prefer public responses with Vary headers (for example Vary: Authorization, Accept-Encoding). Normalizing parameter order and default values ensures maximal cache reuse.
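A normalization helper might look like the following sketch; the DEFAULTS values are hypothetical, the point being that omitted parameters and shuffled parameter order collapse to one cache entry.

```python
from urllib.parse import urlencode

# Illustrative defaults: a request omitting these must hit the same
# cache entry as one spelling them out explicitly.
DEFAULTS = {"limit": "20", "sort": "updated_at_desc"}

def cache_key(path: str, params: dict) -> str:
    merged = {**DEFAULTS, **{k: str(v) for k, v in params.items()}}
    # Sorting keys makes ?a=1&b=2 and ?b=2&a=1 identical.
    return path + "?" + urlencode(sorted(merged.items()))
```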
4) Conditional requests per RFC 7232
For collection and item endpoints I implement ETag (content hash or version token) and Last-Modified (from updated_at). Clients send If-None-Match or If-Modified-Since, and the server returns 304 Not Modified when appropriate. For range-changing queries (pagination, filters), I compute ETags per unique key, not globally. Weak ETags are acceptable for lossy transformations; strong ETags are ideal when byte-for-byte equality matters. I document precedence: If-None-Match is evaluated before If-Modified-Since.
5) Freshness, staleness, and revalidation
Responses include Cache-Control with a short max-age and a longer stale-while-revalidate for CDNs that support RFC 5861 semantics. Items with frequent updates can be served stale-while-revalidate to keep tail latency low while CDNs refresh asynchronously. Private data uses Cache-Control: private, must-revalidate and user-tied validators. I expose Age headers to help clients tune aggressive revalidation.
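Concretely, the public/private split might emit headers like this sketch; the 30-second and 300-second windows are example values, not recommendations:

```python
def freshness_headers(public: bool) -> dict:
    """Cache-Control for public lists vs user-scoped responses (sketch)."""
    if public:
        # Short freshness; CDNs supporting RFC 5861 may serve stale
        # copies for up to 300s while revalidating in the background.
        return {"Cache-Control": "public, max-age=30, stale-while-revalidate=300"}
    # Private data: cacheable only per-user, revalidated on every use.
    return {"Cache-Control": "private, must-revalidate"}
```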
6) CDN cache invalidation strategies
I use two patterns:
- Event-driven invalidation: on writes, publish domain events that map to keys or surrogate keys (for example Fastly Surrogate-Key headers). The CDN purges those keys or tags, keeping purge scope tight.
- Time-to-live plus revalidation: for broad lists where precise invalidation is costly, I set conservative TTLs and rely on ETag/Last-Modified validation for cheap 304s.
When collections are composed (for example “latest” endpoint), I tag responses with all affected surrogate keys so a single item change purges both item and list pages.
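Tagging a composed list response is mostly string assembly; this sketch uses Fastly's space-separated Surrogate-Key format, with the article:{id} and list:articles tag names being illustrative conventions:

```python
def list_response_headers(article_ids: list) -> dict:
    """Tag a list page with every item it contains, plus a list-wide tag.

    Purging "article:7" at the CDN then drops the item response and
    every list page that included article 7 -- and nothing else.
    """
    tags = ["list:articles"] + [f"article:{i}" for i in article_ids]
    return {"Surrogate-Key": " ".join(tags)}
```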
7) Consistency with writes
For user-facing operations I prefer write-through caches for items and fan-out invalidation for lists: mutate storage, update or purge item cache, then purge list pages tagged with that item’s surrogate key. Where strict freshness is critical, I add read-your-write guarantees by attaching Cache-Control: no-store to the immediate follow-up response or by routing the user to a non-cached variant for a short window.
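The mutate-then-invalidate ordering above can be sketched as a small service; ArticleService, its dict-backed store, and the injected cdn_purge callable are all hypothetical stand-ins for real storage and a real purge API:

```python
class ArticleService:
    def __init__(self, store, item_cache, cdn_purge):
        self.store = store            # source of truth (dict stand-in)
        self.item_cache = item_cache  # item-level cache (dict stand-in)
        self.cdn_purge = cdn_purge    # callable taking surrogate keys

    def update(self, article_id, fields):
        # 1) Mutate storage first: it is the source of truth.
        self.store[article_id] = {**self.store.get(article_id, {}), **fields}
        # 2) Write-through: refresh the item cache in the same code path.
        self.item_cache[article_id] = self.store[article_id]
        # 3) Fan-out: purge the item tag; the CDN drops the item response
        #    and every list page tagged with it.
        self.cdn_purge([f"article:{article_id}"])
```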
8) Dealing with personalization and auth
If personalization is heavy, I push caching to edges that support authenticated caching with stale-while-revalidate and Vary: Authorization. Otherwise I separate public and private fields: return a public, cacheable envelope and fetch private deltas via a light, non-cacheable endpoint. This pattern preserves CDN cache invalidation simplicity and origin offload.
9) Testing and observability
I ship contract tests for cursor vs offset pagination to catch duplicates or gaps under simulated writes. I verify RFC 7232 conditional requests by comparing ETag and 304 rates in staging. I export metrics: cache hit ratio by key group, 304 ratio, origin requests per page, and p95 latency per page depth. I log rendered surrogate keys to trace purge impact. Synthetic checks ensure purges reflect within a target service level objective.
10) Failure and safety levers
I cap limit, reject pathological filter combinations, and protect the origin with circuit breakers. If an unexpected surge collapses hit ratio, I can temporarily increase TTLs or expand stale serving windows at the CDN. I keep a manual purge tool and a dry-run mode that lists candidate keys before deletion.
By standardizing cursor pagination, adopting RFC 7232 conditional requests with ETag/Last-Modified, and using targeted CDN cache invalidation, high-traffic collections remain fast, correct, and economical.
Common Mistakes
- Using offset pagination for huge, hot collections, causing slow scans and page drift during writes.
- Missing a total order, which yields duplicates or gaps between pages.
- Hashing ETags from response bytes while also varying the response by user without a Vary header, so validators collide across users and confuse caches.
- Treating ETag/Last-Modified as decoration instead of honoring RFC 7232 conditional requests, wasting 304 opportunities.
- Purging entire paths on every write instead of using CDN cache invalidation with surrogate keys.
- Caching personalized responses publicly, leaking data, or over-scoping Vary so the cache fragments.
- Unbounded limit parameters that let clients request thousands of items per page.
- No observability for hit ratio, 304 rate, and purge effectiveness, making regressions invisible.
Sample Answers
Junior:
“I prefer cursor pagination for large collections and keep offset pagination for small lists. I return ETag and Last-Modified and handle If-None-Match so clients get 304s. I set short max-age and rely on revalidation.”
Mid-level:
“I sort by (updated_at, id) and sign cursors for determinism. Cache keys include filters and a Vary header for auth. I implement RFC 7232 conditional requests and measure 304 rates. For CDN cache invalidation, I tag responses with surrogate keys and purge only affected pages.”
Senior:
“I design keyset cursor pagination with composite indexes and idempotent cursors, enforce caps, and ship contract tests for duplicates and gaps. I compute per-key ETag and Last-Modified, prioritize If-None-Match, and use stale-while-revalidate. I combine write-through updates with event-driven CDN cache invalidation using surrogate keys, plus observability for hit ratio, purge latency, and tail latency budgets.”
Evaluation Criteria
A strong answer selects cursor pagination for scale, reserves offset pagination for small or static sets, and enforces a total order. It implements RFC 7232 conditional requests with correct ETag/Last-Modified behavior and returns 304 when appropriate. It explains cache key construction, Vary usage, and CDN cache invalidation via surrogate keys or targeted purge. It addresses personalization, limits, and observability (hit ratio, 304 rate, purge effectiveness). Red flags include defaulting to offset, no validators, path-wide purges, missing Vary, and unlimited page sizes.
Preparation Tips
- Build a demo API with cursor vs offset pagination; measure p95 latency at high offsets and under concurrent writes.
- Add ETag and Last-Modified; verify 304 behavior with If-None-Match and If-Modified-Since.
- Normalize cache keys and add Vary for authorization; test cache reuse by shuffling query parameter order.
- Implement surrogate keys or tags and write a purge function; confirm that a single item update invalidates item and “latest” list pages only.
- Tune Cache-Control: max-age and stale-while-revalidate; observe hit ratio and tail latency.
- Add guardrails: limit caps, sort enforcement, and circuit breakers.
- Create dashboards for 304 rate, hit ratio, purge latency, and errors during pagination traversal.
Real-world Context
- Marketplace feed: Switching from offset pagination to cursor pagination reduced p95 list latency by 60 percent and eliminated duplicate items during flash sales.
- News API: Adding ETag/Last-Modified raised 304 rates above 70 percent, cutting origin egress significantly while keeping freshness via short TTL plus revalidation.
- SaaS analytics: Surrogate-key tagging allowed precise CDN cache invalidation; item edits purged only impacted dashboards and the first two list pages, reducing collateral cache misses.
- Social app: Public envelope cached at the CDN with private delta fetched separately maintained performance without exposing personalized data; hit ratio increased and tail latency dropped on Slow 4G profiles.
Key Takeaways
- Prefer cursor pagination with a total order; reserve offset pagination for small sets.
- Implement RFC 7232 conditional requests with correct ETag/Last-Modified handling.
- Normalize cache keys and use Vary judiciously to maximize reuse.
- Use surrogate keys for precise CDN cache invalidation; avoid path-wide purges.
- Monitor hit ratio, 304 rate, purge latency, and tail latency to prevent regressions.
Practice Exercise
Scenario:
You own a high-traffic /articles API with filters (topic, author, since), sort by updated_at DESC, and peaks during breaking news. Clients include web, mobile, and partners behind several CDNs. You must improve caching and pagination while preserving correctness under constant writes.
Tasks:
- Replace offset pagination with cursor pagination keyed by (updated_at, id). Implement next and prev cursors, signed to prevent tampering. Enforce a limit cap of 100.
- Add RFC 7232 conditional requests: compute a per-key ETag (hash of ids plus top-level metadata) and Last-Modified from the first item on the page. Prefer If-None-Match when both validators are present.
- Normalize cache keys: path + sorted query parameters + limit; add Vary: Authorization, Accept-Encoding.
- Configure Cache-Control: max-age=30, stale-while-revalidate=300 for public lists; use private, must-revalidate for user-scoped variants.
- Implement CDN cache invalidation with surrogate keys: tag item responses with article:{id} and list responses with all included article:{id} plus list:articles. On update, purge the item key and any lists that reference it.
- Create dashboards for hit ratio, 304 rate, purge latency, and p95 latency by page depth.
- Write integration tests: concurrency test that inserts items between requests to ensure no duplicates or gaps; validator test that confirms 304 responses; purge test that validates list refresh after an item change.
Deliverable: A working plan and reference implementation that demonstrates scalable caching and pagination with cursor vs offset pagination, correct RFC 7232 behavior, robust ETag/Last-Modified validators, and precise CDN cache invalidation.

