How do you evolve REST APIs backward-compatibly at scale?
Answer
My strategy for backward-compatible evolution is contract-first and safety-first. I keep a stable default version, use additive changes, and guard risk with feature flags. I support schema negotiation (Accept headers, media types) and design tolerant readers that ignore unknown fields. Deprecations follow a staged policy with telemetry, warnings, and sunset headers. Rollouts are canary or shadow, with instant rollback via flags or router rules. Every change ships with contract tests and error-budget-aware monitors.
Long Answer
Backward-compatible evolution in RESTful APIs is about protecting clients while the platform changes underneath. The simplest rule is “do not break consumers,” but the workable rule is “make change observable, optional, and reversible.” I combine contract-first design, disciplined versioning, schema negotiation, tolerant readers, and staged rollout/rollback governed by telemetry.
1) Contract-first and compatibility principles
All changes begin in an API contract (OpenAPI) with clear compatibility rules. Backward-compatible changes are additive: new optional fields, new resources, new query parameters, and new error codes that clients can ignore. I avoid removing fields, changing types, or repurposing semantics within a minor version line. When a breaking change is unavoidable, I introduce a new version alongside the old and run them in parallel until migration completes.
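To make the additive rule concrete, here is a minimal TypeScript sketch of what I would and would not allow within a stable major; the Order shape and field names are illustrative, not a real contract:

```typescript
// Hypothetical Order contract: v1.1 only adds optional members, so any
// client compiled against v1.0 keeps working unchanged.
interface OrderV1_0 {
  id: string;
  status: "pending" | "shipped";
  totalCents: number;
}

// Additive (backward-compatible): a new OPTIONAL field and an expanded enum
// that tolerant readers map to a fallback. Nothing is removed or retyped.
interface OrderV1_1 {
  id: string;
  status: "pending" | "shipped" | "returned"; // expanded, never narrowed
  totalCents: number;
  giftMessage?: string;   // new optional field: old clients simply ignore it
  loyaltyPoints?: number; // absent means "no data", not an error
}

// Breaking (requires a new major): renaming, retyping, or removing, e.g.
// replacing totalCents with total: string would force every client to change.
```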
2) Versioning strategy that fits the surface
I prefer media type versioning or header-based negotiation (Accept: application/vnd.example.order+json;v=2) because it decouples version from URL and supports schema negotiation per client. However, if the ecosystem expects it, URL versioning (/v1) is acceptable for clarity. The default experience remains on the stable major version, while preview features use explicit opt-in media types or feature flags. Internally I tag versions by major.minor; majors can break, minors cannot.
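A minimal sketch of that negotiation, written as a pure function rather than any particular framework's API; the vendor media type and supported versions are assumptions:

```typescript
// Resolve a concrete version from an Accept header, defaulting to the stable
// major when the client expresses no preference.
const VENDOR_TYPE = "application/vnd.example.order+json";
const DEFAULT_VERSION = 1;
const SUPPORTED = new Set([1, 2]);

function negotiateVersion(acceptHeader: string | undefined): number {
  if (!acceptHeader) return DEFAULT_VERSION; // no preference: stable default
  for (const part of acceptHeader.split(",")) {
    const [mediaType, ...params] = part.trim().split(";");
    if (mediaType.trim() !== VENDOR_TYPE) continue;
    for (const p of params) {
      const [k, v] = p.trim().split("=");
      if (k === "v") {
        const version = Number(v);
        // Unknown versions fall back to the default rather than erroring,
        // so clients probing for future versions still get a usable response.
        return SUPPORTED.has(version) ? version : DEFAULT_VERSION;
      }
    }
    return DEFAULT_VERSION; // vendor type without an explicit version param
  }
  return DEFAULT_VERSION; // generic Accept such as application/json
}

console.log(negotiateVersion("application/vnd.example.order+json;v=2")); // 2
console.log(negotiateVersion("application/json"));                       // 1
```

Keeping this logic in one place at the edge means handlers receive a resolved version number and never parse Accept headers themselves.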
3) Schema negotiation and evolvable payloads
Clients declare acceptable representations via Accept and optionally Prefer headers. Servers return the best match and advertise new capabilities with Link relations and profile or schema URLs. Responses may include extension fields namespaced under x_ or vendor prefixes; clients act as tolerant readers, ignoring unknown keys and defaulting missing optional fields. Enums are future-proofed by allowing unknown values to fall back to “other,” avoiding hard failures during expansion.
4) Tolerant readers and defensive clients
I evangelize consumer patterns:
- Ignore unknown properties and treat absent optional fields as defaults.
- Accept additional enum values without crashing.
- Use ETags and conditional requests, not brittle timestamps.
- Rely on HATEOAS-style links for navigation where helpful, not hardcoded paths.
Server patterns complement this: do not suddenly require new mandatory fields; keep error structures stable and include machine-parsable codes.
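A tolerant reader for a hypothetical Order payload might look like this; the field names, the "other" fallback, and the defaults are illustrative:

```typescript
// Tolerant-reader sketch: unknown keys are ignored, missing optionals get
// defaults, and unknown enum values collapse to "other" instead of throwing.
type OrderStatus = "pending" | "shipped" | "returned" | "other";
const KNOWN_STATUSES: ReadonlySet<string> =
  new Set(["pending", "shipped", "returned"]);

interface Order {
  id: string;
  status: OrderStatus;
  giftMessage: string; // optional on the wire, defaulted here
}

function readOrder(raw: unknown): Order {
  const obj = raw as Record<string, unknown>; // assume JSON.parse output
  const status =
    typeof obj.status === "string" && KNOWN_STATUSES.has(obj.status)
      ? (obj.status as OrderStatus)
      : "other"; // future enum values degrade gracefully
  return {
    id: String(obj.id),
    status,
    giftMessage: typeof obj.giftMessage === "string" ? obj.giftMessage : "",
    // Any other keys on `raw` are silently ignored: the server may add fields.
  };
}

// A newer payload with fields this client has never seen still parses cleanly.
console.log(readOrder(JSON.parse(
  '{"id":"o-1","status":"partially_refunded","x_carrier":"ACME"}'
)));
// -> { id: 'o-1', status: 'other', giftMessage: '' }
```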
5) Deprecation policy and visibility
Breaking change pressure is real, so a transparent deprecation playbook is essential. I publish a policy with minimum support windows (for example, eighteen months), announce via changelog and email, and return Sunset and Deprecation headers plus a Link header with rel="sunset" pointing to migration docs. Telemetry identifies active clients by version and endpoint. We contact high-value consumers directly, provide migration guides and test sandboxes, and track progress on a deprecation scoreboard. Removal happens only after targets are met and error budgets remain healthy.
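A sketch of those response headers; the dates and docs URL are placeholders, and the formats follow RFC 8594 (Sunset) and RFC 9745 (Deprecation):

```typescript
// Build the deprecation-related headers attached to every response that
// still serves the legacy contract.
function deprecationHeaders(sunsetDate: Date): Record<string, string> {
  return {
    // RFC 8594: HTTP-date after which the resource is expected to go away.
    Sunset: sunsetDate.toUTCString(),
    // RFC 9745: structured-field date marking the resource deprecated as of now.
    Deprecation: `@${Math.floor(Date.now() / 1000)}`,
    // Point clients at the migration guide (placeholder URL).
    Link: '<https://docs.example.com/migrate-orders-v2>; rel="sunset"',
  };
}

// Attaching these to legacy responses means both our telemetry and client-side
// logs surface who is still on the old contract.
console.log(deprecationHeaders(new Date("2026-06-30T00:00:00Z")));
```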
6) Feature flags and safe rollout
Every risky change is guarded by a feature flag at the edge or handler level. I roll out via canary (small percentage), traffic shadowing (duplicate requests to new code but ignore responses), or dual-write/dual-read when storage models change. For read path changes, I use sticky cohorts so a client stays on one behavior during a session. Flags support instant rollback without redeploy. For state changes, I combine flags with idempotency keys to avoid duplicate effects during retries.
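A sticky-cohort flag check can be as small as a hash over the flag name and client ID; this sketch assumes a hypothetical flag name and a percentage knob managed elsewhere:

```typescript
import { createHash } from "node:crypto";

// Sticky-cohort rollout: the same client always hashes into the same bucket,
// so its behavior is stable across requests within the rollout.
function inRollout(flag: string, clientId: string, percent: number): boolean {
  const digest = createHash("sha256").update(`${flag}:${clientId}`).digest();
  const bucket = digest.readUInt32BE(0) % 100; // 0..99, uniform enough here
  return bucket < percent;
}

// Canary at 1%: roughly one client in a hundred sees the new handler, and
// setting percent to 0 is an instant, deploy-free rollback.
const useNewPricing = inRollout("orders-new-pricing", "client-4711", 1);
console.log(useNewPricing);
```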
7) Testing and contract governance
I enforce consumer-driven contracts: downstream teams publish expectations that run in our pipeline (Pact, Spring Cloud Contract). We gate merges on compatibility checks that compare OpenAPI (for example, oasdiff) and reject breaking changes on stable versions. We mirror production traffic in pre-prod to surface schema mismatches. Each release ships with golden test fixtures for critical clients, keeping serialized payloads as snapshots to detect accidental changes.
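A golden-fixture check needs little more than stable serialization plus a strict comparison; this sketch assumes a checked-in fixture file and a flat payload:

```typescript
import { readFileSync } from "node:fs";
import assert from "node:assert";

// Compare a serialized response against a checked-in snapshot so that any
// accidental rename, retype, or removal shows up as a CI diff before it
// can reach a consumer. Keys are sorted for a deterministic comparison.
function assertMatchesGolden(payload: unknown, fixturePath: string): void {
  const actual = JSON.stringify(
    payload,
    Object.keys(payload as object).sort(),
    2,
  );
  const expected = readFileSync(fixturePath, "utf8").trim();
  assert.strictEqual(actual, expected, `payload drifted from ${fixturePath}`);
}

assertMatchesGolden(
  { id: "o-1", status: "shipped", totalCents: 1299 },
  "test/fixtures/order-v1.golden.json", // hypothetical fixture path
);
```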
8) Observability and abort criteria
I monitor error rates, status code shifts, latency, payload sizes, and downstream timeouts per version and per flag cohort. Thresholds tied to error budgets trigger automatic rollback or flag disable. Structured logs include request IDs, client app IDs, version, and feature flags so we can pinpoint regressions instantly. Dashboards show adoption of new versions and migration burndown.
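The abort decision itself can be a small, testable function fed by those metrics; the thresholds and field names here are illustrative:

```typescript
// Abort-criteria sketch: compare a flag cohort against the stable cohort
// and trip automatic rollback when the error budget burns too fast.
interface CohortStats {
  requests: number;
  errors5xx: number;
  p99LatencyMs: number;
}

function shouldDisableFlag(stats: CohortStats, baseline: CohortStats): boolean {
  if (stats.requests < 500) return false; // too little traffic to judge
  const errorRate = stats.errors5xx / stats.requests;
  const baselineRate = baseline.errors5xx / Math.max(baseline.requests, 1);
  // Trip if the canary's error rate is more than double the stable cohort's
  // (plus a small floor) or tail latency regresses by more than 20%.
  return errorRate > baselineRate * 2 + 0.001
      || stats.p99LatencyMs > baseline.p99LatencyMs * 1.2;
}

console.log(shouldDisableFlag(
  { requests: 10_000, errors5xx: 180, p99LatencyMs: 240 },
  { requests: 90_000, errors5xx: 270, p99LatencyMs: 210 },
)); // true: 1.8% canary error rate vs 0.3% baseline
```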
9) Data model and storage evolution
For persistent changes, I use expand-and-contract:
- Add new columns or tables and start dual-writing behind a flag.
- Backfill data.
- Cut reads to the new schema.
- Stop old writes, then remove legacy fields.
All steps are reversible until the final contract step. Migrations are chunked, idempotent, and tested on shadow traffic.
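A sketch of the dual-write step, with hypothetical store interfaces and an in-memory flag standing in for a real flag service:

```typescript
// Expand-and-contract, dual-write phase: the legacy write stays authoritative
// until the final contract step, so disabling the flag is always a safe rollback.
interface PricingRecord { orderId: string; totalCents: number; }
interface PricingStore { save(rec: PricingRecord): Promise<void>; }

const flags = { dualWritePricing: true };

async function savePricing(
  rec: PricingRecord,
  legacy: PricingStore,
  next: PricingStore,
): Promise<void> {
  await legacy.save(rec); // authoritative write
  if (flags.dualWritePricing) {
    try {
      await next.save(rec); // keyed by orderId, so retries are idempotent
    } catch (err) {
      // New-store failures must not fail the request during the expand phase;
      // log the miss and let the backfill job reconcile it.
      console.error("dual-write to new pricing store failed", err);
    }
  }
}
```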
10) Rollback plans you can actually execute
Rollback is not a hope; it is a path. I keep old binaries and migration scripts ready, guard schema changes with toggles, and maintain read compatibility so older services can parse newer records when possible. If a new major version misbehaves, the router or gateway rewrites incoming requests to the previous version based on client headers. For data, dual-write ensures new data is not lost during rollback.
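A sketch of that router rule as a plain function; the media types and kill switch are assumptions, not a specific gateway's configuration:

```typescript
// Rollback rewrite: when the v2 kill switch is on, the gateway rewrites the
// client's Accept header so requests land on the known-good v1 handler.
const V2 = "application/vnd.example.order+json;v=2";
const V1 = "application/vnd.example.order+json;v=1";

function rewriteAcceptForRollback(
  accept: string | undefined,
  killV2: boolean,
): string {
  if (!accept) return V1;
  // Clients keep sending v2; the router silently downgrades them, which is
  // safe because v2 was additive over v1.
  return killV2 ? accept.replaceAll(V2, V1) : accept;
}

console.log(rewriteAcceptForRollback(V2, true));
// -> application/vnd.example.order+json;v=1
```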
The result is a backward-compatible evolution engine: contracts make changes explicit, schema negotiation and tolerant readers keep clients stable, feature flags and staged rollouts reduce blast radius, and clear rollback paths make change safe.
Common Mistakes
- Treating versioning as an afterthought and shipping breaking changes under the same version.
- Using only URL versions and lacking schema negotiation, forcing global client upgrades.
- Removing or renaming fields without a deprecation window or telemetry to see who is affected.
- Relying on retries instead of feature flags, canaries, and shadow tests to de-risk changes.
- Building brittle clients that crash on unknown enum values or extra properties.
- Changing storage schemas in place without expand-and-contract or dual-writes.
- No rollback plan; roll forward becomes the only option during incidents.
- Missing contract tests and OpenAPI diff gates, letting accidental breaks reach production.
Sample Answers
Junior:
“I keep changes additive and version the API when I need to break contracts. Clients can request a version with an Accept header. I write tolerant readers that ignore unknown fields and I use feature flags to roll out changes gradually. Deprecations include a clear timeline.”
Mid:
“I prefer header or media type schema negotiation so clients opt in per call. I add Sunset headers and track usage to know who still relies on old fields. Rollouts are canary or shadow, guarded by flags, and I use consumer-driven contracts plus OpenAPI diffs in CI to block breaking changes.”
Senior:
“I run a full evolution program: stable major lines, additive minors, and preview media types. Clients are tolerant readers; servers never introduce new mandatory fields in place. Risky changes dual-write storage with expand-and-contract, roll out via flags and sticky cohorts, and rollback instantly via router rules. Telemetry, error budgets, and contract tests govern every step.”
Evaluation Criteria
Look for a complete approach: contract-first design, clear versioning (header/media type or URL), schema negotiation, and tolerant readers on the client. Strong answers include a documented deprecation policy with Sunset headers and telemetry, feature-flagged canary or shadow rollouts, dual-write expand-and-contract for storage, and concrete rollback paths. They mention consumer-driven contracts and OpenAPI diff gates in CI, plus per-version observability and error-budget triggers. Red flags: “just bump the version” without migration support, removing fields abruptly, no flags, or no rollback beyond redeploy.
Preparation Tips
- Write an OpenAPI for a small service; practice additive changes and compare with oasdiff.
- Implement header-based schema negotiation with vendor media types and a default version.
- Build a tolerant client that ignores unknown fields and accepts unknown enum values.
- Add Sunset and Deprecation headers; create a migration guide and a sample outreach email.
- Guard an endpoint change with a feature flag; roll out to one percent, then canary by tenant.
- Set up shadow traffic for a new implementation; compare responses and latency.
- Practice expand-and-contract with dual-write and backfill; rehearse rollback.
- Add contract tests (Pact) and OpenAPI diff checks to CI; fail builds on breaking changes.
Real-world Context
A fintech introduced media-type schema negotiation and tolerant readers before a KYC overhaul. They shipped new fields as optional, canaried with feature flags, and monitored per-version error rates; migrations finished without client outages. An e-commerce platform moved to a new pricing model using expand-and-contract with dual-writes and backfill; rollouts were shadowed for a week, then switched over with instant rollback available. A SaaS vendor adopted consumer-driven contracts and OpenAPI diff gates, cutting accidental breaking changes to near zero; Sunset headers plus telemetry made deprecations predictable and drama-free.
Key Takeaways
- Prefer additive changes; use explicit versioning and schema negotiation.
- Build tolerant readers so clients survive expansion.
- Run feature-flagged canary or shadow rollouts with instant rollback.
- Deprecate with policy, telemetry, and Sunset headers.
- Use expand-and-contract and dual-writes for storage evolution.
- Enforce compatibility with contract tests and OpenAPI diff gates.
Practice Exercise
Scenario:
You own a public RESTful API used by hundreds of clients. You must add new fields to Orders, rename a legacy field, and move pricing to a new storage model, all without breaking clients.
Tasks:
- Model the change in OpenAPI: add optional fields, keep the legacy field for now, and introduce application/vnd.example.order+json;v=2 as an opt-in media type.
- Implement schema negotiation based on Accept and default to v1. Add a Link header with rel="profile" pointing to the v2 schema.
- Update the client SDK example to act as a tolerant reader: ignore unknown properties, allow unknown enum values, and default missing optional fields.
- Add Sunset and Deprecation headers on the legacy field; create docs and a migration guide.
- Guard the new implementation with a feature flag. Roll out via canary to one percent and also shadow ten percent of traffic to compare responses and latency.
- Migrate storage with expand-and-contract: create new columns, enable dual-write, backfill, switch reads, then remove legacy writes.
- Add consumer-driven contract tests for two key partners and an OpenAPI diff gate in CI to block breaking changes.
- Define rollback: router rule to force v1 media type, disable the flag, and continue to read old storage while dual-write remains on.
- Instrument per-version SLIs and error-budget thresholds that automatically disable the flag if error rate or latency spikes.
Deliverable:
A change plan and code diffs demonstrating backward-compatible evolution with versioning, schema negotiation, tolerant readers, and a tested rollout/rollback path that keeps clients stable during and after the transition.

