
API Design for Long-Lived Systems

Ravinder · 9 min read
Architecture · API Design · Versioning · OpenAPI · Backend

The API You Ship Today Is a Promise

Every public API endpoint you ship is a commitment. The moment a consumer integrates against it, you owe them stability. Break that commitment carelessly and you break running software in production — not a test environment, not a demo. Real users, real transactions, real fallout.

Most teams do not think about API longevity at design time. They think about getting the feature shipped. This is understandable. It is also how you end up in the situation I walked into at one company: 47 active API versions, a changelog nobody maintained, and a team paralysed before every release because nobody knew which clients still depended on which endpoints.

This post is the API design playbook I now apply from day one. It covers contract-first design, versioning strategies, backward compatibility rules, header-based deprecation signalling, and consumer-driven contract testing.


Start With the Contract, Not the Code

The most important habit in API design is to write the contract before you write the implementation. Not after. Before.

A contract-first approach means:

  1. Write the OpenAPI (or Protobuf) spec
  2. Review it with consumer teams
  3. Generate stub servers from the spec
  4. Build the implementation against the contract
  5. Validate the implementation matches the contract in CI
openapi: "3.1.0"
info:
  title: Order Service API
  version: "2.0.0"
 
paths:
  /orders/{orderId}:
    get:
      operationId: getOrder
      summary: Retrieve a single order
      parameters:
        - name: orderId
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        "200":
          description: Order found
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Order"
        "404":
          description: Order not found
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ProblemDetail"
 
components:
  schemas:
    Order:
      type: object
      required: [orderId, status, createdAt]
      properties:
        orderId:
          type: string
          format: uuid
        status:
          type: string
          enum: [PENDING, CONFIRMED, SHIPPED, DELIVERED, CANCELLED]
        createdAt:
          type: string
          format: date-time
        # New optional fields can be added without a version bump
        estimatedDelivery:
          type: string
          format: date-time
          description: "Added in v2.1 — optional, absent if unknown"

The spec is the source of truth. Not a wiki. Not a Notion doc. The machine-readable spec, checked into version control alongside the code.
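Step 5 of the workflow above, validating the implementation against the contract in CI, can start very small. Here is a minimal Python sketch of the idea. It is a hand-rolled check, not a real validator (pipelines would typically use a tool such as Schemathesis or openapi-core), and `ORDER_SCHEMA` is transcribed by hand from the spec rather than loaded from it:

```python
# Hand-rolled contract check: does a live response satisfy the Order
# schema's required fields and enum constraints from the spec above?
ORDER_SCHEMA = {
    "required": ["orderId", "status", "createdAt"],
    "enums": {
        "status": {"PENDING", "CONFIRMED", "SHIPPED", "DELIVERED", "CANCELLED"},
    },
}

def check_against_contract(response_body: dict, schema: dict) -> list[str]:
    """Return a list of contract violations (empty list means compliant)."""
    violations = []
    for field in schema["required"]:
        if field not in response_body:
            violations.append(f"missing required field: {field}")
    for field, allowed in schema["enums"].items():
        value = response_body.get(field)
        if value is not None and value not in allowed:
            violations.append(f"unexpected {field} value: {value}")
    return violations
```

In CI this kind of check would run against the stub server generated in step 3, so the implementation and the contract can never silently drift apart.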


Versioning Strategy

There is no universally correct versioning strategy. There are trade-offs. Here are the three practical choices.

flowchart TD
    Question["API versioning approach?"] --> A["URL versioning\n/v1/, /v2/"]
    Question --> B["Header versioning\nAPI-Version: 2024-01-01"]
    Question --> C["Content negotiation\nAccept: application/vnd.orders.v2+json"]
    A --> A1["✓ Visible and debuggable\n✓ Easy to route at proxy\n✗ URL pollution\n✗ Clients must change base URL"]
    B --> B1["✓ Clean URLs\n✓ Date-based versions are intuitive\n✗ Harder to test in browser\n✗ Not all clients set headers"]
    C --> C1["✓ Most REST-pure\n✗ Complex client code\n✗ Poor tooling support"]
    style A fill:#DBEAFE,stroke:#3B82F6
    style B fill:#D1FAE5,stroke:#10B981
    style C fill:#F3F4F6,stroke:#9CA3AF

My recommendation: URL versioning for public APIs, header versioning for internal APIs.

URL versioning (/v1/orders, /v2/orders) has one decisive advantage: it is impossible to miss. Junior developers, external partners, and support staff can all see what version they are calling. Routing at the proxy is trivial. Logs are unambiguous.

Header versioning (API-Version: 2024-11-01) is preferable for internal services where you control all clients and you want to avoid cluttering URL hierarchies. Stripe's date-based approach is the best implementation of this pattern.

Avoid content negotiation for anything that is not a true hypermedia API. The complexity cost is not worth it.
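In practice, the two recommended schemes differ only in where the edge reads the version from. A sketch of that resolution logic in Python (illustrative only: the precedence rule and the `v1` fallback are assumptions for the example, not conventions of any particular gateway):

```python
def resolve_version(path: str, headers: dict) -> str:
    """Resolve the requested API version at the edge.

    URL prefix wins (public clients); the API-Version header is the
    fallback (internal clients). The final "v1" default is an assumption
    for legacy clients that send neither.
    """
    # URL versioning: "/v2/orders" -> "v2"
    parts = path.strip("/").split("/")
    if parts and parts[0].startswith("v") and parts[0][1:].isdigit():
        return parts[0]
    # Header versioning: date-based, Stripe-style (exact-key lookup here;
    # real HTTP header handling is case-insensitive)
    if "API-Version" in headers:
        return headers["API-Version"]
    return "v1"
```

The point of centralising this in one function (or one gateway plugin) is that no downstream service ever has to reimplement version negotiation.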


Backward Compatibility Rules

Backward compatibility is not a vague principle. It has concrete rules.

Safe changes — no version bump required

graph TD
    Safe["Safe (Additive)"] --> S1["Add optional request field"]
    Safe --> S2["Add optional response field"]
    Safe --> S3["Add new enum value\n(if consumer uses unknown-safe handling)"]
    Safe --> S4["Add new endpoint"]
    Safe --> S5["Relax validation\n(wider accepted inputs)"]
    Unsafe["Breaking Changes"] --> U1["Remove field"]
    Unsafe --> U2["Rename field"]
    Unsafe --> U3["Change field type"]
    Unsafe --> U4["Make optional field required"]
    Unsafe --> U5["Change enum value meaning"]
    Unsafe --> U6["Change URL structure"]
    Unsafe --> U7["Change error response shape"]
    style Safe fill:#D1FAE5,stroke:#10B981
    style Unsafe fill:#FEE2E2,stroke:#EF4444

The rule of thumb: consumers should be able to absorb a safe change by doing nothing. If they have to change code to keep working, it is a breaking change and requires a version bump, however small it looks.
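The safe/breaking split can also be enforced mechanically by diffing schema snapshots in CI. A simplified Python sketch of such a check (the flattened `{field: {"type", "required"}}` schema shape is an assumption for brevity; tools like oasdiff do this against full OpenAPI documents):

```python
def find_breaking_changes(old: dict, new: dict) -> list[str]:
    """Diff two simplified schemas ({field: {"type": str, "required": bool}})
    and report changes that would break existing consumers."""
    breaking = []
    for field, old_def in old.items():
        new_def = new.get(field)
        if new_def is None:
            breaking.append(f"removed field: {field}")    # removal breaks readers
        elif new_def["type"] != old_def["type"]:
            breaking.append(f"type change: {field}")      # type change breaks parsers
        elif new_def["required"] and not old_def["required"]:
            breaking.append(f"now required: {field}")     # optional -> required breaks writers
    # Fields present only in `new` are additive and therefore safe.
    return breaking
```

Wiring a check like this into the PR pipeline turns the compatibility rules from a review guideline into a hard gate.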

Handling enum evolution safely

One subtle breaking change teams miss is adding new enum values. If a consumer deserialises the response into a strongly-typed enum and receives an unknown value, it throws. The solution is to specify in your contract that consumers must handle unknown enum values gracefully:

// Consumer-side Java (Jackson) — deserialise with an unknown-safe fallback
import com.fasterxml.jackson.annotation.JsonCreator;

public enum OrderStatus {
    PENDING, CONFIRMED, SHIPPED, DELIVERED, CANCELLED, UNKNOWN;

    @JsonCreator // Jackson uses this factory instead of valueOf() when deserialising
    public static OrderStatus fromValue(String value) {
        try {
            return valueOf(value);
        } catch (IllegalArgumentException e) {
            return UNKNOWN; // Never throw on unknown enum values
        }
    }
}

Document this requirement explicitly in your API contract. Consumers who skip it effectively opt out of ever receiving new enum values: you cannot add one without breaking them.


Deprecation Without Drama

Deprecation is a process, not an event. The process has three stages: announce, warn, retire.

timeline
    title API v1 Deprecation Timeline
    section Announce
        Month 0 : Publish deprecation notice
                : Add Deprecation header to all v1 responses
                : Email all registered API consumers
    section Warn
        Month 3 : Add Sunset header with exact retirement date
                : Weekly usage reports to API team
                : Reach out to high-traffic consumers directly
    section Retire
        Month 9 : v1 returns 410 Gone with migration URL
                : 410 responses logged for 30 days
        Month 10 : Infrastructure decommissioned

Headers are your deprecation channel

The IETF has standardised headers for this: Sunset (RFC 8594) and Deprecation (RFC 9745; note the published RFC uses a Unix-timestamp form such as @1743465600, while the HTTP-date form from earlier drafts, shown below, is still common in the wild):

HTTP/1.1 200 OK
Content-Type: application/json
Deprecation: Tue, 01 Apr 2025 00:00:00 GMT
Sunset: Thu, 01 Jan 2026 00:00:00 GMT
Link: <https://api.example.com/v2/orders>; rel="successor-version"

Deprecation tells clients when the endpoint was (or will be) deprecated. Sunset tells them when it will stop working. Link with rel="successor-version" points to the replacement. These headers are machine-readable — consumers can build automated alerts on them.

# Consumer SDK — auto-detect deprecation headers
import warnings

import requests


class APIClient:
    def _check_deprecation(self, response: requests.Response) -> None:
        """Emit a DeprecationWarning whenever the server flags an endpoint."""
        if "Deprecation" in response.headers:
            sunset = response.headers.get("Sunset", "unknown date")
            successor = response.headers.get("Link", "")
            warnings.warn(
                f"API endpoint deprecated, sunset: {sunset}. "
                f"Migrate to: {successor}",
                DeprecationWarning,
                stacklevel=3,
            )

Consumer-Driven Contract Testing

The most common way breaking changes reach production is through the gap between what the provider thinks the contract says and what consumers actually depend on. Consumer-driven contract testing closes that gap.

sequenceDiagram
    participant Consumer as Consumer Team
    participant Broker as Pact Broker
    participant Provider as Provider Team
    Consumer->>Consumer: Write Pact test\n(define expectations)
    Consumer->>Broker: Publish consumer contract
    Provider->>Broker: Pull consumer contracts
    Provider->>Provider: Run provider verification
    Provider-->>Broker: Publish verification result
    Broker-->>Consumer: Notify: can I deploy?
    note over Broker: "can-i-deploy" gate\nblocks breaking releases

Pact is the standard tool. Consumers write tests that express exactly what they need from the provider — specific fields, specific response shapes. The provider runs those tests as part of its CI pipeline. If the provider changes a field a consumer depends on, the consumer's Pact test fails in the provider's pipeline before it reaches main.

This flips the detection model: instead of discovering breaking changes in production, you discover them at PR time.

// Consumer Pact test — Spring Boot + Pact JVM (JUnit 5)
import au.com.dius.pact.consumer.MockServer;
import au.com.dius.pact.consumer.dsl.PactDslJsonBody;
import au.com.dius.pact.consumer.dsl.PactDslWithProvider;
import au.com.dius.pact.consumer.junit5.PactConsumerTestExt;
import au.com.dius.pact.consumer.junit5.PactTestFor;
import au.com.dius.pact.core.model.RequestResponsePact;
import au.com.dius.pact.core.model.annotations.Pact;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import static org.junit.jupiter.api.Assertions.assertNotNull;

@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "order-service")
class OrderClientPactTest {

    @Pact(consumer = "checkout-service")
    RequestResponsePact getOrderPact(PactDslWithProvider builder) {
        return builder
            .given("order 123 exists")
            .uponReceiving("a request for order 123")
            .path("/v2/orders/123")
            .method("GET")
            .willRespondWith()
            .status(200)
            .body(new PactDslJsonBody()
                .stringType("orderId")
                .stringMatcher("status", "PENDING|CONFIRMED|SHIPPED|DELIVERED|CANCELLED")
                .datetime("createdAt", "yyyy-MM-dd'T'HH:mm:ss'Z'")
            )
            .toPact();
    }

    // Exercise the pact against Pact's mock server; a green run is what
    // gets published to the broker. OrderClient stands in for the
    // consumer's own HTTP client (not shown here).
    @Test
    @PactTestFor(pactMethod = "getOrderPact")
    void fetchesOrder(MockServer mockServer) {
        OrderClient client = new OrderClient(mockServer.getUrl());
        assertNotNull(client.getOrder("123"));
    }
}

API Gateway as the Enforcement Layer

Policy enforcement (authentication, rate limiting, deprecation headers, response transformation) belongs in the gateway — not in individual services.

flowchart LR
    Client --> GW["API Gateway"]
    GW -->|"Authn / Authz"| GW
    GW -->|"Rate limit"| GW
    GW -->|"Add Deprecation headers"| GW
    GW -->|"v1 route"| Monolith["Legacy Handler"]
    GW -->|"v2 route"| Service["New Service"]
    style GW fill:#DBEAFE,stroke:#3B82F6
    style Monolith fill:#FEF3C7,stroke:#D97706
    style Service fill:#D1FAE5,stroke:#10B981

This has a critical benefit for migration: when you move from v1 to v2, the gateway is the only place you change routing. Services have no awareness of version negotiation. Deprecation headers are injected by the gateway based on route configuration. No code changes in services when you announce a deprecation.
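That gateway-owned injection can be sketched in a few lines of Python: deprecation metadata lives in route configuration, and a response hook decorates matching routes. The route table and header values below are illustrative, not from any real gateway:

```python
# Illustrative gateway-side response hook: deprecation metadata lives in
# route config, so services never mention deprecation at all.
ROUTE_CONFIG = {
    "/v1/orders": {
        "deprecation": "Tue, 01 Apr 2025 00:00:00 GMT",
        "sunset": "Thu, 01 Jan 2026 00:00:00 GMT",
        "successor": "https://api.example.com/v2/orders",
    },
    "/v2/orders": {},  # current version: no deprecation headers added
}


def decorate_response(route: str, headers: dict) -> dict:
    """Add Deprecation/Sunset/Link headers based on the matched route."""
    config = ROUTE_CONFIG.get(route, {})
    if "deprecation" in config:
        headers["Deprecation"] = config["deprecation"]
        headers["Sunset"] = config["sunset"]
        headers["Link"] = f'<{config["successor"]}>; rel="successor-version"'
    return headers
```

Announcing a deprecation then becomes a one-line config change on the gateway, with no service deployments at all.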


Practical API Longevity Checklist

Before you ship a new API, run through this checklist:

API Longevity Review
═══════════════════════════════════════════════
Schema & Contract
  ☐ OpenAPI spec written before implementation
  ☐ Spec reviewed with at least one consumer team
  ☐ Spec checked into version control
  ☐ CI validates implementation against spec
 
Versioning
  ☐ Versioning strategy documented in API guide
  ☐ Version visible in URL or header (not implicit)
  ☐ Changelog maintained for each version
 
Backward Compatibility
  ☐ All new fields are optional with defaults
  ☐ No required fields removed
  ☐ Enum consumers use unknown-safe handling
  ☐ Error response shape is stable
 
Deprecation
  ☐ Deprecation policy documented (min 6 months notice)
  ☐ Deprecation + Sunset headers configured in gateway
  ☐ Consumer communication plan exists
 
Testing
  ☐ Pact consumer contracts published
  ☐ Provider verification runs in CI
  ☐ "Can I deploy" gate enabled in deployment pipeline
═══════════════════════════════════════════════

The Real Measure of API Quality

It is not the elegance of the URL structure. It is not adherence to REST constraints. It is this: how long can a consumer stay on an old version before they are forced to migrate?

If your answer is "less than six months" you do not have an API — you have a dependency tax you collect from every team that integrates with you.

Long-lived APIs require discipline at design time. Contract-first. Additive-only changes. Explicit deprecation timelines. Consumer-driven tests. These are not optional features for mature teams. They are the baseline for not breaking production systems on Friday afternoons.

Design your APIs like you will be the on-call engineer when a consumer is paged at 2am because of a change you made. Because sometimes, you will be.