Schema and Contracts
JSON is the default event format for most teams because it's readable and tooling is everywhere. It's also an implicit contract that no one enforces. When the order service adds a required field to its payload, every consumer silently breaks—or silently ignores the new field and produces wrong outputs. Schema drift is one of the most common causes of production incidents in event-driven systems, and it's entirely preventable.
Formal schemas with a registry are not bureaucracy. They are the guard rail that lets teams evolve their events independently without coordination overhead.
Why Schema Enforcement Matters
Without schemas, each consumer is an island of defensive code:
```python
# Without schema enforcement — every consumer writes this
def handle_order_placed(event: dict):
    order_id = event.get("orderId") or event.get("order_id")  # which convention?
    amount = event.get("totalAmount", 0)  # default or crash?
    if not order_id:
        logger.warning("Missing orderId, skipping")
        return
```

With a schema, deserialization either succeeds or fails at the boundary. You never process a malformed event silently.
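The fail-at-the-boundary discipline can be sketched without any schema tooling at all. Here a typed constructor plays the role of the deserializer; the `OrderPlaced` dataclass and field names are illustrative, not part of any real schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    total_amount: float

def parse_order_placed(event: dict) -> OrderPlaced:
    # Fail loudly at the boundary: KeyError on a missing field,
    # ValueError on a malformed amount — never a silent default
    return OrderPlaced(
        order_id=event["orderId"],
        total_amount=float(event["totalAmount"]),
    )
```

Downstream handlers receive a well-typed object or nothing; the defensive `get` chains disappear.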
Avro vs Protobuf
Both are binary serialization formats with schemas. They have different strengths.
Apache Avro:
- Schema is embedded in the serialized data (or referenced via schema ID)
- Natively supported by Confluent Schema Registry
- Schema evolution rules are explicit: fields can be added or removed if defaults are provided
- JSON-like schema definition language
```json
{
  "type": "record",
  "name": "OrderPlaced",
  "namespace": "com.example.orders",
  "fields": [
    { "name": "orderId", "type": "string" },
    { "name": "customerId", "type": "string" },
    { "name": "totalAmount", "type": "double" },
    { "name": "currency", "type": "string", "default": "USD" },
    {
      "name": "items",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "OrderItem",
          "fields": [
            { "name": "sku", "type": "string" },
            { "name": "quantity", "type": "int" },
            { "name": "unitPrice", "type": "double" }
          ]
        }
      }
    },
    { "name": "occurredAt", "type": { "type": "long", "logicalType": "timestamp-millis" } }
  ]
}
```

Protocol Buffers (Protobuf):
- Language-neutral binary format from Google
- Strong tooling across Java, Go, Python, Rust, C++
- Field numbers are the stable contract—field names can change
- Optional fields are inherently backward compatible
```protobuf
syntax = "proto3";

package com.example.orders;

message OrderPlaced {
  string order_id = 1;
  string customer_id = 2;
  double total_amount = 3;
  string currency = 4;
  repeated OrderItem items = 5;
  int64 occurred_at_ms = 6;
}

message OrderItem {
  string sku = 1;
  int32 quantity = 2;
  double unit_price = 3;
}
```

Choosing between them:
| | Avro | Protobuf |
|---|---|---|
| Schema in payload | Yes (or registry ID) | No |
| Kafka ecosystem fit | Native | Good via plugin |
| Cross-language support | Good | Excellent |
| Human-readable schema | JSON | .proto IDL |
| Field evolution | Requires defaults | Field numbers |
| gRPC compatible | No | Yes |
If you are Kafka-first with Confluent Schema Registry, Avro is the path of least resistance. If your system uses both Kafka and gRPC services, Protobuf unifies the format across transports.
Schema Registry
A schema registry is a central store for event schemas, versioned and queryable. Confluent Schema Registry is the most common, but AWS Glue Schema Registry and Apicurio are valid alternatives.
The wire format for Confluent-compatible Avro is:
```
[0x00][4-byte schema ID, big-endian][Avro payload bytes]
```

Consumers use the schema ID to fetch the writer schema from the registry and deserialize against it using their reader schema. This is schema evolution in action: the reader schema can differ from the writer schema as long as the compatibility rules are satisfied.
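Splitting that frame by hand is a five-line exercise; a sketch follows (client libraries normally do this for you, and the function name is mine):

```python
import struct

def split_confluent_frame(raw: bytes) -> tuple[int, bytes]:
    # Byte 0 is the magic byte, bytes 1-4 the big-endian schema ID,
    # and everything after byte 4 is the Avro-encoded payload
    if not raw or raw[0] != 0x00:
        raise ValueError("not Confluent wire format (magic byte != 0x00)")
    (schema_id,) = struct.unpack(">I", raw[1:5])
    return schema_id, raw[5:]
```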
Compatibility Modes
This is where teams make or break their event contracts. Schema registries enforce compatibility at registration time.
Backward compatible (default): new schema can read data written by old schema. Safe to deploy new consumers before updating producers.
```json
// v1
{ "name": "currency", "type": "string" }

// v2 — backward compatible (adds a field with a default)
{ "name": "currency", "type": "string" },
{ "name": "promoCode", "type": ["null", "string"], "default": null }
```

Forward compatible: old schema can read data written by new schema. Safe to deploy new producers before updating consumers.
Full compatible: both backward and forward. Safest but most restrictive.
Breaking changes that fail compatibility checks:
- Removing a field without a default (backward incompatible)
- Changing a field type (e.g., int to string)
- Renaming a field in Avro (field names are part of the contract)
- Reusing a field number for a different type in Protobuf
Versioning Strategy
There are two schools of thought: compatibility-based evolution and explicit versioning.
Compatibility-based evolution: every schema change must pass the configured compatibility check. You never break consumers because the registry rejects breaking changes at publish time.
```shell
# Attempt to register a breaking change
curl -X POST http://registry:8081/subjects/orders.placed-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{...breaking schema...}"}'
# Returns 409 Conflict if incompatible
```

Explicit versioning with a new topic: when a breaking change is unavoidable, publish to a new topic (orders.placed.v2) and run both consumers in parallel during the migration. Producers eventually deprecate the v1 topic.
Explicit versioning is operationally noisier (two topics, two consumers) but gives you a clean cut-over with no flag day coordination.
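During the parallel-run window, one consumer process can subscribe to both topics and normalize each version into the same internal shape. A sketch, with topic names matching the example above and illustrative handlers:

```python
def handle_order_v1(event: dict) -> dict:
    # v1 events predate promoCode; normalize with an explicit null
    return {"order_id": event["orderId"], "promo_code": None}

def handle_order_v2(event: dict) -> dict:
    return {"order_id": event["orderId"], "promo_code": event.get("promoCode")}

HANDLERS = {
    "orders.placed": handle_order_v1,      # v1 topic
    "orders.placed.v2": handle_order_v2,   # v2 topic
}

def dispatch(topic: str, event: dict) -> dict:
    # Both topic versions converge on one internal model downstream
    return HANDLERS[topic](event)
```

Once v1 traffic drains, deleting the v1 handler and topic subscription is the whole cut-over.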
Consumer-Driven Contract Testing
Run schema compatibility checks as part of your CI pipeline. Producers run contract tests to verify that a proposed schema change doesn't break registered consumer expectations.
```python
# Using the registry's compatibility check as a contract test
# (Pact or custom contract tests are alternatives)
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

def test_order_placed_schema_compatibility():
    registry = SchemaRegistryClient({"url": "http://registry:8081"})
    with open("schemas/order_placed_v2.avsc") as f:
        new_schema = Schema(f.read(), schema_type="AVRO")
    # True if the new schema is compatible with the latest registered
    # version under the subject's configured compatibility mode
    assert registry.test_compatibility(
        subject_name="orders.placed-value",
        schema=new_schema,
    ), "Schema change breaks compatibility"
```

Key Takeaways
- JSON without schema enforcement is an implicit contract that breaks silently—use Avro or Protobuf for any event with more than one consumer.
- A schema registry enforces compatibility at registration time, preventing breaking changes from reaching production consumers.
- Avro fits Kafka-native ecosystems; Protobuf fits polyglot systems that also use gRPC.
- Backward compatibility is the minimum standard—add fields with defaults, never remove required fields.
- When a breaking change is unavoidable, use a new topic with explicit versioning rather than a flag-day migration.
- Automate compatibility checks in CI so schema breaks are caught before they reach the broker.