Schema Evolution
Series: Kafka in Production

The first time a team breaks schema compatibility in a production Kafka topic, they learn an expensive lesson: unlike a REST API where you can version the URL and run both versions simultaneously, a Kafka topic is a shared log. Every consumer reading that topic—present and future, including the one reading historical data for backfill—must handle every schema version that ever landed in that log.
Schema evolution is about making that manageable. Not magical, not automatic—manageable. It requires a registry, a compatibility discipline, and a clear understanding of what "backward compatible" actually means.
Why JSON Without a Registry Fails at Scale
JSON is the gateway drug. It's readable, flexible, and gets you to MVP quickly. The problem surfaces when the system matures:
- Field renamed by a producer team without notifying consumers — silent data loss.
- New required field added without a default — consumers on old code crash.
- Type changed from `int` to `string` — deserialization exception cascades.
Without a schema registry enforcing compatibility, you're relying on organizational discipline across every team that produces or consumes a topic. That discipline degrades over time, across teams, and across time zones.
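Here is a minimal sketch of that last failure mode, assuming a consumer that does nothing more than `json.loads` on whatever lands in the topic (the payloads and the tax math are invented for illustration):

```python
import json

def handle_order(raw: bytes) -> float:
    event = json.loads(raw)
    # Written back when total_amount was always a number.
    return event["total_amount"] * 1.08

# Old producer payload: works fine.
print(handle_order(b'{"order_id": "ORD-001", "total_amount": 99.99}'))

# A producer team changes the type to a formatted string. Nothing stops the
# message from reaching the topic; the consumer discovers it at runtime.
try:
    handle_order(b'{"order_id": "ORD-002", "total_amount": "99.99 USD"}')
except TypeError as exc:
    print(f"consumer crashed: {exc}")
```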
Schema Registry in Practice
Confluent Schema Registry (and compatible alternatives like AWS Glue Schema Registry) stores versioned schemas and enforces compatibility rules at produce time. The flow: the producer's serializer registers (or looks up) the schema for a subject, gets back a numeric schema ID, and embeds that ID in every message; the consumer's deserializer reads the ID, fetches the schema from the registry (and caches it), and decodes the payload. An incompatible schema is rejected at registration, so the bad message never reaches the topic.
The wire format with Avro + Schema Registry is compact: a 1-byte magic byte, a 4-byte schema ID, then the raw Avro binary. Compare this to JSON where field names are repeated in every message.
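As a sketch, splitting that frame by hand looks roughly like this (the helper is hypothetical; the magic byte of 0 and the 4-byte big-endian ID follow the wire format described above):

```python
import struct

def split_wire_format(message: bytes) -> tuple[int, bytes]:
    """Split a Schema Registry framed message into (schema_id, avro_payload)."""
    if message[0] != 0:
        raise ValueError(f"unexpected magic byte: {message[0]}")
    (schema_id,) = struct.unpack(">I", message[1:5])
    return schema_id, message[5:]

# Example frame: magic byte, schema ID 42, then the Avro-encoded record bytes.
framed = b"\x00" + struct.pack(">I", 42) + b"<avro bytes>"
print(split_wire_format(framed))  # (42, b'<avro bytes>')
```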
```java
// Producer with Avro + Schema Registry
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
props.put("schema.registry.url", "http://schema-registry:8081");

KafkaProducer<String, OrderEvent> producer = new KafkaProducer<>(props);

// OrderEvent is the Avro-generated SpecificRecord class for the schema below.
OrderEvent event = OrderEvent.newBuilder()
    .setOrderId("ORD-001")
    .setCustomerId("CUST-42")
    .setTotalAmount(99.99)
    .build();

producer.send(new ProducerRecord<>("order-events", event.getOrderId(), event));
```

Compatibility Modes: Know What You're Enforcing
Schema Registry supports several compatibility modes: BACKWARD, FORWARD, FULL, each with a _TRANSITIVE variant, plus NONE. Most teams pick one and never revisit it—usually the wrong one. The four worth knowing cold:
BACKWARD (default): New schema can read data written with the previous schema. This is consumer-centric. You can deploy new consumers before producers finish migrating.
FORWARD: Old schema can read data written with the new schema. This is producer-centric. You can deploy new producers before consumers migrate.
FULL: Both directions, checked against the previous version. More restrictive than either alone. Required for long-lived topics where you can't control deployment order.
FULL_TRANSITIVE: Like FULL but checks against every historical version, not just the previous one. Use this when consumers replay from the beginning of the topic (backfill jobs, audit systems).
```bash
# Set compatibility for a subject
curl -X PUT http://schema-registry:8081/config/order-events-value \
  -H "Content-Type: application/json" \
  -d '{"compatibility": "FULL_TRANSITIVE"}'
```

Safe Changes vs. Breaking Changes
Avro's evolution rules (Protobuf and JSON Schema have their own, similar but not identical):
Safe (BACKWARD compatible):
- Add an optional field with a default value.
- Remove a field that had a default.
- Promote a type: `int` → `long`, `float` → `double`.
```json
{
  "type": "record",
  "name": "OrderEvent",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "customer_id", "type": "string"},
    {"name": "total_amount", "type": "double"},
    {"name": "currency", "type": "string", "default": "USD"}
  ]
}
```

Adding `currency` with `"default": "USD"` is backward compatible. Old data without `currency` will read as `"USD"`.
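You can watch that default get filled in with a few lines of Python; this sketch uses the fastavro library (an assumption, the stock avro package resolves schemas the same way) to write a record under the old schema and read it back under the new one:

```python
import io
from fastavro import schemaless_reader, schemaless_writer

old_schema = {
    "type": "record", "name": "OrderEvent",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "customer_id", "type": "string"},
        {"name": "total_amount", "type": "double"},
    ],
}
# New schema adds currency with a default.
new_schema = {
    **old_schema,
    "fields": old_schema["fields"] + [{"name": "currency", "type": "string", "default": "USD"}],
}

buf = io.BytesIO()
schemaless_writer(buf, old_schema,
                  {"order_id": "ORD-001", "customer_id": "CUST-42", "total_amount": 99.99})
buf.seek(0)

# Schema resolution: old data, new reader schema, missing field takes its default.
print(schemaless_reader(buf, old_schema, new_schema))
# {'order_id': 'ORD-001', 'customer_id': 'CUST-42', 'total_amount': 99.99, 'currency': 'USD'}
```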
Breaking (NOT safe):
- Add a field without a default.
- Remove a field without a default.
- Rename a field (Avro treats this as remove + add).
- Change a type incompatibly: `string` → `int`.
```python
# Python: validate locally before registering
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

sr_client = SchemaRegistryClient({"url": "http://schema-registry:8081"})

# The candidate schema as an Avro JSON string (e.g. read from an .avsc file).
with open("order_event.avsc") as f:
    new_schema = Schema(f.read(), schema_type="AVRO")

# Test compatibility against the subject's configured mode before registering.
result = sr_client.test_compatibility(
    subject_name="order-events-value",
    schema=new_schema,
)
if not result:
    raise ValueError("Schema is not compatible — deployment blocked")
```

Handling Renames and True Breaking Changes
When you genuinely need a breaking change (a rename, a structural redesign), you have two options:
Option 1: New topic, migrate consumers
```text
order-events    → old schema, old producers/consumers
order-events-v2 → new schema, new producers/consumers
```

Run dual-write (producer writes to both) during the migration window. Once all consumers are on v2, retire the old topic. This is clean but operationally expensive.
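A dual-write sketch under those assumptions, using the Python confluent-kafka client (the payloads serialized for each schema version are assumed to come from the existing v1 and v2 serializers):

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "broker1:9092"})

def publish_order(key: str, v1_payload: bytes, v2_payload: bytes) -> None:
    # During the migration window every event goes to both topics:
    # unmigrated consumers keep reading order-events, migrated ones read order-events-v2.
    producer.produce("order-events", key=key, value=v1_payload)
    producer.produce("order-events-v2", key=key, value=v2_payload)
    producer.flush()
```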
Option 2: Union types for field aliases
Avro field aliases, combined here with a nullable union type, let a schema accommodate both old and new field names during a transition:
```json
{
  "name": "customer_id",
  "type": ["null", "string"],
  "default": null,
  "aliases": ["user_id"]
}
```

Consumers check both `customer_id` and `user_id`. Producers can migrate the field name over multiple deploys. This is messier but avoids the dual-topic complexity.
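For consumers that see the record as a plain dict (generic records, or JSON emitted downstream), that check might look like this sketch; the helper name is ours:

```python
def extract_customer_id(record: dict) -> str | None:
    # Prefer the new field name; fall back to the old one during the transition window.
    customer_id = record.get("customer_id")
    return customer_id if customer_id is not None else record.get("user_id")
```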
Protobuf as an Alternative
Protobuf handles field evolution differently—field numbers, not names, are the identity:
```protobuf
syntax = "proto3";

message OrderEvent {
  string order_id = 1;
  string customer_id = 2;
  double total_amount = 3;
  string currency = 4;  // new field — safe to add
}
```

Adding field 4 is always backward compatible in Protobuf as long as the field number wasn't previously used. Renaming `customer_id` to `buyer_id` is safe—consumers on old code still see field number 2, just under the old name. This makes Protobuf somewhat more forgiving for rename-heavy schemas.
Key Takeaways
- Schema Registry enforces compatibility at produce time—breaking changes are rejected before they reach the topic, not discovered by a consumer crash at 3 AM.
- BACKWARD compatibility is consumer-centric; FORWARD is producer-centric—most teams should use FULL or FULL_TRANSITIVE for shared, long-lived topics.
- FULL_TRANSITIVE is the right choice when any consumer replays from the beginning—backfill jobs and audit systems read every historical schema version; transitive checking catches compatibility breaks against old schemas that FULL misses.
- Adding a field without a default is a breaking change in Avro—every field addition must include `"default": null` or a sensible default; there are no exceptions.
- Renaming a field in Avro is remove + add, not a rename—use union types with aliases during a transition window, or migrate to a new topic.
- Protobuf field numbers are the identity, not names—renames are free in Protobuf; this makes it more resilient to schema churn than Avro for rapidly evolving data models.