How do you evolve a data schema without breaking downstream ML consumers?
Use a schema registry with backward-compatible evolution rules so changes are managed rather than ad hoc: producers can add optional or nullable fields and consumers ignore unknown fields, which keeps existing pipelines working. Breaking changes such as renaming, removing, or retyping a field require versioning, often a new topic or table, with a migration window and deprecation before the old schema is retired. This lets data evolve continuously while ML features and models stay stable.
How to think about it
The short answer
Manage schema change through a schema registry with backward-compatible evolution rules. Non-breaking changes (add an optional/nullable field; consumers ignore unknown fields) flow freely. Breaking changes (rename, remove, or retype a field) require versioning — often a new topic/table — with a migration window and deprecation before the old schema is retired.
Why backward compatibility is the dividing line
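Downstream ML consumers compute features from specific fields. If a producer removes or retypes one without warning, training and serving can break silently: the pipeline keeps running while a feature goes null or changes meaning. Backward compatibility, as used here, means an old consumer can still read new data, so producers can iterate without coordinating a lockstep deploy with every consumer. Be aware that some registries name the directions differently: Confluent Schema Registry calls this property forward compatibility and reserves backward for new readers consuming old data, so check which mode you are actually configuring. Either way, the registry enforces the rules so evolution is managed, not ad hoc.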
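To make "consumers ignore unknown fields" concrete, here is a minimal sketch of a tolerant reader. The field names (`user_id`, `amount`, `coupon_code`) and the `extract_features` helper are hypothetical, chosen only for illustration; a real pipeline would usually get this behavior from its serialization library rather than hand-rolled checks.

```python
from typing import Any

# Hypothetical: the only fields this feature pipeline depends on.
REQUIRED_FIELDS = {"user_id": int, "amount": float}


def extract_features(record: dict[str, Any]) -> dict[str, Any]:
    """Read only the fields the consumer knows about; ignore everything else."""
    features = {}
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            # A removed or renamed field fails loudly instead of silently.
            raise KeyError(f"missing required field: {name}")
        value = record[name]
        if not isinstance(value, expected_type):
            raise TypeError(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        features[name] = value
    return features


old_record = {"user_id": 1, "amount": 9.5}
# The producer added a nullable field; the consumer never looks at it.
new_record = {"user_id": 1, "amount": 9.5, "coupon_code": None}

assert extract_features(old_record) == extract_features(new_record)
```

The design choice worth noting is that the reader is strict about the fields it needs and indifferent to everything else, which is what lets additive changes flow freely while removals and retypes surface immediately.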
Safe vs breaking changes
- Safe (backward-compatible): add a nullable field, add an enum value that consumers map to a default, or widen a type cautiously (e.g., int to long) after confirming consumers can hold the wider range. Consumers ignore unknown fields and keep working.
- Breaking: rename, delete, retype, change units/semantics, or make an optional field required. These need a new schema version and a migration path.
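The two lists above can be approximated by a small structural diff between schema versions. This is a sketch under stated assumptions, not how any particular registry implements its checks: the schema format (field name mapped to a type name and a nullable flag), the `SAFE_WIDENINGS` table, and the `classify_changes` function are all hypothetical. Adding a non-nullable field is not on the safe list, so the sketch treats it conservatively as breaking.

```python
# Hypothetical schema format: field name -> (type name, nullable).
Schema = dict[str, tuple[str, bool]]

# Widenings assumed acceptable for this sketch; still verify consumers.
SAFE_WIDENINGS = {("int", "long"), ("float", "double")}


def classify_changes(old: Schema, new: Schema) -> list[tuple[str, str, str]]:
    """Return (severity, field, description) for each detected change."""
    changes = []
    for name, (old_type, old_nullable) in old.items():
        if name not in new:
            # A rename looks like a removal plus an addition.
            changes.append(("breaking", name, "removed (or renamed)"))
            continue
        new_type, new_nullable = new[name]
        if new_type != old_type:
            if (old_type, new_type) in SAFE_WIDENINGS:
                changes.append(("safe", name, f"widened {old_type} -> {new_type}"))
            else:
                changes.append(("breaking", name, f"retyped {old_type} -> {new_type}"))
        if old_nullable and not new_nullable:
            changes.append(("breaking", name, "optional made required"))
    for name, (_, nullable) in new.items():
        if name not in old:
            if nullable:
                changes.append(("safe", name, "added nullable field"))
            else:
                changes.append(("breaking", name, "added required field"))
    return changes


old = {"user_id": ("long", False), "name": ("string", False), "score": ("int", True)}
new = {"user_id": ("long", False), "score": ("long", True), "first_name": ("string", True)}

for severity, field, description in classify_changes(old, new):
    print(f"{severity:8} {field}: {description}")
```

A check like this only sees structure. Changes to units or semantics leave the types untouched and pass straight through it, so they need a separate, metadata-level check.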
How to roll out a breaking change
- Publish the new version alongside the old (e.g., new topic events.v2).
- Dual-write or run both during a migration window.
- Migrate consumers (retrain features against v2).
- Deprecate then retire v1 only after consumers move.
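The rollout steps above can be sketched as a producer that dual-writes during the migration window. Everything here is illustrative: the in-memory `topics` dict stands in for a real message bus, the old topic name `events` is assumed (only `events.v2` appears in the steps), and the v1-to-v2 mapping (`amount_cents` becoming `amount` in dollars) is a made-up breaking change.

```python
from collections import defaultdict
from typing import Any

# In-memory stand-in for a message bus: topic name -> published records.
topics: dict[str, list[dict[str, Any]]] = defaultdict(list)

OLD_TOPIC = "events"      # assumed name of the existing v1 topic
NEW_TOPIC = "events.v2"   # new topic carrying the v2 schema


def to_v2(record_v1: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical mapping: amount_cents (int) becomes amount (float, dollars)."""
    record_v2 = {k: v for k, v in record_v1.items() if k != "amount_cents"}
    record_v2["amount"] = record_v1["amount_cents"] / 100
    return record_v2


def publish(record_v1: dict[str, Any], *, write_v1: bool = True, write_v2: bool = True) -> None:
    """Dual-write by default; flip write_v1 off once consumers have moved."""
    if write_v1:
        topics[OLD_TOPIC].append(record_v1)
    if write_v2:
        topics[NEW_TOPIC].append(to_v2(record_v1))


# Migration window: both topics receive every event.
publish({"user_id": 1, "amount_cents": 1250})

# After v1 is retired: only the new topic is written.
publish({"user_id": 2, "amount_cents": 300}, write_v1=False)
```

Keeping the v1 write behind a flag makes retirement a configuration change rather than a code change, which fits the "retire v1 only after consumers move" step.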
Concrete example
You must split name into first_name/last_name. Instead of mutating in place, add the new fields (nullable) first, backfill, let feature pipelines adopt them, then deprecate name on a schedule. No model ever trains on a half-migrated field.
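A minimal sketch of the backfill step in this example follows. The `backfill_name` helper is hypothetical, and splitting on the first space is a deliberately naive assumption (real names need more care); the point is only that the legacy `name` field stays in place while the new nullable fields are populated.

```python
from typing import Any


def backfill_name(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy with first_name/last_name filled from the legacy name field."""
    out = dict(record)
    if out.get("first_name") is not None or out.get("last_name") is not None:
        return out  # already migrated by the producer; leave untouched
    # Naive assumption: everything before the first space is the first name.
    first, _, last = (out.get("name") or "").strip().partition(" ")
    out["first_name"] = first or None
    out["last_name"] = last.strip() or None
    return out


legacy = {"user_id": 7, "name": "Ada Lovelace"}
migrated = backfill_name(legacy)

# The legacy field is kept until it is deprecated on schedule.
assert migrated["name"] == "Ada Lovelace"
assert (migrated["first_name"], migrated["last_name"]) == ("Ada", "Lovelace")
```

Because the new fields are nullable, a record that cannot be split cleanly ends up with `None` rather than a guessed value, and feature pipelines can decide how to handle the gap.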
Common follow-up / trap
A classic probe: “A producer needs to change a field’s units — how?” That’s a breaking semantic change even if the type is unchanged; it needs a new version, not an in-place edit, because the old type validates while the meaning silently shifts. The trap is assuming “same type = safe” — semantic changes are the sneakiest breakers, which is exactly why contracts pin meaning, not just types.
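One way to make a contract pin meaning as well as type is to carry the unit as explicit metadata and diff it between versions. The sketch below assumes a hypothetical `FieldContract` structure and `semantic_breaks` check; it does not reflect any specific registry's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldContract:
    type: str
    unit: Optional[str] = None  # semantic metadata, e.g. "ms" or "s"


def semantic_breaks(
    old: dict[str, FieldContract], new: dict[str, FieldContract]
) -> list[str]:
    """Flag fields whose type is unchanged but whose unit differs."""
    problems = []
    for name, old_field in old.items():
        new_field = new.get(name)
        if new_field is None:
            continue  # removals belong to a structural check, not this one
        if new_field.type == old_field.type and new_field.unit != old_field.unit:
            problems.append(
                f"{name}: unit changed {old_field.unit} -> {new_field.unit} "
                f"(type still {old_field.type})"
            )
    return problems


v1 = {"latency": FieldContract("double", "ms")}
v2 = {"latency": FieldContract("double", "s")}

# A type-only check would pass here; the unit diff catches the change.
print(semantic_breaks(v1, v2))
```

A non-empty result would be treated like any other breaking change: publish a new version rather than editing the field in place.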