Using Avro for the key subject with Kafka Schema Registry


My team and I recently had issues with the Avro schema used for our topic keys. We changed a comment on the key schema, which completely broke our Kafka Streams joins and also broke the compaction of our topics.

After investigation, it seems that changing a comment on a record actually creates a new version of the subject. For instance, given a topic my.awesome.topic.snapshot, using AwesomeKey and AwesomeValue as records, adding a comment on AwesomeKey results in the creation of a new version of the subject my.awesome.topic.snapshot-key, which breaks Kafka Streams joins and aggregates, as well as compaction.
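For context, with the default TopicNameStrategy the registry derives the subject name from the topic name, which is why the key change above lands in the `-key` subject. A minimal sketch of that naming convention:

```java
public class SubjectNaming {
    // Default TopicNameStrategy: one key subject and one value subject per topic.
    static String keySubject(String topic)   { return topic + "-key"; }
    static String valueSubject(String topic) { return topic + "-value"; }

    public static void main(String[] args) {
        System.out.println(keySubject("my.awesome.topic.snapshot"));
        // prints "my.awesome.topic.snapshot-key"
    }
}
```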

My understanding of the issue is that it is not a bug in the broker, the schema registry, or the Streams DSL. The key is not deserialized for those use cases; by design it's just the bytes that are compared. Even though the keys would be equal if deserialized, the schema id (stored right after the magic byte in the serialized payload) is different, so the keys are considered different.
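The byte-level comparison can be illustrated with the Confluent wire format, where each serialized message is a magic byte (0x0), a 4-byte schema id, and then the Avro-encoded data. A minimal sketch (the schema ids and payload bytes are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class WireFormatDemo {
    // Confluent wire format: 1 magic byte (0x0) + 4-byte schema id + Avro payload.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buf.put((byte) 0x0);   // magic byte
        buf.putInt(schemaId);  // id of the registered subject version
        buf.put(avroPayload);  // the actual Avro-encoded key data
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};          // identical Avro-encoded key data
        byte[] keyV1 = frame(42, payload);   // written with the old schema id
        byte[] keyV2 = frame(43, payload);   // same data, new id after the comment change
        // Brokers, compaction, and Kafka Streams compare keys as raw bytes:
        System.out.println(Arrays.equals(keyV1, keyV2)); // prints "false"
    }
}
```

Because the id sits in the first five bytes, two keys carrying identical Avro data but different schema versions never match byte-for-byte.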

Are we condemned to create a new topic every time we want to change a comment? Should we avoid Avro for topic keys entirely? Is there a way to tell the schema registry to block any change to a subject, even fully compatible ones (like a comment change)? I'm looking for solutions so that this issue never happens again. We use the avdl format and Java, if that changes anything.


1 answer

Chris Gerken

Could you instead put the key in the message header, probably as one key/value pair for ease of access, and then create a second Avro class that contains just the fields of the original key that make up the actual compound key? Use that key-fields-only class as the message key. That lets you add as many fields to your original Avro class as you want over time without affecting the (compound) key. You'd probably also need a public access class with produce and consume methods to encapsulate and enforce the logic needed to support this use of two Avro classes.
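A minimal sketch of the split, using plain Java records as stand-ins for the Avro-generated classes (all names and fields here are illustrative, not from the question):

```java
public class CompoundKeyDemo {
    // Full record: identifying fields plus other fields that may grow over time.
    record AwesomeRecord(String tenantId, String assetId, String comment, long updatedAt) {}

    // Key-fields-only class: the actual Kafka message key. It only changes
    // if the compound key itself changes, never when non-key fields evolve.
    record AwesomeCompoundKey(String tenantId, String assetId) {}

    // Extraction logic you would encapsulate in the shared produce/consume class.
    static AwesomeCompoundKey keyOf(AwesomeRecord r) {
        return new AwesomeCompoundKey(r.tenantId(), r.assetId());
    }

    public static void main(String[] args) {
        AwesomeRecord v1 = new AwesomeRecord("acme", "a-1", "old comment", 1L);
        AwesomeRecord v2 = new AwesomeRecord("acme", "a-1", "new comment", 2L);
        // Changing non-key fields leaves the compound key untouched:
        System.out.println(keyOf(v1).equals(keyOf(v2))); // prints "true"
    }
}
```

In a real producer you would serialize `AwesomeCompoundKey` as the Kafka key and carry the full record in the value (or a header, as suggested above), so schema evolution on the full record never alters the key bytes.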

Edit:

To answer another of your questions: while the use of Avro does result in key classes that are a bit more fragile than other approaches, the real problem here is the presence of fields in the key that are not part of the compound key. Even if you switched from Avro to something else, you'd still need to address the presence of non-key fields in your Kafka key.