Avro: Keep backward compatibility using a default value in a new field, without the writer/old schema


Avro can handle most common forward- and backward-compatibility cases, but it still needs the writer schema when reading data with a newer schema.

Say there is an old schema:

{
  "type": "record",
  "name": "com.test.Contact",
  "fields": [
    {
      "name": "address",
      "type": "string"
    }
  ]
}

To decode bytes that were written with the old schema using the following new schema, both the old and the new schema are needed.

{
  "type": "record",
  "name": "com.test.Contact",
  "fields": [
    {
      "name": "address",
      "type": "string"
    },
    {
      "name": "phone",
      "type": ["null", "string"],
      "default": null
    }
  ]
}

The code to read:

    static void sede3() throws IOException {
        System.out.println("----------------------sede3---------------------");
        Schema.Parser parserNew = new Schema.Parser();
        Schema.Parser parserOld = new Schema.Parser();
        Schema addrSchemaNew = parserNew.parse(AppMainCore.class.getClassLoader()
                .getResourceAsStream("compatibility-new.schema.json"));
        Schema addrSchemaOld = parserOld.parse(AppMainCore.class.getClassLoader()
                .getResourceAsStream("compatibility-old.schema.json"));

        GenericRecord addressOld = new GenericData.Record(addrSchemaOld);
        addressOld.put("address", "ABCDEF");
        // Generate bytes using old schema.
        ByteArrayOutputStream bbos = new ByteArrayOutputStream();
        Encoder encoder = EncoderFactory.get().binaryEncoder(bbos, null);
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(addrSchemaOld);
        writer.write(addressOld, encoder);
        encoder.flush();
        bbos.close();
        byte[] oldBytes = bbos.toByteArray();
        System.out.println(Hex.encodeHexString(oldBytes) + " is old record bytes");
        // Try to deserialize old bytes using new schema, with old schema's help
        DatumReader<GenericRecord> readerGood = new GenericDatumReader<>(addrSchemaOld, addrSchemaNew);
        Decoder decoderGood = DecoderFactory.get().binaryDecoder(oldBytes, null);
        GenericRecord goodAddress = readerGood.read(null, decoderGood);

        System.out.println(goodAddress + " is record from old bytes using new schema");
    }

What I am trying to do is deserialize the old bytes using only the new schema, WITHOUT the old schema's help.

Like this:

DatumReader<GenericRecord> readerGood = new GenericDatumReader<>(addrSchemaNew);

But not

DatumReader<GenericRecord> readerGood = new GenericDatumReader<>(addrSchemaOld, addrSchemaNew);

Is it supported in Avro?
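To make the problem concrete, here is a self-contained sketch (the class name `SingleSchemaDemo` and inlined schema literals are illustrative, not from the original code). Avro's binary encoding carries no field names or schema information, so a reader constructed with only the new schema walks the new schema's field list against the old bytes: after consuming the `address` string it tries to read a union index for `phone`, runs off the end of the buffer, and fails.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class SingleSchemaDemo {

    static final String OLD = "{\"type\":\"record\",\"name\":\"com.test.Contact\","
            + "\"fields\":[{\"name\":\"address\",\"type\":\"string\"}]}";
    static final String NEW = "{\"type\":\"record\",\"name\":\"com.test.Contact\","
            + "\"fields\":[{\"name\":\"address\",\"type\":\"string\"},"
            + "{\"name\":\"phone\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

    /** Returns true if reading old bytes with only the new schema fails. */
    static boolean readWithNewSchemaOnlyFails() throws IOException {
        Schema oldSchema = new Schema.Parser().parse(OLD);
        Schema newSchema = new Schema.Parser().parse(NEW);

        // Serialize a record with the old (one-field) schema.
        GenericRecord record = new GenericData.Record(oldSchema);
        record.put("address", "ABCDEF");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(oldSchema);
        writer.write(record, encoder);
        encoder.flush();
        byte[] oldBytes = out.toByteArray();

        // Try to read with ONLY the new schema: the decoder runs past the
        // end of the buffer looking for the union index of "phone".
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(newSchema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(oldBytes, null);
        try {
            reader.read(null, decoder);
            return false; // unexpectedly succeeded
        } catch (IOException expected) {
            return true;  // typically an EOFException
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("single-schema read fails: " + readWithNewSchemaOnlyFails());
    }
}
```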

1 Answer

Answered by s4nk:

I don't think that's possible; see the documentation here: https://avro.apache.org/docs/1.8.2/spec.html#Schema+Resolution. Avro needs both the old and the new schema to perform schema resolution.
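If you control the serialization format, the usual way around this is to ship the writer schema alongside the bytes. Avro's object container file format does exactly that: the writer schema is embedded in the file header, so the reader only supplies its own (new) schema and Avro resolves the two automatically. Below is a sketch of this approach (the class name `ContainerFileDemo` and inlined schema literals are illustrative), writing the "file" to an in-memory byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ContainerFileDemo {

    static final String OLD = "{\"type\":\"record\",\"name\":\"com.test.Contact\","
            + "\"fields\":[{\"name\":\"address\",\"type\":\"string\"}]}";
    static final String NEW = "{\"type\":\"record\",\"name\":\"com.test.Contact\","
            + "\"fields\":[{\"name\":\"address\",\"type\":\"string\"},"
            + "{\"name\":\"phone\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

    /** Writes with the old schema, reads back supplying only the new schema. */
    static GenericRecord roundTrip() throws IOException {
        Schema oldSchema = new Schema.Parser().parse(OLD);
        Schema newSchema = new Schema.Parser().parse(NEW);

        // Write a container "file" in memory; the writer schema is embedded
        // in the header, so the bytes are self-describing.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> fileWriter =
                new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(oldSchema))) {
            fileWriter.create(oldSchema, out);
            GenericRecord record = new GenericData.Record(oldSchema);
            record.put("address", "ABCDEF");
            fileWriter.append(record);
        }

        // Read with only the NEW schema: the writer schema comes from the
        // container header and Avro resolves the two schemas for us.
        GenericDatumReader<GenericRecord> datumReader =
                new GenericDatumReader<>(null, newSchema);
        try (DataFileStream<GenericRecord> fileReader = new DataFileStream<>(
                new ByteArrayInputStream(out.toByteArray()), datumReader)) {
            return fileReader.next(); // "phone" resolves to its default, null
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip());
    }
}
```

The same idea underlies schema registries (e.g. Confluent's): instead of embedding the whole writer schema, each message carries a small schema ID that the reader uses to fetch the writer schema, after which standard Avro schema resolution applies.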