Decimal conversion using AvroSchema/Apache Beam for a Parquet file is failing


I am trying to read a Parquet file and convert the resulting GenericRecords to Beam Rows (a PCollection<Row>) using AvroUtils. The conversion fails with a ClassCastException on a Decimal(38, 0) column.

org.apache.beam.sdk.Pipeline$PipelineExecutionException: class org.apache.avro.generic.GenericData$Fixed cannot be cast to class java.nio.ByteBuffer (org.apache.avro.generic.GenericData$Fixed is in unnamed module of loader 'app'; java.nio.ByteBuffer is in module java.base of loader 'bootstrap')
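
If I am reading the stack trace correctly, parquet-avro surfaces the Decimal(38, 0) column as an Avro fixed type carrying a decimal logical type, while the Beam conversion appears to cast decimal values straight to java.nio.ByteBuffer, which only works for bytes-backed decimals. Below is a minimal sketch of the two representations using Avro's standard Conversions.DecimalConversion API; the "dec38" fixed schema is hypothetical, standing in for whatever AvroSchemaConverter produces for the column:

import java.math.BigDecimal;
import java.nio.ByteBuffer;
import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericFixed;

// Hypothetical 16-byte fixed schema carrying decimal(38, 0); 16 bytes is the
// smallest fixed size that can hold precision 38.
Schema fixedSchema = LogicalTypes.decimal(38, 0)
        .addToSchema(Schema.createFixed("dec38", null, null, 16));

Conversions.DecimalConversion conversion = new Conversions.DecimalConversion();
GenericFixed fixed =
        conversion.toFixed(new BigDecimal("12345"), fixedSchema, fixedSchema.getLogicalType());

// Avro decodes fixed-backed decimals with fromFixed ...
BigDecimal fromFixed =
        conversion.fromFixed(fixed, fixedSchema, fixedSchema.getLogicalType());

// ... whereas a bytes-backed decimal arrives as a ByteBuffer and is decoded
// with fromBytes. The ClassCastException suggests the Beam conversion only
// handles this second shape.
BigDecimal fromBytes = conversion.fromBytes(
        ByteBuffer.wrap(fixed.bytes()), fixedSchema, fixedSchema.getLogicalType());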

Below is the code used for this read-and-convert operation.

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.schemas.utils.AvroUtils;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroReadSupport;
import org.apache.parquet.avro.AvroSchemaConverter;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.schema.MessageType;

Configuration conf = new Configuration();
conf.setBoolean(AvroReadSupport.READ_INT96_AS_FIXED, true); // Without this config, unable to read the Parquet file with the Decimal(38, 0) column at all.

// readFooter is static, and the filter constant lives on ParquetMetadataConverter.
ParquetMetadata metadata = ParquetFileReader.readFooter(conf, new Path(readFile.getPath()), ParquetMetadataConverter.NO_FILTER);
MessageType messageType = metadata.getFileMetaData().getSchema();
AvroSchemaConverter converter = new AvroSchemaConverter(conf);
org.apache.avro.Schema schema = converter.convert(messageType);

PCollection<GenericRecord> records = pipeline.apply("Read parquet file", ParquetIO.read(schema).from(readFile.getPath()).withConfiguration(conf));

PCollection<Row> rows = records.apply("Convert GenericRecords to Apache Beam Rows", ParDo.of(new DoFn<GenericRecord, Row>() {
       @ProcessElement
       public void processElement(@Element GenericRecord record, OutputReceiver<Row> out) {
             out.output(AvroUtils.toBeamRowStrict(record, AvroUtils.toBeamSchema(schema)));
       }
}));
rows.setRowSchema(AvroUtils.toBeamSchema(schema));
readValuesMap.put("output", rows);

Please help me find a way to convert a Parquet file with a Decimal(38, 0) column to Beam Rows.
