Error in Spark Dataset processing when using a ScalaPB-generated class wrapped inside a vanilla Scala case class


I am trying to wrap a ScalaPB-generated class inside a Scala case class and then use it in some Spark processing. Is this possible?

Here is some schematic code of what I am trying to do:

//Defined in proto
message Record {
   optional string some_field = 1; // field type assumed here; the schematic omits it
}

//Defined in a Scala object
case class RecordWrapper(id: String, record: Record)

//Spark code

val schema = ProtoSQL
      .schemaFor[Record]
      .asInstanceOf[StructType]

val ds = spark.read.format("parquet").schema(schema)
      .load(dataPath)
      .as[Record]

ds.map(x => someProcessing(RecordWrapper("123", x))).show(false)

Right now I am getting an error like:

Task not serializable: java.io.NotSerializableException: scalapb.descriptors.FieldDescriptor
Serialization stack:
- object not serializable (class: scalapb.descriptors.FieldDescriptor, value: com.apple.maps.poi.aiml.infra.protos.PlaceDetails.categories)
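For reference, here is a fuller version of the schematic above with the pieces the snippet leaves out. The imports, the SparkSession setup, the dataPath value, and someProcessing's signature are my assumptions about how this would be wired up (using the encoders from sparksql-scalapb rather than Spark's reflection-based derivation), not code I have verified to run without this error; Record and RecordWrapper are the definitions shown above.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType
import scalapb.spark.Implicits._ // encoders for ScalaPB generated messages
import scalapb.spark.ProtoSQL

val spark = SparkSession.builder().appName("record-wrapper").getOrCreate()

val dataPath = "/path/to/records.parquet" // placeholder path

// Placeholder for the real processing; assumed to return the wrapper unchanged
def someProcessing(w: RecordWrapper): RecordWrapper = w

val schema = ProtoSQL.schemaFor[Record].asInstanceOf[StructType]

val ds = spark.read.format("parquet").schema(schema).load(dataPath).as[Record]

// The wrapping happens inside the closure; the open question is whether an
// Encoder for RecordWrapper (a case class holding a generated message) can be
// derived here without dragging non-serializable descriptor objects into the task.
ds.map(x => someProcessing(RecordWrapper("123", x))).show(false)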


There are 0 answers