How to enable schema evolution in Auto Loader

Question

Asked by bruce shavhani on 22 February 2024 at 15:29 · 59 views

I want to enable schema evolution in Auto Loader so that new columns are added (addNewColumns) as they arrive and are ingested.

When I click display, the stream is supposed to fail and tell me that there are unknown columns.

It is supposed to give me the error shown in this picture.

There is 1 answer

**DileeprajnarayanThumula** · Answer 1 · 2024-02-23T12:17:29+00:00

The schema location directory keeps track of your data schema over time.

For details, see "Configure schema inference and evolution in Auto Loader" in the Databricks documentation.

When you specify a target directory for the option cloudFiles.schemaLocation, it enables schema inference and evolution.

You can use the same directory for checkpointLocation if you prefer.

The following is the syntax:

(spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "parquet")
  .option("cloudFiles.schemaLocation", "<path-to-checkpoint>")
  .load("<path-to-source-data>")
  .writeStream
  .option("checkpointLocation", "<path-to-checkpoint>")
  .start("<path-to-target>")
)
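To make the evolution mode explicit rather than relying on the default, the reader options can be collected in one place. The following is a minimal sketch, not a definitive implementation: the helper names `autoloader_options` and `build_reader` are hypothetical, and `spark` is assumed to be an existing SparkSession on Databricks. `cloudFiles.schemaEvolutionMode` is a documented Auto Loader option, and `addNewColumns` is its default when no schema is supplied.

```python
# Sketch only: helper names are hypothetical, not part of any Databricks API.

ALLOWED_MODES = {"addNewColumns", "rescue", "failOnNewColumns", "none"}


def autoloader_options(file_format, schema_location, evolution_mode="addNewColumns"):
    """Collect Auto Loader reader options as a plain dict."""
    if evolution_mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported schema evolution mode: {evolution_mode}")
    return {
        "cloudFiles.format": file_format,
        # Directory where Auto Loader tracks the inferred schema over time.
        "cloudFiles.schemaLocation": schema_location,
        # With addNewColumns, the stream stops on a new column and picks up
        # the widened schema on restart.
        "cloudFiles.schemaEvolutionMode": evolution_mode,
    }


def build_reader(spark, source_path, options):
    """Return a streaming DataFrame; `spark` is assumed to be a SparkSession."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)
```

On Databricks this could be called as `build_reader(spark, "<path-to-source-data>", autoloader_options("csv", "<path-to-checkpoint>"))`, followed by the same `writeStream` chain as in the syntax above.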

Results:

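# start() returns a StreamingQuery, not a DataFrame, so the result is named query.
query = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", schema_loc)
    .load(Source_data_loc)
    .writeStream
    # The schema location and checkpoint location share one directory here.
    .option("checkpointLocation", schema_loc)
    .start(target_data_loc))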
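With the addNewColumns mode, the stream failing on unknown columns is the expected behavior: Auto Loader records the new schema in the schema location and stops the stream with an UnknownFieldException, and the new columns appear once the stream is restarted. A restart loop can be sketched as below. The function `run_with_restarts` and its parameters are hypothetical, and matching on the exception text is an assumption about how the error surfaces in Python, so treat this as a starting point rather than a tested recipe.

```python
# Sketch only: run_with_restarts is a hypothetical helper, not a Databricks API.

def run_with_restarts(start_stream, max_restarts=3):
    """Run a stream, restarting it when it stops because of a new column.

    start_stream: zero-argument callable that starts the stream and returns
    an object with awaitTermination() (for example a StreamingQuery).
    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            query = start_stream()
            query.awaitTermination()
            return restarts
        except Exception as exc:
            # Assumption: the schema-change failure mentions
            # UnknownFieldException in its message. Anything else is re-raised.
            is_schema_change = "UnknownFieldException" in str(exc)
            if not is_schema_change or restarts >= max_restarts:
                raise
            restarts += 1
```

Here `start_stream` would wrap the full `readStream ... writeStream ... start(...)` chain shown above, so that each restart rebuilds the reader and picks up the updated schema from the schema location.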

