I have a PySpark DataFrame that I am writing to the Glue Data Catalog as below:
df.write.format("parquet").mode("append").saveAsTable('db.table')
This works fine as long as the input DataFrame and the Glue catalog table have the same columns, but when new columns are added to the DataFrame, the Glue job fails.
Is there a way to update the schema in the Glue catalog when new columns or other schema changes are detected in the incoming Spark DataFrame?
I tried different write modes (append, overwrite, etc.), but overwrite removes the existing data. I also tried converting the Spark DataFrame to a DynamicFrame and updating the schema from there (see the sketch below), but that did not work as expected either.
For the DynamicFrame approach I was following the AWS documentation page "Creating tables, updating the schema, and adding new partitions in the Data Catalog from AWS Glue ETL jobs".
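For reference, this is roughly what I tried based on that page (the S3 path is a placeholder, df is the DataFrame from the snippet above, and my table has no partition keys):

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# df is the same Spark DataFrame written with saveAsTable above
dyf = DynamicFrame.fromDF(df, glueContext, "dyf")

# Catalog-aware sink that is supposed to push new columns to the Glue table
sink = glueContext.getSink(
    connection_type="s3",
    path="s3://my-bucket/db/table/",   # placeholder S3 location
    enableUpdateCatalog=True,          # update the catalog schema on write
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=[],
)
sink.setFormat("glueparquet")
sink.setCatalogInfo(catalogDatabase="db", catalogTableName="table")
sink.writeFrame(dyf)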