Spark - Read csv file with quotes

46 views Asked by At

We have the following in a csv file:

DI_SEQUENCE_NUMBER;DI_OPERATION_TYPE;MANDT;SPRAS;MVGR2;BEZEI;SRC_SYSTEM;LOAD_DATE;ODQ_CHANGEMODE;ODQ_ENTITYCNTR
76;"I";"220";"E";"200";"SAUGNAPF:BX75P:30/60M.FILT;G3/8""AU1/8"";"D16";"20231017T114626";"C";1
77;"I";"220";"E";"201";"COMPOTS AND FRUIT DRINKS";"D16";"20231017T114626";"C";1

Row 76 is causing the problem, we can't find the right options to read correctly. The part that's causing the issue is "SAUGNAPF:BX75P:30/60M.FILT;G3/8""AU1/8"". The original text from system were this csv file is generated is SAUGNAPF:BX75P:30/60M.FILT;G3/8""AU1/8".

We tried with: option("quote", """) option("escape",""")

but no luck

We tried with: option("quote", """) option("escape",""")

We expect to have this as an output

1

2

There are 2 answers

0
Rushikesh On

quote and escape options don't seem to be working in this case. I read the data without any option which show column value like:

"SAUGNAPF:BX75P:30/60M.FILT;G3/8""AU1/8""

To remove opening and closing quotes I used regexp_replace.

spark.read.csv(sep=";", header=True, path=<file_path>) \
     .withColumn("BEZEI", f.regexp_replace("BEZEI", "^[\"]|[\"]$", ''))
0
Ali BOUHLEL On

I think that you need to escape the double quote when setting the options:

.option("quote", "\"")
.option("escape", "\"")

If you have data with multiple lines then you'll need to add .option("multiline", True)

If you have an extra space between field values then you should also add

.option("ignoreLeadingWhiteSpace", true)

I hope it'll work for you.