Calling a Python UDF for multiple files

How can I efficiently call a Python UDF for many CSV files in an S3 stage? About 450K CSV files (each only a few KB in size) arrive daily, and I need to select only certain columns from each file and load them into a table. I'm using a UDF to read the header and select only the required columns. Right now it takes ~10 minutes to read and load the file. Is there any optimization technique that can speed up this process?
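For reference, the per-file step described above (read the header, keep only the required columns) might look roughly like the sketch below. This is not the actual UDF: the column names in `REQUIRED_COLUMNS` and the helpers `select_columns` and `select_columns_batch` are hypothetical, and the files are assumed to already be available as in-memory text. The batch helper only illustrates the idea of handling many small files per call instead of one call per file.

```python
import csv
import io

# Hypothetical column names, for illustration only.
REQUIRED_COLUMNS = ["id", "amount"]


def select_columns(csv_text, required=REQUIRED_COLUMNS):
    """Read the header of one CSV file and keep only the required columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty file: nothing to load

    # Map each header name to its position so column order may differ per file.
    positions = {name: i for i, name in enumerate(header)}
    indexes = [positions.get(name) for name in required]

    rows = []
    for row in reader:
        # A column missing from the header (or a short row) yields None.
        rows.append(
            [row[i] if i is not None and i < len(row) else None for i in indexes]
        )
    return rows


def select_columns_batch(files):
    """Process many small files in one call.

    `files` maps a file name to its text content; each output row is
    prefixed with the name of the file it came from.
    """
    out = []
    for name, text in files.items():
        for row in select_columns(text):
            out.append([name] + row)
    return out
```

Calling `select_columns_batch({"a.csv": text_a, "b.csv": text_b})` returns the selected rows of both files in a single list, so any fixed per-call overhead is paid once per batch rather than once per file.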

There are 0 answers