I want to load .csv.zst files into a dataframe:
    for ex in examples:
        path = root + "f=" + ex + "/" + date
        data = os.listdir(path)
        for d in data:
            zst_datapath = path + "/" + d
            with open(zst_datapath, 'rb') as fh:
                data = fh.read()
                dctx = zstd.ZstdDecompressor(max_window_size=2147483648)
                decompressed = dctx.decompress(data)
What I want to do is read the decompressed data as a CSV file:
    with open(decompressed, 'rb') as f:
        csv_data = f.read()
        csv = pd.read_csv(csv_data)
However, I get a "File name too long" error. How do I load the decompressed data into a pandas dataframe?
Your main problem is that after calling dctx.decompress(data), the variable decompressed contains the whole uncompressed data, that is, the content of the .csv.zst file itself. So when you then call open(decompressed, 'rb'), you are trying to open a file whose name is "{content of your csv}".
What you are thinking of is making an input stream out of the decompressed data. The io module's StringIO is what you are looking for: you pass it text content and get back a file-like object that works as if it came from a file opened with open(). Since decompress() returns bytes rather than text, io.BytesIO is the equivalent to use here.
Except that swapping open(decompressed, 'rb') for such a stream and still calling f.read() WILL crash too, because read_csv() treats a string argument as a path, so it will again look for a file whose name is "{content of your csv}". If you want to pass a block of text to read_csv(), you need to pass the f object itself, as in pd.read_csv(f).
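As a minimal sketch of passing the file-like object itself, assuming pandas is installed: the bytes literal below is a made-up stand-in for the decompressed value from the question, and io.BytesIO is used instead of io.StringIO because the data is bytes, not text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the bytes returned by dctx.decompress(data).
decompressed = b"id,value\n1,10\n2,20\n"

# Wrap the in-memory bytes in a file-like object and hand that object
# (not the result of f.read()) to read_csv.
with io.BytesIO(decompressed) as f:
    df = pd.read_csv(f)

print(df)
```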
This will work. However, read_csv() can also decompress files itself, so with a recent pandas (zstd support was added in 1.4) you can skip the whole decompression step and pass the file name directly, as in pd.read_csv(zst_datapath). Pandas will take care of decompressing.
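A minimal sketch of the direct approach, assuming a pandas version that supports the codec in question. The helper name read_compressed_csv is hypothetical and only there for illustration; it relies on pandas inferring the compression from the file extension (.zst, .gz, and so on).

```python
import pandas as pd


def read_compressed_csv(path):
    """Load a compressed CSV, letting pandas pick the codec from the extension."""
    # compression="infer" is already the default; it is spelled out for clarity.
    return pd.read_csv(path, compression="infer")
```

With the loop from the question this would be called as read_compressed_csv(zst_datapath). If the files really need the large window size set in the question, the pandas documentation describes passing a dict such as compression={"method": "zstd", "max_window_size": 2147483648}, whose extra keys are forwarded to zstandard.ZstdDecompressor, so that may be worth trying.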
Note that different compression schemes require different dependencies to be installed. For zstd, pandas needs the zstandard package.
Hope that this helps.