s3fs.put into empty and non-empty S3 folder


I am copying a folder to S3 with s3fs.put(..., recursive=True) and I'm seeing weird behavior. The code is:

import s3fs

source_path = 'foo/bar'                # there are some <files and subfolders> inside
target_path = 's3://my_bucket/baz'

s3 = s3fs.S3FileSystem(anon=False)
s3.put(source_path, target_path, recursive=True)

The first time I run the command (when the files and S3 "folders" are created), the result looks like this:

S3://my_bucket/baz/<files and subfolders>

The second time I run it, the result looks like this:

S3://my_bucket/baz/bar/<files and subfolders>

I could check beforehand whether the "folders" exist, but that doesn't solve the real problem: I don't want bar to appear in the resulting tree at all. I tried appending '/' to target_path, as the documentation suggests, but it had no effect. Is there a way to make s3fs behave the same way regardless of what already exists in S3?

Answer by pandidand:

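Yes, unfortunately that is how it works: s3fs.put() has no equivalent of shutil.copytree(src, dst, dirs_exist_ok=True). Like cp -r, it copies the contents of the source folder to target_path when that path does not exist yet, but once target_path exists it copies the source folder itself into it as a subfolder.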
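For comparison, the standard-library behavior mentioned above can be reproduced locally. This is a minimal sketch with made-up folder names (bar, baz, file_1 and dir_1 are illustrative only): shutil.copytree with dirs_exist_ok=True (Python 3.8+) merges into an existing destination, so running the same copy twice yields the same tree instead of a nested bar folder.

```python
import os
import shutil
import tempfile


def copy_twice(src, dst):
    """Run the same copy twice and return the top-level names in dst."""
    for _ in range(2):
        # dirs_exist_ok=True lets the second call merge into the existing dst
        # instead of raising FileExistsError; src is never nested inside dst.
        shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(os.listdir(dst))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "bar")  # hypothetical local source folder
    os.makedirs(os.path.join(src, "dir_1"))
    with open(os.path.join(src, "file_1"), "w") as f:
        f.write("data")
    print(copy_twice(src, os.path.join(tmp, "baz")))  # ['dir_1', 'file_1']
```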

The working solution is to add a trailing slash to the source path, so that put() copies the contents of the folder rather than the folder itself:

import s3fs
import os

source_path = 'foo/bar'                # there are some <files and subfolders> inside
target_path = 's3://my_bucket/baz'

s3 = s3fs.S3FileSystem(anon=False)
# os.path.join(source_path, "") appends a trailing separator, e.g. 'foo/bar/' on POSIX
s3.put(os.path.join(source_path, ""), target_path, recursive=True)
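To make the trick harder to forget, it can be wrapped in a small helper. put_contents is a hypothetical name, not part of s3fs; the sketch assumes fs is an s3fs.S3FileSystem (or any object with the same put(lpath, rpath, recursive=...) method) and only normalizes the source so it always ends in exactly one '/' before calling put(). According to the test in this answer, that makes repeated uploads land in the same place.

```python
def put_contents(fs, source_dir, target_dir):
    """Upload the *contents* of source_dir into target_dir (hypothetical helper).

    Strips any trailing slashes from source_dir and adds exactly one, so
    fs.put() is always called with 'foo/bar/' rather than 'foo/bar'.
    """
    source = source_dir.rstrip("/") + "/"
    fs.put(source, target_dir, recursive=True)
    return source


# Usage (requires s3fs and AWS credentials):
# import s3fs
# s3 = s3fs.S3FileSystem(anon=False)
# put_contents(s3, "foo/bar", "s3://my_bucket/baz")
```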

I tested this on my S3 deployment (note the trailing slash on the source path in the second test):

>>> s3.put("~/some_dir_with_subdirs", "s3://my_bucket/target_dir", recursive=True)
>>> s3.ls("s3://my_bucket/target_dir")
['dir_1', 'file_1', 'file_2', 'file_3', 'file_4', 'file_5', 'dir_2']
>>> s3.put("~/some_dir_with_subdirs", "s3://my_bucket/target_dir", recursive=True)
>>> s3.ls("s3://my_bucket/target_dir")
['dir_1', 'some_dir_with_subdirs', 'file_1', 'file_2', 'file_3', 'file_4', 'file_5', 'dir_2']

>>> s3.put("~/some_dir_with_subdirs/", "s3://my_bucket/target_dir", recursive=True)
>>> s3.ls("s3://my_bucket/target_dir")
['dir_1', 'file_1', 'file_2', 'file_3', 'file_4', 'file_5', 'dir_2']
>>> s3.put("~/some_dir_with_subdirs/", "s3://my_bucket/target_dir", recursive=True)
>>> s3.ls("s3://my_bucket/target_dir")
['dir_1', 'file_1', 'file_2', 'file_3', 'file_4', 'file_5', 'dir_2']
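If you would rather not depend on how put() interprets trailing slashes at all, a different technique (not the one tested above) is to compute every remote key yourself and upload file by file. The sketch below assumes fs exposes put_file(lpath, rpath), which fsspec filesystems such as s3fs.S3FileSystem provide; upload_tree is a hypothetical helper name. Because S3 has no real directories, empty local folders are simply skipped.

```python
import os
import posixpath


def upload_tree(fs, source_dir, target_prefix):
    """Upload each file under source_dir to target_prefix, one file at a time.

    Every remote path is built from the file's path relative to source_dir,
    so the resulting layout does not depend on what already exists remotely.
    Assumes `fs` provides put_file(lpath, rpath). Empty folders are skipped.
    """
    target_prefix = target_prefix.rstrip("/")
    uploaded = []
    for root, _dirs, files in os.walk(source_dir):
        rel_root = os.path.relpath(root, source_dir)
        # os.walk yields OS-specific paths; S3 keys always use "/".
        rel_parts = [] if rel_root == os.curdir else rel_root.split(os.sep)
        for name in files:
            remote_path = posixpath.join(target_prefix, *rel_parts, name)
            fs.put_file(os.path.join(root, name), remote_path)
            uploaded.append(remote_path)
    return uploaded


# Usage (requires s3fs and AWS credentials):
# import s3fs
# s3 = s3fs.S3FileSystem(anon=False)
# upload_tree(s3, "foo/bar", "s3://my_bucket/baz")
```

This issues one sequential put_file call per file, so for large trees it may be slower than a single put(), in exchange for a layout that is fully determined by the local folder.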