Python: how to handle uploading and zipping large files into Minio

I have a Django GraphQL API. I'm trying to implement an endpoint that downloads files from external URLs, uploads them to a Minio bucket, creates a zip file from the files, and returns the zip file to the user for download. All of this is done in the backend. It works with relatively small files, but some of the files I need to handle are pretty large, e.g. ~4GB; uploading files that big seems to fail pretty often, and adding them to the zip file usually freezes the whole server.

This is how I download and upload the files:

import io

import requests

def download_and_upload_file(url: str, prefix: str, file_name: str) -> str:
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        # Buffer the entire download in memory before handing it to Minio.
        with io.BytesIO() as file_buffer:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    file_buffer.write(chunk)
            file_buffer.seek(0)
            upload_url = upload_object_to_minio(file_buffer, prefix, file_name)
            return upload_url

def upload_object_to_minio(stream: io.BytesIO, prefix: str, object_name: str):
    try:
        _ = MINIO_CLIENT.put_object(
            bucket_name=xxx,
            object_name=f"/{prefix}/{object_name}",
            data=stream,
            # The whole object has to sit in memory for this length call.
            length=stream.getbuffer().nbytes,
        )
    except Exception as e:
        print(e)

    return "minio-object-url"
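
One alternative I've been looking at is streaming the response body straight into Minio instead of buffering it in a BytesIO first. Below is a rough sketch, not my actual code: it reuses the same MINIO_CLIENT and get_minio_url helpers as above, with "my-bucket" as a placeholder bucket name. As far as I understand, passing length=-1 together with part_size makes the client perform a multipart upload, reading the stream one part at a time:

import requests

def download_and_stream_to_minio(url: str, prefix: str, file_name: str) -> str:
    # Stream the HTTP response body directly into Minio instead of
    # collecting it in a BytesIO buffer first.
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        MINIO_CLIENT.put_object(
            bucket_name="my-bucket",        # placeholder bucket name
            object_name=f"{prefix}/{file_name}",
            data=response.raw,              # file-like object, read lazily
            length=-1,                      # size unknown -> multipart upload
            part_size=10 * 1024 * 1024,     # read and upload 10 MiB at a time
        )
    return get_minio_url(prefix, file_name)

This would keep memory usage bounded by part_size rather than by the file size.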

And this is how I zip the files:

import io
import zipfile

def create_zip_file(prefix: str, uploaded_urls: list[str]) -> str:
    try:
        zip_buffer = io.BytesIO()
        with zipfile.ZipFile(zip_buffer, "w") as zip_file:
            for uploaded_url in uploaded_urls:
                file_name = uploaded_url.split("/")[-1]

                # Retrieve the Minio object and its name
                minio_object, object_name = get_object_from_minio(uploaded_url)

                # Read the whole content of the Minio object into memory
                object_content = minio_object.read()

                zip_file.writestr(file_name, object_content)

        zip_buffer.seek(0)

        zip_file_name = "name.zip"
        upload_object_to_minio(zip_buffer, prefix, zip_file_name)
        zip_url = get_minio_url(prefix, zip_file_name)
        return zip_url
    except Exception as e:
        print(e)
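
Similarly, I'm wondering whether the zipping could be done in a streaming way: build the archive in a temporary file on disk and copy each Minio object into it in chunks, so no single file has to fit in memory. ZipFile.open(name, "w") returns a writable entry since Python 3.6, and fput_object uploads the finished zip from disk in parts. A rough sketch, again with the same helpers and a placeholder bucket name:

import os
import shutil
import tempfile
import zipfile

def create_zip_file_streaming(prefix: str, uploaded_urls: list[str]) -> str:
    fd, tmp_path = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    try:
        with zipfile.ZipFile(tmp_path, "w") as zip_file:
            for uploaded_url in uploaded_urls:
                file_name = uploaded_url.split("/")[-1]
                minio_object, _ = get_object_from_minio(uploaded_url)
                try:
                    # Copy the object into the archive 1 MiB at a time
                    # instead of read()-ing the whole thing.
                    with zip_file.open(file_name, "w") as entry:
                        shutil.copyfileobj(minio_object, entry, length=1024 * 1024)
                finally:
                    minio_object.close()
                    minio_object.release_conn()

        zip_file_name = "name.zip"
        # fput_object uploads from disk and handles multipart internally.
        MINIO_CLIENT.fput_object("my-bucket", f"{prefix}/{zip_file_name}", tmp_path)
        return get_minio_url(prefix, zip_file_name)
    finally:
        os.remove(tmp_path)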

I'd be very grateful for any tips on how to deal with these large files. Would it be better to handle the downloading and zipping in the frontend, or is my approach sound (download the files -> upload them to Minio -> get the files from Minio -> zip the files -> deliver the zip to the user)? Or is the whole approach of dealing with large files during API requests wrong? Any insights and suggestions are welcome.

This is the mutation that uses these functions:

from typing import Any

import strawberry
from strawberry.types import Info

@strawberry.type
class FileHandlerMutation:
    @strawberry.mutation()
    def download_initiate(
        self,
        info: Info,
        input: Any,  # assumed argument; the original snippet reads it but omits it from the signature
    ) -> Any:
        input_data = vars(input)
        id = input_data["id"]
        ...
        urls = input_data["urls"]

        try:
            zip_url = download_and_upload_files(urls=urls, prefix=prefix)

   ....

download_and_upload_files uses download_and_upload_file to download the files from the given URLs and then builds the zip.
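
Simplified, it does roughly this (a sketch, not the exact code):

def download_and_upload_files(urls: list[str], prefix: str) -> str:
    # Download and upload each file, then zip the uploaded objects.
    uploaded_urls = [
        download_and_upload_file(url, prefix, url.split("/")[-1])
        for url in urls
    ]
    return create_zip_file(prefix, uploaded_urls)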
