Zip folder in python with optimization of memory usage


Is there a way to decrease memory usage while compressing a folder with several files into a zip archive? The reason I ask is that with large files, compression can consume a lot of memory. If there is a way to improve performance as well, I would love to know.

import errno
import os
import zipfile
from logging import Logger


def zip_folder(
    zip_source_directory_target,
    source_folder_to_zip_path,
    zip_file_name,
    logger: Logger,
    compresslevel=6,
):
    """
    do compression on a folder
    """
    if os.path.exists(source_folder_to_zip_path) and os.path.exists(
        zip_source_directory_target
    ):
        full_zip_directory = os.path.join(zip_source_directory_target, zip_file_name)
        with zipfile.ZipFile(
            full_zip_directory,
            "w",
            compression=zipfile.ZIP_DEFLATED,
            compresslevel=compresslevel,
        ) as zipf:
        with os.scandir(source_folder_to_zip_path) as entries:
            for entry in entries:
                if entry.is_file():
                    arcname = os.path.relpath(entry.path, source_folder_to_zip_path)
                    zipf.write(entry.path, arcname=arcname)
    else:
        logger.warn(f"Zip source path: {zip_source_directory_target}")
        raise FileNotFoundError(
            errno.ENOENT,
            os.strerror(errno.ENOENT),
            f"Data folder path or zip source path may not exist please check",
        )
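For context, here is a minimal, self-contained way to exercise the function (the function body is repeated so the snippet runs standalone; the temporary directories and the `report.txt` file name are just for the demo):

```python
import errno
import logging
import os
import tempfile
import zipfile
from logging import Logger


def zip_folder(
    zip_source_directory_target,
    source_folder_to_zip_path,
    zip_file_name,
    logger: Logger,
    compresslevel=6,
):
    """Compress the files at the top level of a folder (as in the question)."""
    if os.path.exists(source_folder_to_zip_path) and os.path.exists(
        zip_source_directory_target
    ):
        full_zip_directory = os.path.join(zip_source_directory_target, zip_file_name)
        with zipfile.ZipFile(
            full_zip_directory,
            "w",
            compression=zipfile.ZIP_DEFLATED,
            compresslevel=compresslevel,
        ) as zipf:
            with os.scandir(source_folder_to_zip_path) as entries:
                for entry in entries:
                    if entry.is_file():
                        arcname = os.path.relpath(entry.path, source_folder_to_zip_path)
                        zipf.write(entry.path, arcname=arcname)
    else:
        logger.warning(f"Zip target path: {zip_source_directory_target}")
        raise FileNotFoundError(
            errno.ENOENT,
            os.strerror(errno.ENOENT),
            "Data folder path or zip target path may not exist; please check",
        )


# Demo with throwaway temporary directories.
source = tempfile.mkdtemp()
target = tempfile.mkdtemp()
with open(os.path.join(source, "report.txt"), "w") as f:
    f.write("some data")

zip_folder(target, source, "backup.zip", logging.getLogger(__name__))

with zipfile.ZipFile(os.path.join(target, "backup.zip")) as zipf:
    names = zipf.namelist()
print(names)  # ['report.txt']
```

Note that `ZipFile.write` streams each file into the archive in chunks, so even large files are never read fully into memory.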

Edit: thanks to Mark Adler, the solution above is working. The issue the second time around was due to a false memory reading from `docker stats` (an issue with the Docker version).

There is 1 answer

Mark Adler (accepted answer)

Use os.scandir instead of os.walk. Your use of os.walk loads entire directory listings into memory and then iterates over them. os.scandir returns an iterator that steps through the entries without loading them all into memory.
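The function in the question only handles the top level of the folder. If subdirectories also need to be archived, the same os.scandir approach can be made recursive while staying lazy. A sketch, with a hypothetical `iter_files` helper (not from the original post):

```python
import os
import tempfile
import zipfile


def iter_files(root):
    """Lazily yield file paths under root using os.scandir.

    An explicit stack of directories is used, so at any moment only one
    directory's scandir iterator is open; entries are yielded one at a
    time rather than collected into lists.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file():
                    yield entry.path


# Demo: build a small tree, zip it, and inspect the archive contents.
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "sub"))
for name in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(src, name), "w") as f:
        f.write("hello")

archive = os.path.join(tempfile.mkdtemp(), "out.zip")
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zipf:
    for path in iter_files(src):
        # ZipFile.write streams each file in chunks, so large files
        # are never read fully into memory.
        zipf.write(path, arcname=os.path.relpath(path, src))

with zipfile.ZipFile(archive) as zipf:
    names = sorted(zipf.namelist())
print(names)  # ['a.txt', 'sub/b.txt']
```

Memory use here is bounded by the depth of the directory tree (the stack) plus one open scandir iterator, rather than by the total number of files.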