aiohttp: fast parallel downloading of large files

I'm using aiohttp to download large files (~150MB-200MB each).

Currently I'm doing for each file:

import aiofiles
import aiohttp

async def download_file(session: aiohttp.ClientSession, url: str, dest: str):
    chunk_size = 16384
    async with session.get(url) as response:
        # Stream the response body to disk in fixed-size chunks
        async with aiofiles.open(dest, mode="wb") as f:
            async for data in response.content.iter_chunked(chunk_size):
                await f.write(data)

I create multiple tasks of this coroutine to achieve concurrency (a rough sketch follows the questions below). I'm wondering:

  1. What is the best value for chunk_size?
  2. Is calling iter_chunked(chunk_size) better than just doing data = await response.read() and writing that to disk? In that case, how can I report the download progress?
  3. How many tasks made of this coroutine should I create?
  4. Is there a way to download multiple parts of the same file in parallel? Is that something aiohttp already does?
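
Roughly, the tasks are created like this (the URLs and destination paths are placeholders):

import asyncio
import aiohttp

async def main():
    urls = {
        "https://example.com/file1.bin": "file1.bin",
        "https://example.com/file2.bin": "file2.bin",
    }
    async with aiohttp.ClientSession() as session:
        # One download_file task per file, all running concurrently
        await asyncio.gather(*(download_file(session, url, dest)
                               for url, dest in urls.items()))

asyncio.run(main())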

There is 1 answer

Answered by Sören Rifé:
  1. The chunk size depends on how much RAM you are willing to spend per download. With 4 GB of RAM, a chunk size of 512 MB or even 1 GB is fine; with only 1 GB of RAM you probably don't want a 1 GB chunk. So set chunk_size according to the memory you have available.

  2. Create as many tasks as the number of files you want to download in parallel. That is entirely up to you and your use case (see the semaphore sketch after this answer).

  3. aiohttp does not split a single download into parts internally. What you can do is send a HEAD request to the server to get the file's Content-Length, divide the file into byte ranges, request each range from the server in parallel, and then merge the parts yourself (sketched below).
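
A minimal sketch of capping the number of concurrent downloads (the limit of 4 and the (url, dest) pairs are illustrative; download_file is the coroutine from the question):

import asyncio
import aiohttp

async def download_all(pairs, limit: int = 4):
    # Allow at most `limit` downloads to run at the same time
    sem = asyncio.Semaphore(limit)

    async def bounded(session, url, dest):
        async with sem:
            await download_file(session, url, dest)

    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(bounded(session, url, dest) for url, dest in pairs))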
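
And a rough sketch of the HEAD-plus-ranges approach from point 3, assuming the server reports Content-Length and honours Range requests (the function name and the choice of 4 parts are made up for illustration):

import asyncio
import aiofiles
import aiohttp

async def download_in_parts(url: str, dest: str, parts: int = 4):
    async with aiohttp.ClientSession() as session:
        # Ask for the file size first; assumes the server sends Content-Length
        async with session.head(url) as resp:
            size = int(resp.headers["Content-Length"])

        # Pre-allocate the destination file so each part can write at its own offset
        async with aiofiles.open(dest, "wb") as f:
            await f.truncate(size)

        part_size = size // parts

        async def fetch_part(index: int):
            start = index * part_size
            end = size - 1 if index == parts - 1 else start + part_size - 1
            # Request only this byte range and write it at the matching offset
            async with session.get(url, headers={"Range": f"bytes={start}-{end}"}) as resp:
                async with aiofiles.open(dest, "r+b") as f:
                    await f.seek(start)
                    async for chunk in resp.content.iter_chunked(16384):
                        await f.write(chunk)

        await asyncio.gather(*(fetch_part(i) for i in range(parts)))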