How to zip objects in an object storage


How would you go about organizing a process for zipping objects that reside in an object storage?

For context, our users sometimes request an export of their entire data from the app, similar to the "Downloading Twitter archive" feature of Twitter.

Our users are able to upload files, so the extracted data must contain files stored in an object storage (Google Cloud Storage). The requested data must be packed into a single .zip archive.

A naive approach would look like this:

  1. download all files from the object storage to disk,
  2. zip all files into an archive,
  3. put the .zip back into the object storage,
  4. send the user a link to download the .zip file.
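The four steps above can be sketched roughly as follows. This is only an illustration using Python's standard library: `download` and `upload` are hypothetical callables standing in for whatever storage client is used (they are not part of any real API), and the function name `build_user_archive` is made up for this example. To limit disk usage, each downloaded file is deleted as soon as it has been added to the archive.

```python
import zipfile
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Callable, Iterable


def build_user_archive(
    object_names: Iterable[str],
    download: Callable[[str, Path], None],  # hypothetical: fetch one object to a local path
    upload: Callable[[Path, str], str],     # hypothetical: store a file, return a download link
    archive_name: str,
) -> str:
    """Download objects one by one, zip them, upload the archive, return its link."""
    with TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        archive_path = workdir / archive_name
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for index, name in enumerate(object_names):
                local_copy = workdir / f"object-{index}.tmp"
                download(name, local_copy)               # step 1: object storage -> disk
                archive.write(local_copy, arcname=name)  # step 2: add the file to the archive
                local_copy.unlink()                      # free the disk space right away
        # Steps 3 and 4: upload while the temporary directory still exists,
        # then hand the resulting link back to the caller.
        return upload(archive_path, archive_name)
```

Even in this form the archive itself still has to fit on local disk, and an interruption still means starting over.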

However, there are multiple disadvantages here:

  1. sometimes the files of even a single user add up to gigabytes,
  2. if the zipping process is interrupted, it has to start over.

What's a reasonable way to design a process that generates a .zip archive of user files that originally reside in an object storage?


There is 1 answer

John Hanley (best answer)

Unfortunately, your naive approach is the only way because Cloud Storage offers no compute abilities. Archiving files requires compute, memory, and temporary storage.

The key is to choose a service, such as Compute Engine, that can meet your file processing requirements: multi-gigabyte files, fast processing (compression), and high-speed networking.

Another issue will be the time that it takes to download, zip, and upload. That means using an asynchronous, event-based design: start the file processing and notify the user (email, message, web inbox, etc.) once it is complete.
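A minimal sketch of that asynchronous flow, assuming a thread pool stands in for a real job queue or worker service. The names `request_export`, `build_archive`, and `notify` are hypothetical and only illustrate the shape of the design: the request returns immediately, and the user is notified once the archive is ready.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from typing import Callable


def request_export(
    executor: Executor,
    user_id: str,
    build_archive: Callable[[str], str],  # hypothetical: does the slow work, returns a link
    notify: Callable[[str, str], None],   # hypothetical: email, message, web inbox, ...
) -> Future:
    """Start the export in the background and return without waiting for it."""

    def job() -> str:
        link = build_archive(user_id)  # download, zip, upload (slow)
        notify(user_id, link)          # tell the user the archive is ready
        return link

    return executor.submit(job)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        future = request_export(
            pool,
            "user-123",
            build_archive=lambda uid: f"https://example.invalid/{uid}/export.zip",
            notify=lambda uid, link: print(f"notify {uid}: {link}"),
        )
        future.result()  # only the demo waits; a web handler would return right away
```

In production the thread pool would more likely be replaced by a durable queue and a separate worker, so that a request survives a restart of the web process.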

You could make the process synchronous and display a progress bar, but that will complicate the design.