How to use cloud storage (blob stores) with Rocksdb


My use case:

A single-node, out-of-memory "big dict" (or "big map"). The total size is too large for memory, e.g. 20 GB, but is fine for a single node's disk. At that size, a single-file solution like SQLite is unwieldy. I also want easy cloud backups, so I want manageable file sizes. The data needs to be stored as a series of size-controllable files that the tool manages transparently to the user. Further, it should be embedded, i.e., a simple library with no client/server.

Long story short, I picked Rocksdb.

Now there are new requirements, or nice-to-haves: I want to use a cloud blob store as the ultimate storage. For example, a couple of levels of hot cache reside in memory or on local disk, with a configurable total size; beyond that, reads and writes go to a cloud blob store.
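To make the layering concrete, here is a minimal sketch of the read path described above: a memory tier, then a local-disk tier, then the blob store. It is not RocksDB code. `TieredReader` and `fetch_blob` are hypothetical names, and the limits are item counts rather than bytes to keep the sketch short (a real size budget would track `len(data)` instead).

```python
import os
import tempfile
from collections import OrderedDict


class TieredReader:
    """Read-through lookup: memory LRU -> local disk LRU -> blob store."""

    def __init__(self, fetch_blob, disk_dir, max_mem_items=2, max_disk_items=4):
        self.fetch_blob = fetch_blob      # hypothetical callable: name -> bytes
        self.disk_dir = disk_dir
        self.max_mem_items = max_mem_items
        self.max_disk_items = max_disk_items
        self.mem = OrderedDict()          # name -> bytes, least recent first
        self.disk = OrderedDict()         # name -> local path, least recent first
        self.blob_reads = 0               # counts trips to the blob store

    def _put_mem(self, name, data):
        self.mem[name] = data
        self.mem.move_to_end(name)
        while len(self.mem) > self.max_mem_items:
            self.mem.popitem(last=False)  # evict least recently used

    def _put_disk(self, name, data):
        path = os.path.join(self.disk_dir, name)
        with open(path, "wb") as f:
            f.write(data)
        self.disk[name] = path
        self.disk.move_to_end(name)
        while len(self.disk) > self.max_disk_items:
            _, old_path = self.disk.popitem(last=False)
            os.remove(old_path)           # evict the file from local disk

    def get(self, name):
        if name in self.mem:              # tier 1: memory
            self.mem.move_to_end(name)
            return self.mem[name]
        if name in self.disk:             # tier 2: local disk
            self.disk.move_to_end(name)
            with open(self.disk[name], "rb") as f:
                data = f.read()
        else:                             # tier 3: cloud blob store
            data = self.fetch_blob(name)
            self.blob_reads += 1
            self._put_disk(name, data)
        self._put_mem(name, data)
        return data


if __name__ == "__main__":
    # A dict stands in for the blob store in this demo.
    fake_store = {f"k{i}": f"v{i}".encode() for i in range(6)}
    with tempfile.TemporaryDirectory() as d:
        reader = TieredReader(fake_store.__getitem__, d)
        reader.get("k0")
        reader.get("k0")
        print("blob reads:", reader.blob_reads)
```

Writes are left out on purpose, since the workload is mostly read after the initial load.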

After the initial creation of the dataset, the usage is mainly read. I don't care much about "distributed" concerns, such as multiple machines competing to write.

I don't see that Rocksdb has this option. There's rocksdb-cloud, but it appears to be in "internal dev" mode, with no end-user documentation whatsoever.

Questions:

  1. Is my use case reasonable? Would a cloud KV store (like GCP Firestore?) plus a naive flat in-memory cache have a similar effect?

  2. How to do this with Rocksdb? Or any alternative?

Thanks.


There are 2 answers

1
Jay Zhuang

RocksDB allows you to define your own FileSystem or Env, in which you can implement the interaction layer for whatever special filesystem you want. So it's possible, but you need to implement the integration layer with the cloud KV store yourself. (Running on HDFS is an example; it defines its own Env.)
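As a rough illustration of the idea only: the sketch below is a Python analogy, not RocksDB's API. RocksDB's real FileSystem/Env interfaces are C++ and have many more methods. `FileLayer`, `LocalFileLayer`, `BlobFileLayer`, `get_range`, and `read_header` are all hypothetical names. The point is that the caller reads byte ranges through one interface, and a blob-backed implementation can be swapped in behind it.

```python
import os
from abc import ABC, abstractmethod


class FileLayer(ABC):
    """Hypothetical storage interface: the engine only sees this."""

    @abstractmethod
    def read_range(self, name: str, offset: int, length: int) -> bytes:
        """Return up to `length` bytes of file `name`, starting at `offset`."""


class LocalFileLayer(FileLayer):
    """Reads from files under a local directory."""

    def __init__(self, root: str):
        self.root = root

    def read_range(self, name, offset, length):
        with open(os.path.join(self.root, name), "rb") as f:
            f.seek(offset)
            return f.read(length)


class BlobFileLayer(FileLayer):
    """Delegates to a blob-store client's ranged read.

    `get_range` is an assumed callable (name, offset, length) -> bytes,
    standing in for whatever blob-store utility you already have.
    """

    def __init__(self, get_range):
        self.get_range = get_range

    def read_range(self, name, offset, length):
        return self.get_range(name, offset, length)


def read_header(layer: FileLayer, name: str, n: int = 4) -> bytes:
    """Caller code that does not care where the bytes come from."""
    return layer.read_range(name, 0, n)
```

With this shape, the same `read_header` call works whether `layer` is a `LocalFileLayer` or a `BlobFileLayer`.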

0
zpz

Reply to @jayZhuang; too long for a comment.

It looks like the code is reasonably modular, but I can hardly say it "supports" cloud storage, because that requires forking and hacking the code itself. More reasonable for the end user would be an extension or plugin from outside, basically "give me a few auth arguments and the location of the storage and I'll do the rest". Covering the few major blob stores should be a modest effort.

As for me, I'm using Rocksdb from Python via a barely maintained Rocksdb Python client. (There are no actively maintained options.) I have nice Python utilities for cloud blob stores, but I'm sure there's no way to let Rocksdb use them from Python through an inactive Rocksdb Python client package. Although I am able to write C++ extensions for Python, that would require digging into both Rocksdb and the blob store in C++. It's not something I'll take on.

Thanks for the pointers. Do you know of any other examples closer to the end user?