How to get a shared global object between mappers in DISCO


Each of my mappers needs access to a very large dictionary. Is there some way I can avoid the overhead of each mapper opening its own copy, and instead have all of them point to one shared global object?

Any suggestions specific to DISCO, or to the MapReduce paradigm in general, would be helpful.


1 answer

Answer by Jan Vlcinsky

Use the Redis key-value store.

It can be installed quickly on Linux, and compiled versions are also available for Windows.

The Python redis package will then allow you to write, read and update values very easily.
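For example, a minimal sketch of reading and writing values with the redis package, assuming a Redis server running locally on the default port:

```python
import redis

# Connect to a local Redis server (host/port/db here are assumptions).
r = redis.StrictRedis(host='localhost', port=6379, db=0)

# Write, read and update a plain value.
r.set('greeting', 'hello')
print(r.get('greeting'))          # b'hello' (bytes in Python 3)
r.set('greeting', 'hello again')  # updating is just another set
```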

The hash data type will serve you best: you can add and edit values in so-called fields (the equivalent of keys in Python dictionary terminology), and it is both very fast and very straightforward.
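A sketch of what that could look like for the shared dictionary, again assuming a local Redis server; the hash name 'shared_dict' is just an example:

```python
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)

# Store the shared dictionary as a single Redis hash;
# each dictionary key becomes a field of the hash.
r.hset('shared_dict', 'alpha', '1')
r.hset('shared_dict', 'beta', '2')

# Read one field, or fetch the whole hash at once.
print(r.hget('shared_dict', 'alpha'))   # b'1'
print(r.hgetall('shared_dict'))         # {b'alpha': b'1', b'beta': b'2'}
```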

This solution works even for independent processes. You can also share data in Redis over the network, which makes it a great option for a map/reduce scenario; see the sketch below.
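As a rough sketch of how a Disco mapper could look values up in the shared Redis hash (the host name 'redis-host', the hash name 'shared_dict' and the input path are placeholders, and the redis package has to be installed on the worker nodes):

```python
from disco.core import Job, result_iterator

def fun_map(line, params):
    # Imports go inside the map function because it runs on worker nodes.
    import json
    import redis
    # 'redis-host' and 'shared_dict' are placeholders for your own setup.
    r = redis.StrictRedis(host='redis-host', port=6379, db=0)
    word = line.strip()
    value = r.hget('shared_dict', word)
    if value is not None:
        yield word, json.loads(value.decode('utf-8'))

if __name__ == '__main__':
    job = Job().run(input=['file:///path/to/input.txt'], map=fun_map)
    for key, value in result_iterator(job.wait(show=True)):
        print(key, value)
```

For simplicity the sketch opens a new connection on every call; in practice you would want to create the connection once per task (for example lazily in a module-level variable) and reuse it.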

The only thing you have to take care of when storing and retrieving values is that the values can only be strings, so you have to serialize and deserialize them; json.dumps and json.loads work very well for this.
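A small sketch of that serialization step, using the same assumed local Redis server and 'shared_dict' hash as above:

```python
import json
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)

# Values must be strings, so serialize complex objects on the way in ...
r.hset('shared_dict', 'config', json.dumps({'threshold': 0.5, 'tags': ['a', 'b']}))

# ... and deserialize them on the way out (hget returns bytes in Python 3).
config = json.loads(r.hget('shared_dict', 'config').decode('utf-8'))
print(config['threshold'])   # 0.5
```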