I have a class that uses `pathos.multiprocessing` to parallelize one of its more CPU-expensive methods. To reduce overhead, I moved the method out of the original class and into a reduced class with a smaller data footprint (whether this is a good design is another question, which I posed here).
I tested this first on a Windows system and obtained some performance improvements, although I still assume serialization overhead to be a limiting factor.
However, when I tried the same on a Unix system, the parallelized version runs significantly slower than the sequential one. This surprises me, because although my understanding is limited, I was under the impression that efficient multiprocessing in Python is easier to achieve on a Unix system due to its ability to fork processes.
I would like to investigate what exactly causes this difference in performance between the two systems, but debugging multiprocessing programs has always struck me as a complicated task: to begin with, is there a way to obtain information on what exactly is passed (i.e. serialized and sent) to each subprocess during the execution of `ProcessingPool.imap`? This information could help me further improve the design of my code.
I'm the author of `dill`, `multiprocess`, and `pathos`. The easiest thing to do is to use `dill.detect.trace`, which provides a trace of the path used to serialize any object. For example:
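A minimal sketch along those lines (the class `Foo` below is just an illustrative stand-in for whatever object you are serializing):

```python
import dill

dill.detect.trace(True)        # switch on tracing of every dumps call

class Foo(object):             # illustrative class holding a bit of state
    def __init__(self, x):
        self.x = x
    def bar(self, y):
        return self.x + y

f = Foo(1)
payload = dill.dumps(f)        # the serialization trace is printed as dill walks the object
```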
The `#` denotes that an object has been completed, and an indentation indicates that a group of other objects is needed to serialize the current object. Thus, the serialization path is traced out. This also works for multiprocessing...
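And a minimal sketch with a `pathos` pool, where the worker function `add_one` is again just an illustrative stand-in:

```python
import dill
from pathos.multiprocessing import ProcessingPool

dill.detect.trace(True)            # trace what gets serialized for the workers

def add_one(x):                    # illustrative worker function
    return x + 1

if __name__ == '__main__':
    pool = ProcessingPool(4)
    # each task shipped by imap should be pickled with dill, so its trace gets printed
    results = list(pool.imap(add_one, range(10)))
    print(results)
```

Watching which objects show up in that trace for each call should tell you how much of your class's state is actually being shipped to the workers.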