For more setup, see this question. I want to create lots of instances of class Toy in parallel, then write them to an XML tree.
import itertools
import pandas as pd
import lxml.etree as et
import numpy as np
import sys
import multiprocessing as mp
def make_toys(df):
    l = []
    for index, row in df.iterrows():
        toys = [Toy(row) for _ in range(row['number'])]
        l += [x for x in toys if x is not None]
    return l
class Toy(object):
    def __new__(cls, *args, **kwargs):
        if np.random.uniform() <= 1:
            return super(Toy, cls).__new__(cls, *args, **kwargs)
    def __init__(self, row):
        self.id = None
        self.type = row['type']
    def set_id(self, x):
        self.id = x
    def write(self, tree):
        et.SubElement(tree, "toy", attrib={'id': str(self.id), 'type': self.type})
if __name__ == "__main__":
    table = pd.DataFrame({
        'type': ['a', 'b', 'c', 'd'],
        'number': [5, 4, 3, 10]})
    n_cores = 2
    split_df = np.array_split(table, n_cores)
    p = mp.Pool(n_cores)
    pool_results = p.map(make_toys, split_df)
    p.close()
    p.join()
    l = [a for L in pool_results for a in L]
    box = et.Element("box")
    box_file = et.ElementTree(box)
    for i, toy in itertools.izip(range(len(l)), l):
        Toy.set_id(toy, i)
    [Toy.write(x, box) for x in l]
    box_file.write(sys.stdout, pretty_print=True)
This code runs beautifully. But I redefined the __new__ method to only have a random chance of instantiating a class. So if I set if np.random.uniform() < 0.5, I want to create half as many instances as I asked for, randomly determined. Doing this returns the following error:
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 380, in _handle_results
    task = get()
AttributeError: 'NoneType' object has no attribute '__dict__'
I don't know what this even means, or how to avoid it. If I do this process monolithically, as in l = make_toys(table), it runs well for any random chance.
Another solution
By the way, I know that this can be solved by leaving the __new__ method alone and instead rewriting make_toys() as
def make_toys(df):
    l = []
    for index, row in df.iterrows():
        prob = np.random.binomial(row['number'], 0.1)
        toys = [Toy(row) for _ in range(prob)]
        l += [x for x in toys if x is not None]
    return l
But I'm trying to learn about the error.
                        
I think you've uncovered a surprising "gotcha" caused by Toy instances becoming None as they are passed through the multiprocessing Pool's result Queue.
The multiprocessing.Pool uses Queue.Queues to pass results from the subprocesses back to the main process. Per the docs, an object put on a queue is pickled, and the pickled data is then sent through an underlying pipe.
While the actual serialization might be different, in spirit the pickling of an instance of Toy becomes a stream of bytes that mentions the module and class of the object: __main__\nToy.
The class itself is not pickled. There is only a reference to the name of the class.
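To see the reference-by-name behaviour directly, here is a minimal sketch. It assumes Python 3 and is run as a script; the simplified Toy below (taking a plain string instead of a DataFrame row) is a stand-in for illustration, not the class from the question.

```python
import pickle


class Toy(object):
    """Simplified stand-in for the question's Toy class."""

    def __init__(self, toy_type):
        self.id = None
        self.type = toy_type

    def set_id(self, x):
        self.id = x


# Protocol 0 is the text-based protocol, which makes the stream easy to read.
data = pickle.dumps(Toy('a'), protocol=0)
print(data)

# The stream names the module and the class...
print(Toy.__module__.encode('ascii') in data, b'Toy' in data)
# ...but carries none of the class body, such as the method name set_id.
print(b'set_id' in data)
```

The instance attributes (id, type) travel in the stream, while the class is looked up by name on the receiving side.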
When the stream of bytes is unpickled on the other side of the pipe, Toy.__new__ is called to instantiate a new instance of Toy. The new object's __dict__ is then reconstituted using unpickled data from the byte stream. When the new object is None, it has no __dict__ attribute, and hence the AttributeError is raised.
Thus, as a Toy instance is passed through the Queue, it might become None on the other side.
I believe this is the reason why using if np.random.uniform() < 0.5 in Toy.__new__ leads to AttributeError: 'NoneType' object has no attribute '__dict__'.
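The failure can be reproduced without a Pool at all. The sketch below assumes Python 3 and replaces the random check with a hypothetical module-level switch, ALLOW_CREATION, so the outcome is deterministic. Note that super().__new__ is called with cls only, since Python 3 rejects extra arguments there.

```python
import pickle

# Hypothetical switch standing in for the np.random.uniform() check.
ALLOW_CREATION = True


class Toy(object):
    def __new__(cls, *args, **kwargs):
        if ALLOW_CREATION:
            return super(Toy, cls).__new__(cls)
        return None  # what the question's __new__ does on a "miss"

    def __init__(self, toy_type):
        self.type = toy_type


# Pickle while creation is allowed, as a worker process would.
payload = pickle.dumps(Toy('a'), protocol=2)

# Unpickle while creation is refused, as the main process might.
ALLOW_CREATION = False
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc

print(repr(error))
```

Unpickling calls Toy.__new__, receives None, and then fails when it tries to restore the instance state onto None.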
If you add logging to your script, you will find that the AttributeError only occurs after a logging message containing Returning None.
Notice that the logging message comes from the MainProcess, not one of the PoolWorker processes. Since the Returning None message comes from Toy.__new__, this shows that Toy.__new__ was called by the main process. This corroborates the claim that unpickling is calling Toy.__new__ and transforming instances of Toy into None.
The moral of the story is that for Toy instances to be passed through a multiprocessing Pool's Queue, Toy.__new__ must always return an instance of Toy. And as you noted, the code can be fixed by instantiating only the desired number of Toys in make_toys.
By the way, it is non-standard to call instance methods with Toy.write(x, box) when x is an instance of Toy. The preferred way is to use x.write(box). Similarly, use toy.set_id(i) instead of Toy.set_id(toy, i).
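A minimal sketch of that fix, assuming Python 3 and using only the standard library: the (type, number) pairs, the keep_probability parameter, and the seeded rng are hypothetical stand-ins for the DataFrame rows and np.random.binomial, and a pickle round trip stands in for the Pool's result queue.

```python
import pickle
import random


class Toy(object):
    # No custom __new__: constructing or unpickling always yields a Toy.
    def __init__(self, toy_type):
        self.id = None
        self.type = toy_type

    def set_id(self, x):
        self.id = x


def make_toys(rows, keep_probability, rng):
    """Decide how many toys to build per row, then build exactly that many."""
    toys = []
    for toy_type, number in rows:
        # Count successes in `number` independent trials (a binomial draw).
        kept = sum(1 for _ in range(number) if rng.random() < keep_probability)
        toys.extend(Toy(toy_type) for _ in range(kept))
    return toys


rows = [('a', 5), ('b', 4), ('c', 3), ('d', 10)]
toys = make_toys(rows, 0.5, random.Random(0))

# Round trip through pickle, as the results would when returned by a worker.
received = pickle.loads(pickle.dumps(toys, protocol=2))

# Call instance methods on the instance itself.
for i, toy in enumerate(received):
    toy.set_id(i)

print(len(received), all(isinstance(t, Toy) for t in received))
```

Because the randomness lives in make_toys rather than in Toy.__new__, every object that is pickled comes back as a Toy.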