Parallel and conditional: NoneType object has no attribute '__dict__'

Question

I want to create many instances of the class Toy in parallel. Then I want to write them to an XML tree.

import itertools
import pandas as pd
import lxml.etree as et
import numpy as np
import sys
import multiprocessing as mp


def make_toys(df):
    l = []
    for index, row in df.iterrows():
        toys = [Toy(row) for _ in range(row['number'])]
        l += [x for x in toys if x is not None]
    return l


class Toy(object):
    def __new__(cls, *args, **kwargs):
        if np.random.uniform() <= 1:
            return super(Toy, cls).__new__(cls, *args, **kwargs)

    def __init__(self, row):
        self.id = None
        self.type = row['type']

    def set_id(self, x):
        self.id = x

    def write(self, tree):
        et.SubElement(tree, "toy", attrib={'id': str(self.id), 'type': self.type})


if __name__ == "__main__":
    table = pd.DataFrame({
        'type': ['a', 'b', 'c', 'd'],
        'number': [5, 4, 3, 10]})

    n_cores = 2
    split_df = np.array_split(table, n_cores)

    p = mp.Pool(n_cores)
    pool_results = p.map(make_toys, split_df)
    p.close()
    p.join()
    l = [a for L in pool_results for a in L]

    box = et.Element("box")
    box_file = et.ElementTree(box)

    for i, toy in itertools.izip(range(len(l)), l):
        Toy.set_id(toy, i)

    [Toy.write(x, box) for x in l]

    box_file.write(sys.stdout, pretty_print=True)

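This code runs fine. However, I redefined the __new__ method so that the class is instantiated only with some random probability. So if I set if np.random.uniform() < 0.5, I expect to create roughly half as many instances as I asked for, chosen at random. Doing so raises the following error: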

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 380, in _handle_results
    task = get()
AttributeError: 'NoneType' object has no attribute '__dict__'

I don't even know what this means, or how to avoid it. If I run the process serially, as in l = make_toys(table), it works fine for any random probability.

Another solution

By the way, I know this can be solved by leaving the __new__ method alone and instead rewriting make_toys() as

def make_toys(df):
    l = []
    for index, row in df.iterrows():
        prob = np.random.binomial(row['number'], 0.1)
        toys = [Toy(row) for _ in range(prob)]
        l += [x for x in toys if x is not None]
    return l

But I am trying to understand the error.

Answer 1

I think you've uncovered a surprising "gotcha": Toy instances can become None as they pass through the multiprocessing Pool's result Queue.

multiprocessing.Pool uses queues to pass results from the subprocesses back to the main process.

Per the docs:

When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe.

Although the actual serialization may differ, in essence pickling a Toy instance produces a stream of bytes like this:

In [30]: import pickle

In [31]: pickle.dumps(Toy(table.iloc[0]))
Out[31]: "ccopy_reg\n_reconstructor\np0\n(c__main__\nToy\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'type'\np6\nS'a'\np7\nsS'id'\np8\nNsb."

Notice that the stream of bytes mentions the object's module and class: __main__\nToy.

The class itself is not pickled. Only a reference to the name of the class is.
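To check that only the class's name travels in the stream, one can inspect the pickled bytes. This is a minimal sketch written for Python 3 (the code in this thread is Python 2), and Gadget is a hypothetical stand-in class, not part of the original code.

```python
import pickle


class Gadget(object):
    """Hypothetical stand-in for Toy; not from the original post."""

    def __init__(self, kind):
        self.kind = kind

    def describe(self):
        return "gadget of kind " + self.kind


payload = pickle.dumps(Gadget("a"))

# The class name and the instance attribute name appear in the stream...
has_class_name = b"Gadget" in payload
has_attr_name = b"kind" in payload
# ...but the method name does not, because the class body is not serialized.
has_method_name = b"describe" in payload

print(has_class_name, has_attr_name, has_method_name)
```

Because the receiving side looks the class up by name, whatever __new__ that class defines is the one used to rebuild the object.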

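When the stream of bytes is unpickled on the other side of the pipe, Toy.__new__ is called to instantiate a new instance of Toy. The new object's __dict__ is then rebuilt from the unpickled data in the byte stream. When the new object is None, it has no __dict__ attribute, and hence the AttributeError is raised.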
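The same failure can be reproduced without a pool. In this Python 3 sketch, the class FlakyToy and the module-level flag ALLOW_CREATION are hypothetical stand-ins (not from the original code) that replace the random draw with a deterministic switch: the instance is pickled while __new__ succeeds, then unpickled while __new__ returns None.

```python
import pickle

# Hypothetical switch standing in for the random draw in Toy.__new__.
ALLOW_CREATION = True


class FlakyToy(object):
    def __new__(cls, *args, **kwargs):
        if ALLOW_CREATION:
            # Python 3's object.__new__ rejects extra arguments,
            # so none are forwarded here.
            return super(FlakyToy, cls).__new__(cls)
        return None  # same effect as falling off the end of __new__

    def __init__(self, kind):
        self.id = None
        self.type = kind


payload = pickle.dumps(FlakyToy("a"))  # pickling an existing instance succeeds

ALLOW_CREATION = False  # simulate an unlucky draw while unpickling
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc

print(repr(error))
```

The exception is raised while loading, not while dumping, which matches the traceback pointing at the pool's result handler in the main process.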

Thus, when a Toy instance is passed through a Queue, it might become None on the other side.

I believe this is why using

class Toy(object):
    def __new__(cls, *args, **kwargs):
        x = np.random.uniform() <= 0.5
        if x:
            return super(Toy, cls).__new__(cls, *args, **kwargs)
        logger.info('Returning None')

results in

AttributeError: 'NoneType' object has no attribute '__dict__'

If you add logging to your script,

import itertools
import pandas as pd
import lxml.etree as et
import numpy as np
import sys
import multiprocessing as mp
import logging
logger = mp.log_to_stderr(logging.INFO)

def make_toys(df):
    result = []
    for index, row in df.iterrows():
        toys = [Toy(row) for _ in range(row['number'])]
        result += [x for x in toys if x is not None]
    return result


class Toy(object):
    def __new__(cls, *args, **kwargs):
        x = np.random.uniform() <= 0.97
        if x:
            return super(Toy, cls).__new__(cls, *args, **kwargs)
        logger.info('Returning None')

    def __init__(self, row):
        self.id = None
        self.type = row['type']

    def set_id(self, x):
        self.id = x

    def write(self, tree):
        et.SubElement(tree, "toy", attrib={'id': str(self.id), 'type': self.type})


if __name__ == "__main__":
    table = pd.DataFrame({
        'type': ['a', 'b', 'c', 'd'],
        'number': [5, 4, 3, 10]})

    n_cores = 2
    split_df = np.array_split(table, n_cores)

    p = mp.Pool(n_cores)
    pool_results = p.map(make_toys, split_df)
    p.close()
    p.join()
    l = [a for L in pool_results for a in L]

    box = et.Element("box")
    box_file = et.ElementTree(box)

    for i, toy in itertools.izip(range(len(l)), l):
        toy.set_id(i)

    for x in l:
        x.write(box)

    box_file.write(sys.stdout, pretty_print=True)

you will find that the AttributeError occurs only after a logging message of the form

[INFO/MainProcess] Returning None

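Notice that the logging message comes from the MainProcess, not from one of the PoolWorker processes. Since the Returning None message is emitted by Toy.__new__, this shows that Toy.__new__ was called by the main process. This corroborates the claim that unpickling calls Toy.__new__ and turns instances of Toy into None.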

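The moral of the story is that for Toy instances to be passed through a multiprocessing Pool's queue, Toy.__new__ must always return an instance of Toy. As you noted, the code can be fixed by instantiating only the desired number of toys in make_toys: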

def make_toys(df):
    result = []
    for index, row in df.iterrows():
        prob = np.random.binomial(row['number'], 0.1)
        result.extend([Toy(row) for _ in range(prob)])
    return result
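If a per-instance coin flip is preferred over a single binomial draw, one option is to move the randomness out of __new__ into a plain factory function, so that unpickling always gets a real instance. The sketch below is Python 3 and rests on assumptions: the helper maybe_make_toy, the (kind, number) row format, and the standard random module in place of numpy are all illustrative choices, not part of the original code.

```python
import pickle
import random


class Toy(object):
    # __new__ is left untouched, so unpickling always yields an instance.
    def __init__(self, kind):
        self.id = None
        self.type = kind


def maybe_make_toy(kind, probability, rng=random):
    """Hypothetical factory: return a Toy with the given probability, else None."""
    if rng.random() < probability:
        return Toy(kind)
    return None


def make_toys(rows, probability, rng=random):
    """rows is an iterable of (kind, number) pairs; None results are dropped."""
    result = []
    for kind, number in rows:
        candidates = (maybe_make_toy(kind, probability, rng) for _ in range(number))
        result.extend(t for t in candidates if t is not None)
    return result


toys = make_toys([("a", 5), ("b", 4)], probability=0.5, rng=random.Random(0))

# Every surviving toy makes it through a pickle round trip, as a Pool requires.
restored = pickle.loads(pickle.dumps(toys))
```

The filtering happens entirely inside the worker, before anything is pickled, so the main process never runs the random check.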

Incidentally, it is non-standard to call instance methods as Toy.write(x, box) when x is an instance of Toy. The preferred way is to use

x.write(box)

Similarly, use toy.set_id(i) instead of Toy.set_id(toy, i).
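Both spellings end up calling the same function; the bound form simply passes the instance for you. A small Python 3 sketch with a trimmed-down Toy follows. The enumerate loop is a suggested stand-in for itertools.izip(range(len(l)), l) and is not taken from the original code.

```python
class Toy(object):
    def __init__(self, kind):
        self.id = None
        self.type = kind

    def set_id(self, x):
        self.id = x


toys = [Toy("a"), Toy("b"), Toy("c")]

# Calling through the class works, but the instance must be passed by hand.
Toy.set_id(toys[0], 99)

# Calling through the instance binds `self` automatically.
for i, toy in enumerate(toys):
    toy.set_id(i)

ids = [toy.id for toy in toys]
```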
