Error from running TensorFlow models in parallel, when running them sequentially works fine

I'm trying to use pathos.multiprocessing.Pool to run multiple TensorFlow models in parallel.

The error is:

multiprocess.pool.RemoteTraceback:

Traceback (most recent call last):
  File "c:\users\burge\appdata\local\programs\python\python37\lib\site-packages\multiprocess\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\burge\appdata\local\programs\python\python37\lib\site-packages\multiprocess\pool.py", line 44, in mapstar
    return list(map(*args))
  File "c:\users\burge\appdata\local\programs\python\python37\lib\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "c:\Users\Burge\Desktop\SwarmMemory\sim.py", line 38, in run
    i.step()
  File "c:\Users\Burge\Desktop\SwarmMemory\agent.py", line 240, in step
    output = self.ai(np.array(self.internal_log).reshape(-1, 1, 9))
  File "c:\users\burge\appdata\local\programs\python\python37\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1012, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "c:\users\burge\appdata\local\programs\python\python37\lib\site-packages\tensorflow\python\keras\engine\sequential.py", line 375, in call
    return super(Sequential, self).call(inputs, training=training, mask=mask)
  File "c:\users\burge\appdata\local\programs\python\python37\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 425, in call
    inputs, training=training, mask=mask)
  File "c:\users\burge\appdata\local\programs\python\python37\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 569, in _run_internal_graph
    assert x_id in tensor_dict, 'Could not compute output ' + str(x)
AssertionError: Could not compute output KerasTensor(type_spec=TensorSpec(shape=(None, 1, 4), dtype=tf.float32, name=None), name='dense_1/BiasAdd:0', description="created by layer 'dense_1'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\Burge\Desktop\SwarmMemory\sim.py", line 78, in <module>
    p.map(Sim.run, sims)
  File "c:\users\burge\appdata\local\programs\python\python37\lib\site-packages\pathos\multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "c:\users\burge\appdata\local\programs\python\python37\lib\site-packages\multiprocess\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\burge\appdata\local\programs\python\python37\lib\site-packages\multiprocess\pool.py", line 657, in get
    raise self._value
AssertionError: Could not compute output KerasTensor(type_spec=TensorSpec(shape=(None, 1, 4), dtype=tf.float32, name=None), name='dense_1/BiasAdd:0', description="created by layer 'dense_1'")

The pool is created as follows:

if __name__ == '__main__':
    freeze_support()

    model = Sequential()
    model.add(Input(shape=(1,9)))
    model.add(LSTM(10, return_sequences=True))
    model.add(Dropout(0.1))
    model.add(LSTM(5))
    model.add(Dropout(0.1))
    model.add(Dense(4))
    model.add(Dense(4))

    models = []
    sims = []

    for i in range(6):
        models.append(tensorflow.keras.models.clone_model(model))
        sims.append(Sim(models[-1]))
    
    p = Pool()
    p.map(Sim.run, sims)

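Basically, I'm running a simulation using the model supplied to the Sim class. Once the sims have run, I can apply a fitness function to the results and then apply a genetic algorithm to them.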

GitHub link for more information, under the branch python-ver: https://github.com/HarryBurge/SwarmMemory

Edit: In case anyone needs to know how to do this in the future, I used keras-pickle-wrapper to pickle the Keras models and pass them to the run method.

models = []
sims = []

for i in range(6):
    models.append(KerasPickleWrapper(tensorflow.keras.models.clone_model(model)))
    sims.append(Sim())

p = Pool()
p.map(Sim.run, sims, models)

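I'm the pathos author. Whenever you see self._value in the error, what's generally happening is that something you tried to send to another processor failed to serialize. The error and traceback are a bit obtuse, admittedly. However, you can check the serialization with dill and determine whether you need to use one of the serialization variants (like dill.settings['trace'] = True), or whether you need to restructure your code slightly to better accommodate serialization. If the class you are working with is something you can edit, then an easy thing to do is to add a __reduce__ method, or similar, to aid serialization.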
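
To illustrate the __reduce__ suggestion, here is a minimal sketch. It uses the standard-library pickle module as a stand-in, on the assumption that dill honors the same __reduce__ protocol. The Agent and ReducibleAgent classes and the can_serialize helper are hypothetical names made up for this example. They are not part of pathos, dill, or the SwarmMemory code. A threading.Lock plays the role of the attribute that refuses to serialize.

```python
import pickle
import threading


class Agent:
    """Hypothetical object that holds a resource pickle cannot handle."""

    def __init__(self, weights):
        self.weights = list(weights)
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def total(self):
        with self._lock:
            return sum(self.weights)


class ReducibleAgent(Agent):
    """Same object, but it tells the pickler how to rebuild it."""

    def __reduce__(self):
        # Return (callable, args): the receiving process calls
        # ReducibleAgent(self.weights), which creates a fresh lock there.
        return (ReducibleAgent, (self.weights,))


def can_serialize(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


if __name__ == "__main__":
    print(can_serialize(Agent([1, 2, 3])))           # the lock blocks pickling
    print(can_serialize(ReducibleAgent([1, 2, 3])))  # __reduce__ sidesteps it
    clone = pickle.loads(pickle.dumps(ReducibleAgent([1, 2, 3])))
    print(clone.total())
```

The idea is that only plain data (here, a list of weights) crosses the process boundary, and anything that cannot be serialized is recreated on the other side. For a Keras model, the equivalent plain data would presumably be the architecture config plus the weights, which is roughly the job a wrapper such as keras-pickle-wrapper takes on.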