Write into slice references with map?

Question

I am trying to write into slices of a NumPy array, which I expected to be passed to a function by reference.

import numpy as np

def mpfunc(r):
    r[:] = 1

R = np.zeros((2, 4))

mpfunc(R[0])
mpfunc(R[1])

print(R)

This code works as expected. R now contains ones.

When I use map(), however:

import numpy as np

def mpfunc(r):
    r[:] = 1

R = np.zeros((2, 4))

map(mpfunc, R)

print(R)

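it seems that the slices of R are no longer passed by reference, and the documentation does not make this clear to me. R still contains only zeros.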

Ultimately, the goal is to use multiprocessing.Pool.map(), which unfortunately seems to fail for the same reason:

from multiprocessing import Pool
import numpy as np

def mpfunc(r):
    r[:] = 1

R = np.zeros((2, 4))

with Pool(2) as p:
    p.map(mpfunc, R)

print(R)

Why is that, and how can I fix it?

Answer 1

map (in Python 3) is lazy: you need to consume it to trigger the function calls. Consider the following simple example:

def update_dict(dct):
    dct.update({"x": 1})

data = [{"x": 0}, {"x": 0}, {"x": 0}]
mp = map(update_dict, data)  # lazy: update_dict has not been called yet
print(data)
lst = list(map(update_dict, data))  # list() consumes the map and runs every call
print(data)

Output

[{'x': 0}, {'x': 0}, {'x': 0}]
[{'x': 1}, {'x': 1}, {'x': 1}]

Keep in mind that, where possible, you should avoid calling map for its side effects, so as not to confuse other people who work with this code.
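If the only goal is the side effect, a plain for loop is arguably the clearer choice. Here is a minimal sketch of that alternative for the array in the question; the function name fill_ones is made up for illustration:

```python
import numpy as np

def fill_ones(row):
    # row is a view into the parent array, so this writes into R itself
    row[:] = 1

R = np.zeros((2, 4))

# Iterating over a 2-D array yields one row view per iteration
for row in R:
    fill_ones(row)

print(R)
```

The loop runs eagerly, so nothing has to be consumed afterwards and no throwaway list of None values is built.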

Answer 2

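Since you only call map, all it does is create a lazy iterator (a map object) for you; it does not actually perform the calls. Iterators and generators are Python's way of deferring execution. Here is one way to make it work.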

import numpy as np

def mpfunc(r):
    r[:] = 1

R = np.zeros((2, 4))

# mpfunc(R[0])
# mpfunc(R[1])
list(map(mpfunc, R))

print(R)

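Simply consume the map object, by building a list from it or in whatever way suits you. If you want to consume it one item at a time, use next(). The code above prints: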

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
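As a rough illustration of consuming the map object one item at a time, the sketch below calls next() twice and takes a snapshot of R after the first call. The names it and after_first are only for this example:

```python
import numpy as np

def mpfunc(r):
    r[:] = 1

R = np.zeros((2, 4))
it = map(mpfunc, R)     # nothing has run yet

next(it)                # runs mpfunc on the first row only
after_first = R.copy()  # snapshot: first row is ones, second row is still zeros

next(it)                # runs mpfunc on the second row
print(after_first)
print(R)
```

Each next() call triggers exactly one call to mpfunc, which is why R stays untouched until the map object is consumed.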

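Note that this does not explain your multiprocessing snippet: Pool.map is not lazy, so consuming its result changes nothing there.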

Answer 3

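So in the non-multiprocessing case you have to iterate over the iterable returned by the map function to ensure that the specified function has been applied to every item. That is not the case with Pool.map, however.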

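But you have a bigger problem. You are now passing arrays to processes that live in different address spaces, and that cannot be done by reference unless the underlying numpy array is stored in shared memory.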
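To get a feel for this without starting any processes, here is a small sketch that mimics what happens to an argument on its way to a worker: Pool.map pickles each item, and the worker operates on the unpickled copy. The pickle round trip below stands in for that transfer; the name row_copy is only for illustration:

```python
import pickle
import numpy as np

def mpfunc(r):
    r[:] = 1

R = np.zeros((2, 4))

# Stand-in for sending R[0] to a worker: serialize, then rebuild a separate array
row_copy = pickle.loads(pickle.dumps(R[0]))
mpfunc(row_copy)

print(row_copy)                        # the copy was modified
print(R[0])                            # the original row is untouched
print(np.shares_memory(row_copy, R))   # False
```

The write lands in the copy, which is discarded when the worker is done, so the parent's R never changes.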

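In the code below, the global variable R in each process is initialized with a numpy array backed by shared memory. The map function is now given the indices of the array rows that need to be updated: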

import multiprocessing as mp
import numpy as np
import ctypes

def to_numpy_array(shared_array, shape):
    '''Create a numpy array backed by a shared memory Array.'''
    arr = np.ctypeslib.as_array(shared_array)
    return arr.reshape(shape)

def to_shared_array(arr, ctype):
    '''Copy arr into a new shared memory Array of the given ctype.'''
    # lock=False returns the raw ctypes array, which supports the buffer protocol
    shared_array = mp.Array(ctype, arr.size, lock=False)
    temp = np.frombuffer(shared_array, dtype=arr.dtype)
    temp[:] = arr.flatten(order='C')
    return shared_array

def init_worker(shared_array, shape):
    '''Runs once in each worker: rebuild the global R on top of the shared buffer.'''
    global R
    R = to_numpy_array(shared_array, shape)

def mpfunc(idx):
    # Writes go to shared memory, so the parent process sees them
    R[idx, :] = 1


if __name__ == '__main__':
    # The array dtype must match the ctype passed below (8-byte integers here)
    R = np.zeros((2, 4), dtype=np.int64)
    shape = R.shape
    shared_array = to_shared_array(R, ctypes.c_int64)
    # you have to now use the shared array as the base
    R = to_numpy_array(shared_array, shape)

    with mp.Pool(2, initializer=init_worker, initargs=(shared_array, shape)) as p:
        p.map(mpfunc, range(shape[0]))

    print(R)

Prints:

[[1 1 1 1]
 [1 1 1 1]]
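The key point is that the numpy array is only a view onto the shared buffer. The single-process sketch below shows that relationship in isolation, assuming a buffer of 8 doubles; the names shared and view are only for this example:

```python
import ctypes
import multiprocessing as mp
import numpy as np

# Raw shared buffer of 8 doubles, zero-initialized (lock=False gives the bare ctypes array)
shared = mp.Array(ctypes.c_double, 8, lock=False)

# The dtype matches the ctype, so numpy interprets the bytes correctly
view = np.frombuffer(shared, dtype=np.float64).reshape(2, 4)

view[1, :] = 1       # write through the numpy view
print(list(shared))  # the underlying buffer reflects the write
```

Because the view owns no data of its own, any process that wraps the same buffer sees the same values, which is what init_worker relies on.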

python

numpy

pass-by-reference

multiprocessing