How to share a list of tensors in PyTorch multiprocessing?

I am programming with PyTorch multiprocessing. I want all the subprocesses to be able to read/write the same list of tensors (without resizing them). For example, the variable could be

m = [torch.randn(3), torch.randn(5)]

Because the tensors have different sizes, I cannot organize them into a single tensor.

A Python list has no share_memory_() method, and multiprocessing.Manager cannot handle a list of tensors. How can I share the variable m among multiple subprocesses?

I found the solution myself. It is very simple: just call share_memory_() on each list element. The list itself is not in shared memory, but the list elements are.

Demo code:

import torch.multiprocessing as mp
import torch

def foo(worker, tl):
    # each worker updates its own element of the shared list in place
    tl[worker] += (worker + 1) * 1000

if __name__ == '__main__':
    tl = [torch.randn(2), torch.randn(3)]

    # move each tensor's storage into shared memory; the Python list itself stays local
    for t in tl:
        t.share_memory_()

    print("before mp: tl=")
    print(tl)

    p0 = mp.Process(target=foo, args=(0, tl))
    p1 = mp.Process(target=foo, args=(1, tl))
    p0.start()
    p1.start()
    p0.join()
    p1.join()

    print("after mp: tl=")
    print(tl)

Output:

before mp: tl=
[
 1.5999
 2.2733
[torch.FloatTensor of size 2]
, 
 0.0586
 0.6377
-0.9631
[torch.FloatTensor of size 3]
]
after mp: tl=
[
 1001.5999
 1002.2733
[torch.FloatTensor of size 2]
, 
 2000.0586
 2000.6377
 1999.0370
[torch.FloatTensor of size 3]
]
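
As a quick sanity check, you can confirm that share_memory_() really moved each element into shared memory. This is a minimal sketch using the standard Tensor.is_shared() method:

import torch

tl = [torch.randn(2), torch.randn(3)]
print([t.is_shared() for t in tl])   # [False, False]

# share_memory_() moves each tensor's underlying storage into shared memory in place
for t in tl:
    t.share_memory_()

print([t.is_shared() for t in tl])   # [True, True]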

The original answer given by @rozyang does not work on the GPU. It raises an error like:

RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

To fix it, add mp.set_start_method('spawn', force=True) to the code. Here is a snippet:

import torch.multiprocessing as mp
import torch

def foo(worker, tl):
    # each worker updates its own element of the shared list in place
    tl[worker] += (worker + 1) * 1000

if __name__ == '__main__':
    # CUDA cannot be re-initialized in a forked subprocess, so force the spawn start method
    mp.set_start_method('spawn', force=True)
    tl = [torch.randn(2, device='cuda:0'), torch.randn(3, device='cuda:0')]

    for t in tl:
        t.share_memory_()

    print("before mp: tl=")
    print(tl)

    p0 = mp.Process(target=foo, args=(0, tl))
    p1 = mp.Process(target=foo, args=(1, tl))
    p0.start()
    p1.start()
    p0.join()
    p1.join()

    print("after mp: tl=")
    print(tl)
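
The spawn start method is required here because CUDA cannot be re-initialized in a forked subprocess. If you would rather not change the global start method, a variant is to use an explicit spawn context. This is a sketch assuming the standard multiprocessing get_context API, which torch.multiprocessing inherits:

import torch
import torch.multiprocessing as mp

def foo(worker, tl):
    tl[worker] += (worker + 1) * 1000

if __name__ == '__main__':
    # a local spawn context leaves the global start method untouched
    ctx = mp.get_context('spawn')
    tl = [torch.randn(2, device='cuda:0'), torch.randn(3, device='cuda:0')]

    for t in tl:
        t.share_memory_()

    ps = [ctx.Process(target=foo, args=(i, tl)) for i in range(2)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()

    print(tl)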