How to share a list of tensors in PyTorch multiprocessing?
I am programming with PyTorch multiprocessing. I want all the subprocesses to be able to read/write the same list of tensors (without resizing it). For example, the variable could be
m = [torch.randn(3), torch.randn(5)]
Because each tensor has a different size, I cannot organize them into a single tensor.
A Python list has no share_memory_() method, and multiprocessing.Manager cannot handle a list of tensors. How can I share the variable m among multiple subprocesses?
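As an illustration (my addition, not part of the original question), packing differently sized tensors into one tensor fails, which is why a list is needed in the first place:
import torch

# torch.stack requires every tensor to have the same shape, so a
# ragged collection like this cannot be merged into a single tensor.
try:
    torch.stack([torch.randn(3), torch.randn(5)])
except RuntimeError as e:
    print(e)  # e.g. "stack expects each tensor to be equal size ..."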
I found the solution myself. It is quite simple: just call share_memory_() on each list element. The list itself is not in shared memory, but the list elements are.
Demo code:
import torch.multiprocessing as mp
import torch

def foo(worker, tl):
    # Each worker adds a distinct offset to its own tensor in the shared list.
    tl[worker] += (worker + 1) * 1000

if __name__ == '__main__':
    tl = [torch.randn(2), torch.randn(3)]
    # Move each tensor's storage into shared memory; the list object
    # itself remains an ordinary per-process Python list.
    for t in tl:
        t.share_memory_()
    print("before mp: tl=")
    print(tl)
    p0 = mp.Process(target=foo, args=(0, tl))
    p1 = mp.Process(target=foo, args=(1, tl))
    p0.start()
    p1.start()
    p0.join()
    p1.join()
    print("after mp: tl=")
    print(tl)
Output
before mp: tl=
[
1.5999
2.2733
[torch.FloatTensor of size 2]
,
0.0586
0.6377
-0.9631
[torch.FloatTensor of size 3]
]
after mp: tl=
[
1001.5999
1002.2733
[torch.FloatTensor of size 2]
,
2000.0586
2000.6377
1999.0370
[torch.FloatTensor of size 3]
]
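As a quick sanity check (my addition, not part of the original answer), Tensor.is_shared() can confirm that each element's storage actually moved into shared memory, while the list itself stays an ordinary Python object:
import torch

tl = [torch.randn(2), torch.randn(3)]
for t in tl:
    t.share_memory_()

# Each tensor's storage now lives in shared memory; the list is not shared.
print([t.is_shared() for t in tl])  # expected: [True, True]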
The original answer given by @rozyang does not work on GPUs. It raises an error like RuntimeError: CUDA error: initialization error CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
To fix it, add mp.set_start_method('spawn', force=True) to the code. Here is a snippet:
import torch.multiprocessing as mp
import torch

def foo(worker, tl):
    tl[worker] += (worker + 1) * 1000

if __name__ == '__main__':
    # CUDA cannot be re-initialized in a forked child process,
    # so force the 'spawn' start method before creating workers.
    mp.set_start_method('spawn', force=True)
    tl = [torch.randn(2, device='cuda:0'), torch.randn(3, device='cuda:0')]
    for t in tl:
        t.share_memory_()
    print("before mp: tl=")
    print(tl)
    p0 = mp.Process(target=foo, args=(0, tl))
    p1 = mp.Process(target=foo, args=(1, tl))
    p0.start()
    p1.start()
    p0.join()
    p1.join()
    print("after mp: tl=")
    print(tl)
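As a variant (my sketch, not from the original answer), the start method can also be confined to these workers with a spawn context instead of being set globally. Note that per the PyTorch docs, share_memory_() is a no-op for CUDA tensors, which are shared across processes via CUDA IPC anyway:
import torch.multiprocessing as mp
import torch

def foo(worker, tl):
    tl[worker] += (worker + 1) * 1000

if __name__ == '__main__':
    # get_context('spawn') yields Process objects that use 'spawn'
    # without mutating the global start method.
    ctx = mp.get_context('spawn')
    # share_memory_() is documented as a no-op for CUDA tensors,
    # so it is omitted here.
    tl = [torch.randn(2, device='cuda:0'), torch.randn(3, device='cuda:0')]
    procs = [ctx.Process(target=foo, args=(i, tl)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(tl)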