Does order of memory allocation matter in PyCUDA's curandom?
I am using PyCUDA's interface [1] over CUDA Unified Memory [2]. At some point I added random number generators [3] and found myself staring at a dead kernel in my Jupyter Notebook:
I narrowed the problem down to the creation of the random number generator. Or, to be precise, to the moment I do this:
import pycuda.curandom
from pycuda import autoinit, driver
import numpy as np
gpu_data_1 = driver.managed_zeros(shape=5, dtype=np.int32, mem_flags=driver.mem_attach_flags.GLOBAL)
gpu_generator = pycuda.curandom.XORWOWRandomNumberGenerator(pycuda.curandom.seed_getter_uniform)
gpu_data_2 = driver.managed_zeros(shape=5, dtype=np.int32, mem_flags=driver.mem_attach_flags.GLOBAL)
The code above fails without any error message, but if I move the gpu_generator = ...
line one line up or down, it seems to work fine.
I believe PyCUDA may be failing to execute the prepare
call, which boils down to this kernel:
extern "C" {
__global__ void prepare(curandStateXORWOW *s, const int n,
                        unsigned int *v, const unsigned int o)
{
    const int id = blockIdx.x*blockDim.x+threadIdx.x;
    if (id < n)
        curand_init(v[id], id, o, &s[id]);
}
}
Any idea what the problem might be?
In the pre-Pascal UM (Unified Memory) regime, it is illegal for host code to touch a managed allocation after a kernel launch but before a cudaDeviceSynchronize()
call has been issued.
I would guess this code violates that rule. If I run your reproducer on a Maxwell system, I get this:
$ cuda-memcheck python ./idontthinkso.py
========= CUDA-MEMCHECK
========= Error: process didn't terminate successfully
========= Fatal UVM CPU fault due to invalid operation
========= during write access to address 0x703bc1000
=========
========= ERROR SUMMARY: 1 error
That is the managed memory system breaking. Placing a synchronize call between the random generator setup (which runs a kernel) and the zeros call (which touches managed memory) gets rid of it on my system:
$ cat idontthinkso.py
import pycuda.curandom
from pycuda import autoinit, driver
import numpy as np
gpu_data_1 = driver.managed_zeros(shape=5, dtype=np.int32, mem_flags=driver.mem_attach_flags.GLOBAL)
gpu_generator = pycuda.curandom.XORWOWRandomNumberGenerator(pycuda.curandom.seed_getter_uniform)
autoinit.context.synchronize()
gpu_data_2 = driver.managed_zeros(shape=5, dtype=np.int32, mem_flags=driver.mem_attach_flags.GLOBAL)
$ cuda-memcheck python ./idontthinkso.py
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
The UM regime you are in varies with the GPU, driver, and OS you are using.