Numba: how to generate random numbers in @guvectorize with the "cuda" target?
In this (silly) example, I am trying to estimate pi by counting how many points chosen at random in (0, 1) x (0, 1) fall inside the unit circle.
import numpy as np
from numba import guvectorize

@guvectorize(['void(float64[:], int32, float64[:])'], '(n),()->(n)', target='cuda')
def guvec_compute_pi(arr, iters, res):
    n = arr.shape[0]
    for t in range(n):
        inside = 0
        for i in range(iters):
            # np.random is not available in CUDA device code; this is what fails
            x = np.random.random()
            y = np.random.random()
            if x ** 2 + y ** 2 <= 1.0:
                inside += 1
        res[t] = 4.0 * inside / iters
Compilation fails with this exception:
numba.errors.UntypedAttributeError: Failed at nopython (nopython frontend)
Unknown attribute 'random' of type Module(<module 'numpy.random' from '...'>)
File "scratch.py", line 34
[1] During: typing of get attribute at /.../scratch.py (34)
I naively thought that switching to the xoroshiro128p RNG from numba.cuda.random would solve the problem. My modified code looks like this:
from numba import guvectorize
from numba.cuda.random import (create_xoroshiro128p_states,
                               xoroshiro128p_uniform_float64)

@guvectorize(['void(float64[:], int32, float64[:])'], '(n),()->(n)', target='cuda')
def guvec_compute_pi(arr, iters, res):
    n = arr.shape[0]
    # Host-side function called inside device code; this is what fails
    rng = create_xoroshiro128p_states(n, seed=1)
    for t in range(n):
        inside = 0
        for i in range(iters):
            x = xoroshiro128p_uniform_float64(rng, t)
            y = xoroshiro128p_uniform_float64(rng, t)
            if x ** 2 + y ** 2 <= 1.0:
                inside += 1
        res[t] = 4.0 * inside / iters
But a similar error pops up:
numba.errors.TypingError: Failed at nopython (nopython frontend)
Untyped global name 'create_xoroshiro128p_states': cannot determine Numba type of <class 'function'>
File "scratch.py", line 28
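When I change to target='parallel', the original code using numpy.random.random works fine, with or without nopython=True. What is causing the problem with target='cuda', and is there a way to get random numbers inside a @guvectorize-d block?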
The function create_xoroshiro128p_states is intended to be run on the CPU, as shown in the example from the Numba documentation, repeated below:
from __future__ import print_function, absolute_import

from numba import cuda
from numba.cuda.random import (create_xoroshiro128p_states,
                               xoroshiro128p_uniform_float32)
import numpy as np

@cuda.jit
def compute_pi(rng_states, iterations, out):
    """Estimate pi in each thread and store the result in out[thread_id]."""
    thread_id = cuda.grid(1)

    # Compute pi by drawing random (x, y) points and finding what
    # fraction lie inside a unit circle
    inside = 0
    for i in range(iterations):
        x = xoroshiro128p_uniform_float32(rng_states, thread_id)
        y = xoroshiro128p_uniform_float32(rng_states, thread_id)
        if x**2 + y**2 <= 1.0:
            inside += 1

    out[thread_id] = 4.0 * inside / iterations

threads_per_block = 64
blocks = 24
# Called from host code: builds one RNG state per GPU thread
rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)
out = np.zeros(threads_per_block * blocks, dtype=np.float32)

compute_pi[blocks, threads_per_block](rng_states, 10000, out)
print('pi:', out.mean())
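It generates an array of randomly initialized state data, so that random number generation on the GPU is independent across threads. That data ends up on the device side, which is a little confusing, but it is what lets you pass the random state data to your GPU kernel.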
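If the split between "create the states once on the host" and "draw from your own state inside the kernel" is still unclear, the following CPU-only sketch in plain Python may help. It is an analogy, not Numba code: create_states and compute_pi are hypothetical stand-ins built on the standard library's random.Random, and they do not reproduce the xoroshiro128+ algorithm or run on a GPU. The inner loop over t plays the role of the GPU threads.

```python
import random


def create_states(n, seed):
    """Host-side step (illustrative): one independent generator per logical thread."""
    root = random.Random(seed)
    # Seed each per-thread generator from a single root stream.
    return [random.Random(root.getrandbits(64)) for _ in range(n)]


def compute_pi(states, iterations, out):
    """Kernel-side step (illustrative): worker t only ever draws from states[t]."""
    for t in range(len(out)):  # stands in for thread_id = cuda.grid(1)
        rng = states[t]
        inside = 0
        for _ in range(iterations):
            x = rng.random()
            y = rng.random()
            if x * x + y * y <= 1.0:
                inside += 1
        out[t] = 4.0 * inside / iterations


n_threads = 8
states = create_states(n_threads, seed=1)  # created outside the "kernel"
out = [0.0] * n_threads
compute_pi(states, 2000, out)
pi_estimate = sum(out) / n_threads
print('pi:', pi_estimate)
```

The point of the sketch is only the structure: the state array is built before the kernel runs and passed in as an argument, which is why calling the state-creation function from inside the compiled function body does not work.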