How to create a simple random array using Numba xoroshiro128p

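I need a simple example of creating a random array with Numba's xoroshiro128p inside a JIT function. For example, the final array should have shape (2, 4). The relevant Numba documentation is linked here.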

Pseudo code:

minimum = -2
maximum = 2

out_array = random(minimum, maximum, shape(2,4))

Output:
[[ 1.87569628  2.85881711  3.6009965   1.49224129]
 [-3.27321953  1.59090995 -4.66912864 -3.43071647]]

Is it possible to create the array faster with CUDA than with NumPy? For example:

import numpy as np
from numba import jit

minimum_bound = -1
maximum_bound = 1
vectors_number = 12000000
variable_number = 6

@jit
def random_matrix(vectors_number, variable_number):
    # Uniform samples between minimum_bound and maximum_bound, one row per vector
    population_generator = np.random.uniform(
        minimum_bound, maximum_bound, (vectors_number, variable_number))
    return population_generator

population_array = random_matrix(vectors_number, variable_number)

After creating 1200000 vectors, I get the same speed as when doing this on CUDA.

The example in the documentation can be trivially modified to do what you want:

from numba import cuda
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32
import numpy as np

@cuda.jit
def rand_array(rng_states, out):
    thread_id = cuda.grid(1)
    x = xoroshiro128p_uniform_float32(rng_states, thread_id)
    out[thread_id] = x


threads_per_block = 4
blocks = 2 
rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)
out = np.zeros(threads_per_block * blocks, dtype=np.float32)

rand_array[blocks, threads_per_block](rng_states, out)
print(out.reshape(blocks,threads_per_block))

Inspired by talonmies' answer:

# Uses the same imports as the answer above.
def random(threads_per_block, blocks):

    # The decorator belongs on the kernel, not on the host wrapper.
    # Note: because the kernel is defined inside "def random", it is
    # JIT-compiled again on every call to random().
    @cuda.jit
    def rand_array(rng_states, out): # inside "def random"
        thread_id = cuda.grid(1)
        x = xoroshiro128p_uniform_float32(rng_states, thread_id)
        out[thread_id] = x

    rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)
    out = np.zeros(threads_per_block * blocks, dtype=np.float32)
    rand_array[blocks, threads_per_block](rng_states, out)
    return out.reshape(blocks,threads_per_block)

# Example of usage: 
matrix100x100 = random(100, 100)

Performance measured on an Nvidia GTX-650

CUDA:

%timeit random(100, 100)

613 ms ± 2.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

NumPy:

%timeit np.random.rand(100, 100)

19.1 ms ± 353 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)