Fast random weighted selection across all rows of a stochastic matrix
numpy.random.choice allows weighted selection from a vector, e.g.:
```python
import numpy

arr = numpy.array([1, 2, 3])
weights = numpy.array([0.2, 0.5, 0.3])
choice = numpy.random.choice(arr, p=weights)
```
This chooses 1 with probability 0.2, 2 with probability 0.5, and 3 with probability 0.3.
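With size=, np.random.choice draws many samples from the same distribution in one call, which makes the meaning of p easy to check empirically. A short sketch (the seed and the sample count are arbitrary choices). Note that p must be a single 1-D vector, so size= cannot give each draw a different distribution, which is exactly the limitation this question is about.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for reproducibility only
arr = np.array([1, 2, 3])
weights = np.array([0.2, 0.5, 0.3])

# One call, 100,000 independent draws, all using the same weights.
samples = np.random.choice(arr, size=100_000, p=weights)

# Empirical frequency of each value; it should be close to weights.
values, counts = np.unique(samples, return_counts=True)
freqs = counts / samples.size
print(dict(zip(values.tolist(), freqs.round(3).tolist())))
```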
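What if we want to do this quickly, in a vectorized way, for a 2D array (matrix) in which each row is a probability vector? That is, how do we get a vector of choices from a stochastic matrix? (In the code below, the probability vectors are actually the columns of prob_matrix, since it is normalized along axis=0.) Here is the super slow way: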
```python
import numpy as np

m = 10
n = 100  # Or some very large number
items = np.arange(m)
prob_weights = np.random.rand(m, n)
prob_matrix = prob_weights / prob_weights.sum(axis=0, keepdims=True)
choices = np.zeros((n,))
# This is slow, because of the loop in Python
for i in range(n):
    choices[i] = np.random.choice(items, p=prob_matrix[:, i])
print(choices)
```
Output:

```
array([ 4., 7., 8., 1., 0., 4., 3., 7., 1., 5., 7., 5., 3.,
        1., 9., 1., 1., 5., 9., 8., 2., 3., 2., 6., 4., 3.,
        8., 4., 1., 1., 4., 0., 1., 8., 5., 3., 9., 9., 6.,
        5., 4., 8., 4., 2., 4., 0., 3., 1., 2., 5., 9., 3.,
        9., 9., 7., 9., 3., 9., 4., 8., 8., 7., 6., 4., 6.,
        7., 9., 5., 0., 6., 1., 3., 3., 2., 4., 7., 0., 6.,
        3., 5., 8., 0., 8., 3., 4., 5., 2., 2., 1., 1., 9.,
        9., 4., 3., 3., 2., 8., 0., 6., 1.])
```
This post suggests that cumsum and bisect could be a potential approach, and that it is fast. But while numpy.cumsum(arr, axis=1) can do this along one axis of a numpy array, the bisect.bisect function only works on a single array at a time. Similarly, numpy.searchsorted only works on 1D arrays.
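For a single probability vector, the cumsum-and-bisect idea is inverse transform sampling: draw r uniformly from [0, 1) and return the first item whose cumulative probability exceeds r. A sketch for one vector, showing that bisect.bisect and np.searchsorted(..., side="right") return the same index; the seed is arbitrary. np.searchsorted still needs a 1-D sorted array as its first argument, but it accepts many query values at once.

```python
import bisect

import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
items = np.array([1, 2, 3])
weights = np.array([0.2, 0.5, 0.3])

cdf = np.cumsum(weights)  # approximately [0.2, 0.7, 1.0]
r = rng.random()          # uniform in [0, 1)

# bisect.bisect (same as bisect_right) counts the CDF entries <= r,
# i.e. it returns the index of the first entry strictly greater than r.
i_bisect = bisect.bisect(cdf, r)

# The NumPy equivalent for the same 1-D CDF.
i_search = np.searchsorted(cdf, r, side="right")

# One 1-D CDF, many queries at once: 5 draws from the same distribution.
many = np.searchsorted(cdf, rng.random(5), side="right")

print(items[i_bisect], items[i_search], items[many])
```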
Is there a quick way to do this using only vectorized operations?
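I don't think it's possible to vectorize this completely, but you can still get a decent speedup by vectorizing as much as you can. Here's what I came up with: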
```python
def improved(prob_matrix, items):
    n = prob_matrix.shape[1]  # number of columns (probability vectors)
    # transpose here for better data locality later
    cdf = np.cumsum(prob_matrix.T, axis=1)
    # random numbers are expensive, so we'll get all of them at once
    ridx = np.random.random(size=n)
    # the one loop we can't avoid, made as simple as possible
    idx = np.zeros(n, dtype=int)
    for i, r in enumerate(ridx):
        idx[i] = np.searchsorted(cdf[i], r)
    # fancy indexing all at once is faster than indexing in a loop
    return items[idx]
```
Testing against the version from the question:
```python
def original(prob_matrix, items):
    n = prob_matrix.shape[1]  # number of columns (probability vectors)
    choices = np.zeros((n,))
    # This is slow, because of the loop in Python
    for i in range(n):
        choices[i] = np.random.choice(items, p=prob_matrix[:, i])
    return choices
```
Here's the speedup (using the setup code given in the question):
```
In [45]: %timeit original(prob_matrix, items)
100 loops, best of 3: 2.86 ms per loop

In [46]: %timeit improved(prob_matrix, items)
The slowest run took 4.15 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 157 µs per loop
```
I'm not sure why my version's timings vary so much, but even the slowest run (~650 µs) is still more than 4 times faster than the original (2.86 ms).
Here's a fully vectorized version that's pretty fast:
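```python
def vectorized(prob_matrix, items):
    # column-wise cumulative sums: one CDF per column
    s = prob_matrix.cumsum(axis=0)
    # one uniform random number per column
    r = np.random.rand(prob_matrix.shape[1])
    # count of CDF entries below r = index of the selected item
    k = (s < r).sum(axis=0)
    return items[k]
```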
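To see how the comparison trick picks an item, here is a tiny worked example with two columns. Fixed values stand in for the random draws; the numbers are made up for illustration.

```python
import numpy as np

# Two columns, each a probability vector over 3 items (each column sums to 1).
prob_matrix = np.array([[0.2, 0.6],
                        [0.5, 0.1],
                        [0.3, 0.3]])
items = np.array(["a", "b", "c"])

s = prob_matrix.cumsum(axis=0)  # column CDFs, approximately [[0.2, 0.6], [0.7, 0.7], [1.0, 1.0]]
r = np.array([0.65, 0.95])      # fixed stand-ins for the random draws, one per column
mask = s < r                    # r broadcasts across rows: compares s[j, i] with r[i]
k = mask.sum(axis=0)            # number of CDF entries below r = index of the chosen item
print(k)                        # [1 2]
print(items[k])                 # ['b' 'c']
```

Column 0 has only one cumulative value (0.2) below 0.65, so item "b" is chosen; column 1 has two (0.6 and 0.7) below 0.95, so item "c" is chosen.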
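In theory, searchsorted is the right function to use for looking up the random value in the cumulatively summed probabilities, but with m being relatively small, k = (s < r).sum(axis=0) ends up being much faster. Its time complexity is O(m) per column, while the searchsorted method is O(log(m)), but that will only matter for much larger m. Also, cumsum is O(m), so both vectorized and @perimosocordiae's improved are O(m). (If your m is, in fact, much larger, you'll have to run some tests to see how large m can get before this method becomes slower.)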
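Since np.searchsorted only accepts a 1-D array, one known workaround (not from either answer above) is to shift column i's CDF into the interval [i, i + 1] so that all CDFs fit into one sorted 1-D array, then call np.searchsorted once. A sketch, assuming the probability vectors are stored as columns as in the question's setup; the function name searchsorted_all_columns is hypothetical. Adding the offset i to r costs roughly log2(n) bits of precision, so for very large n, tiny probabilities can get merged with their neighbours.

```python
import numpy as np


def searchsorted_all_columns(prob_matrix, items, rng=None):
    """Hypothetical helper: draw one item per column of prob_matrix (shape (m, n))
    using a single np.searchsorted call on row-offset CDFs."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = prob_matrix.shape
    # One CDF per column, stored as rows: shape (n, m).
    cdf = np.cumsum(prob_matrix.T, axis=1)
    # Make every CDF end at exactly 1.0 (x / x == 1.0 in IEEE arithmetic).
    cdf = cdf / cdf[:, -1:]
    # Shift row i into [i, i + 1]; the flattened array is then sorted.
    offsets = np.arange(n)
    flat = (cdf + offsets[:, None]).ravel()
    # One uniform draw per column, shifted the same way and kept strictly below i + 1.
    upper = np.nextafter(offsets + 1.0, 0.0)
    r = np.minimum(rng.random(n) + offsets, upper)
    # side="right" counts entries <= r; all i * m entries of earlier rows are <= i <= r,
    # so subtracting i * m leaves the index within row i (always 0..m-1).
    k = np.searchsorted(flat, r, side="right") - offsets * m
    return items[k]
```

The cost is O(n log(mn)) for the search plus O(mn) for building the CDFs, so it only pays off when m is large enough that the O(m) comparison in vectorized starts to dominate.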
Here are the timings I get with m = 10 and n = 10000 (using the functions original and improved from @perimosocordiae's answer):
```
In [115]: %timeit original(prob_matrix, items)
1 loops, best of 3: 270 ms per loop

In [116]: %timeit improved(prob_matrix, items)
10 loops, best of 3: 24.9 ms per loop

In [117]: %timeit vectorized(prob_matrix, items)
1000 loops, best of 3: 1 ms per loop
```
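%timeit is an IPython magic. Outside IPython, comparable numbers can be gathered with the standard-library timeit module. A minimal sketch, assuming the same setup as the question (probability vectors stored as columns); the benchmark helper is hypothetical, and the timings it prints will depend on your machine.

```python
import timeit

import numpy as np


def original(prob_matrix, items):
    n = prob_matrix.shape[1]
    choices = np.zeros(n)
    for i in range(n):  # one np.random.choice call per column
        choices[i] = np.random.choice(items, p=prob_matrix[:, i])
    return choices


def improved(prob_matrix, items):
    n = prob_matrix.shape[1]
    cdf = np.cumsum(prob_matrix.T, axis=1)
    ridx = np.random.random(size=n)
    idx = np.zeros(n, dtype=int)
    for i, r in enumerate(ridx):  # one np.searchsorted call per column
        idx[i] = np.searchsorted(cdf[i], r)
    return items[idx]


def vectorized(prob_matrix, items):
    s = prob_matrix.cumsum(axis=0)
    r = np.random.rand(prob_matrix.shape[1])
    k = (s < r).sum(axis=0)
    return items[k]


def benchmark(m=10, n=10000, repeat=3, number=1):
    """Hypothetical helper: best time per call, in seconds, for each function."""
    items = np.arange(m)
    prob_weights = np.random.rand(m, n)
    prob_matrix = prob_weights / prob_weights.sum(axis=0, keepdims=True)
    results = {}
    for fn in (original, improved, vectorized):
        # fn=fn binds the current function; timeit calls the lambda with no arguments.
        times = timeit.repeat(lambda fn=fn: fn(prob_matrix, items),
                              repeat=repeat, number=number)
        results[fn.__name__] = min(times) / number  # "best of repeat", like %timeit
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds * 1e3:.2f} ms per call")
```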
The full script that defines the functions is:
```python
import numpy as np

def improved(prob_matrix, items):
    n = prob_matrix.shape[1]  # number of columns (probability vectors)
    # transpose here for better data locality later
    cdf = np.cumsum(prob_matrix.T, axis=1)
    # random numbers are expensive, so we'll get all of them at once
    ridx = np.random.random(size=n)
    # the one loop we can't avoid, made as simple as possible
    idx = np.zeros(n, dtype=int)
    for i, r in enumerate(ridx):
        idx[i] = np.searchsorted(cdf[i], r)
    # fancy indexing all at once is faster than indexing in a loop
    return items[idx]

def original(prob_matrix, items):
    n = prob_matrix.shape[1]  # number of columns (probability vectors)
    choices = np.zeros((n,))
    # This is slow, because of the loop in Python
    for i in range(n):
        choices[i] = np.random.choice(items, p=prob_matrix[:, i])
    return choices

def vectorized(prob_matrix, items):
    s = prob_matrix.cumsum(axis=0)
    r = np.random.rand(prob_matrix.shape[1])
    k = (s < r).sum(axis=0)
    return items[k]

m = 10
n = 10000  # Or some very large number
items = np.arange(m)
prob_weights = np.random.rand(m, n)
prob_matrix = prob_weights / prob_weights.sum(axis=0, keepdims=True)
```
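The script relies on NumPy's legacy global random state (np.random.rand, np.random.random). Below is a sketch of vectorized that uses the newer np.random.Generator API instead, so results can be reproduced from a seed; the name vectorized_rng is hypothetical. It also rescales each CDF to end at exactly 1.0 and compares with <= instead of <. Together, these guard against the rare case where rounding leaves the last cumulative sum just below 1, r lands above it, and k equals m (an out-of-bounds index).

```python
import numpy as np


def vectorized_rng(prob_matrix, items, rng=None):
    """Hypothetical variant of vectorized(): one draw per column of prob_matrix (shape (m, n))."""
    rng = np.random.default_rng() if rng is None else rng
    # Column-wise CDFs, rescaled so each one ends at exactly 1.0.
    s = prob_matrix.cumsum(axis=0)
    s = s / s[-1]
    # One uniform draw in [0, 1) per column.
    r = rng.random(prob_matrix.shape[1])
    # Index of the first CDF entry strictly greater than r. Because s[-1] == 1.0 > r,
    # k never exceeds m - 1, and zero-probability items are never selected.
    k = (s <= r).sum(axis=0)
    return items[k]
```

Passing the same seeded Generator state gives identical draws, which is handy when comparing implementations.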