在给定转移矩阵的情况下有效地将转移应用于状态矩阵
Efficiently applying transition to state matrix given transition matrix
我希望将状态更改应用于具有 k 个类别的大型分类矩阵 (M),其中我知道每个类别到 k (T) 中每个其他类别的转移概率
本质上,我希望能够有效地获取 M 中的每个元素,模拟给定 T 中概率的状态变化,并用计算出的变化替换元素。
我已经尝试了一些解决方案:
- 蛮力嵌套索引循环(太长了)
- numba 辅助嵌套 for 循环(~500 毫秒,这对我来说太长了)
- 每个类别和替换的预先计算抽奖(~400 毫秒)
import numpy as np
def categorical_transition(mat, t_mat, k=4):
transformed_mat = mat.copy()
cat_counts = np.bincount(mat.reshape(-1,))
for i in range(k):
rand_vec = np.random.multinomial(1, t_mat[i], cat_counts[i])
choice = np.where(rand_vec)[1]
transformed_mat[mat == i] = choice
return transformed_mat
# load data
mat = np.random.choice(4, (16000, 256))
t_mat = np.random.random((4, 4))
# normalize transition matrix
for i in range(t_mat.shape[0]):
t_mat[i] = t_mat[i] / t_mat[i].sum()
transformed_mat = categorical_transition(mat, t_mat)
此方法有效,但速度很慢,如有任何关于更有效的实施方法的建议,我将不胜感激
这里有一个方法可以让我在这个例子上加速 5 倍
import numpy as np
def categorical_transition(mat, t_mat, k=4):
transformed_mat = mat.copy()
cat_counts = np.bincount(mat.reshape(-1,))
for i in range(k):
rand_vec = np.random.multinomial(1, t_mat[i], cat_counts[i])
choice = np.where(rand_vec)[1]
transformed_mat[mat == i] = choice
return transformed_mat
def pp(mat,t_mat):
ps = t_mat.cumsum(1)
ps /= ps[:,-1:]
return (np.random.random(mat.shape+(1,))<ps[mat]).argmax(-1)
# load data
mat = np.random.choice(4, (16000, 256))
t_mat = np.random.random((4, 4))
# normalize transition matrix
for i in range(t_mat.shape[0]):
t_mat[i] = t_mat[i] / t_mat[i].sum()
transformed_mat = categorical_transition(mat, t_mat)
transformed_mat_pp = pp(mat, t_mat)
# check correctness
from pprint import pprint
np.set_printoptions(3)
cnts = np.bincount(mat.ravel())
pprint([[np.bincount(tm[mat==i])/cnts[i] for tm in (transformed_mat,transformed_mat_pp)] + [t_mat[i]] for i in range(4)])
from timeit import timeit
print('OP',timeit(lambda:categorical_transition(mat, t_mat),number=10)*100,'ms')
print('pp',timeit(lambda:pp(mat, t_mat),number=10)*100,'ms')
样本运行:
[[array([0.186, 0.1 , 0.078, 0.637]),
array([0.186, 0.099, 0.078, 0.637]),
array([0.186, 0.099, 0.078, 0.637])],
[array([0.303, 0.517, 0.088, 0.092]),
array([0.303, 0.517, 0.089, 0.092]),
array([0.303, 0.517, 0.088, 0.092])],
[array([0.319, 0.27 , 0.329, 0.082]),
array([0.319, 0.271, 0.328, 0.083]),
array([0.318, 0.27 , 0.329, 0.082])],
[array([0.408, 0.106, 0.264, 0.222]),
array([0.409, 0.107, 0.263, 0.221]),
array([0.408, 0.107, 0.264, 0.221])]]
OP 872.7993675973266 ms
pp 170.54899749346077 ms
始终提供您迄今为止尝试过的所有实现
我尝试了一个简单的实现,例如 here. 它应该比您的解决方案快大约 20-80 倍,具体取决于您有多少个可用内核。
实施
@nb.njit(parallel=True)
def categorical_transition_nb(mat_in, t_mat):
mat=np.reshape(mat_in,-1)
transformed_mat = np.empty_like(mat)
for i in nb.prange(mat.shape[0]):
rand_number=np.random.rand()
probabilities=t_mat[mat[i],:]
if rand_number<probabilities[0]:
transformed_mat[i]=0
else:
for j in range(1,probabilities.shape[0]):
if rand_number>=probabilities[j-1] and rand_number<probabilities[j]:
transformed_mat[i]=j
return transformed_mat.reshape(mat_in.shape)
时间
import numpy as np
import numba as nb
# load data
mat = np.random.choice(4, (16_000,256))
t_mat = np.random.random((4, 4))
# normalize transition matrix
for i in range(t_mat.shape[0]):
t_mat[i] = t_mat[i] / t_mat[i].sum()
t_mat_2=np.cumsum(t_mat,axis=1)
%timeit transformed_mat_2 = categorical_transition_nb(mat, t_mat_2)
21.7 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
我希望将状态更改应用于具有 k 个类别的大型分类矩阵 (M),其中我知道每个类别到 k (T) 中每个其他类别的转移概率
本质上,我希望能够有效地获取 M 中的每个元素,模拟给定 T 中概率的状态变化,并用计算出的变化替换元素。
我已经尝试了一些解决方案:
- 蛮力嵌套索引循环(太长了)
- numba 辅助嵌套 for 循环(~500 毫秒,这对我来说太长了)
- 每个类别和替换的预先计算抽奖(~400 毫秒)
import numpy as np
def categorical_transition(mat, t_mat, k=4):
transformed_mat = mat.copy()
cat_counts = np.bincount(mat.reshape(-1,))
for i in range(k):
rand_vec = np.random.multinomial(1, t_mat[i], cat_counts[i])
choice = np.where(rand_vec)[1]
transformed_mat[mat == i] = choice
return transformed_mat
# load data
mat = np.random.choice(4, (16000, 256))
t_mat = np.random.random((4, 4))
# normalize transition matrix
for i in range(t_mat.shape[0]):
t_mat[i] = t_mat[i] / t_mat[i].sum()
transformed_mat = categorical_transition(mat, t_mat)
此方法有效,但速度很慢,如有任何关于更有效的实施方法的建议,我将不胜感激
这里有一个方法可以让我在这个例子上加速 5 倍
import numpy as np
def categorical_transition(mat, t_mat, k=4):
transformed_mat = mat.copy()
cat_counts = np.bincount(mat.reshape(-1,))
for i in range(k):
rand_vec = np.random.multinomial(1, t_mat[i], cat_counts[i])
choice = np.where(rand_vec)[1]
transformed_mat[mat == i] = choice
return transformed_mat
def pp(mat,t_mat):
ps = t_mat.cumsum(1)
ps /= ps[:,-1:]
return (np.random.random(mat.shape+(1,))<ps[mat]).argmax(-1)
# load data
mat = np.random.choice(4, (16000, 256))
t_mat = np.random.random((4, 4))
# normalize transition matrix
for i in range(t_mat.shape[0]):
t_mat[i] = t_mat[i] / t_mat[i].sum()
transformed_mat = categorical_transition(mat, t_mat)
transformed_mat_pp = pp(mat, t_mat)
# check correctness
from pprint import pprint
np.set_printoptions(3)
cnts = np.bincount(mat.ravel())
pprint([[np.bincount(tm[mat==i])/cnts[i] for tm in (transformed_mat,transformed_mat_pp)] + [t_mat[i]] for i in range(4)])
from timeit import timeit
print('OP',timeit(lambda:categorical_transition(mat, t_mat),number=10)*100,'ms')
print('pp',timeit(lambda:pp(mat, t_mat),number=10)*100,'ms')
样本运行:
[[array([0.186, 0.1 , 0.078, 0.637]),
array([0.186, 0.099, 0.078, 0.637]),
array([0.186, 0.099, 0.078, 0.637])],
[array([0.303, 0.517, 0.088, 0.092]),
array([0.303, 0.517, 0.089, 0.092]),
array([0.303, 0.517, 0.088, 0.092])],
[array([0.319, 0.27 , 0.329, 0.082]),
array([0.319, 0.271, 0.328, 0.083]),
array([0.318, 0.27 , 0.329, 0.082])],
[array([0.408, 0.106, 0.264, 0.222]),
array([0.409, 0.107, 0.263, 0.221]),
array([0.408, 0.107, 0.264, 0.221])]]
OP 872.7993675973266 ms
pp 170.54899749346077 ms
始终提供您迄今为止尝试过的所有实现
我尝试了一个简单的实现,例如 here. 它应该比您的解决方案快大约 20-80 倍,具体取决于您有多少个可用内核。
实施
@nb.njit(parallel=True)
def categorical_transition_nb(mat_in, t_mat):
mat=np.reshape(mat_in,-1)
transformed_mat = np.empty_like(mat)
for i in nb.prange(mat.shape[0]):
rand_number=np.random.rand()
probabilities=t_mat[mat[i],:]
if rand_number<probabilities[0]:
transformed_mat[i]=0
else:
for j in range(1,probabilities.shape[0]):
if rand_number>=probabilities[j-1] and rand_number<probabilities[j]:
transformed_mat[i]=j
return transformed_mat.reshape(mat_in.shape)
时间
import numpy as np
import numba as nb
# load data
mat = np.random.choice(4, (16_000,256))
t_mat = np.random.random((4, 4))
# normalize transition matrix
for i in range(t_mat.shape[0]):
t_mat[i] = t_mat[i] / t_mat[i].sum()
t_mat_2=np.cumsum(t_mat,axis=1)
%timeit transformed_mat_2 = categorical_transition_nb(mat, t_mat_2)
21.7 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)