How can I calculate the probability for every row of a numpy array at once?
I have a function that calculates the probability, shown below:
import math
import numpy as np

def multinormpdf(x, mu, var):  # probability density of a multivariate Gaussian distribution
    k = len(x)                 # dimensionality of the point
    det = np.linalg.det(var)
    inv = np.linalg.inv(var)
    denominator = math.sqrt(((2*math.pi)**k)*det)
    numerator = np.dot((x - mu).transpose(), inv)
    numerator = np.dot(numerator, (x - mu))  # quadratic form (x - mu)^T inv (x - mu)
    numerator = math.exp(-0.5 * numerator)
    return numerator/denominator
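For reference, the function above appears to evaluate the standard density of a k-dimensional Gaussian. Written out, with \mu standing for `mu` and \Sigma for `var` (the symbols are notation added here, not part of the original post):

```latex
p(x) = \frac{\exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{\mathsf{T}}\,\Sigma^{-1}\,(x-\mu)\right)}
            {\sqrt{(2\pi)^{k}\,\det\Sigma}}
```

The numerator corresponds to `numerator` in the code and the square root to `denominator`. The formula assumes \Sigma is symmetric and positive definite, so that \det\Sigma > 0.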
I have a mean vector, a covariance matrix, and a 2D numpy array for testing:
mu = np.array([100, 105, 42])    # mean vector
var = np.array([[100, 124, 11],  # covariance matrix
                [124, 150, 44],
                [11, 44, 130]])
arr = np.array([[42, 234, 124],  # arr is a 43923794 x 3 matrix
                [123, 222, 112],
                [42, 213, 11],
                # ... (so many values, about 40,000,000 rows)
                [23, 55, 251]])
I have to calculate the probability for each row, so I used this code:
for i in arr:
    print(multinormpdf(i, mu, var))  # I already know mean_vector and variance_matrix
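But it's too slow...
Is there a faster way to calculate the probabilities?
Or is there a way to calculate the probabilities for the whole test arr at once, like a 'batch'?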
You can try numba. Just decorate your function with @numba.vectorize.
import numba

@numba.vectorize
def multinormpdf(x, mu, var):
    # ...
    return calculated_probability

new_arr = multinormpdf(arr, mu, var)
If your multinormpdf doesn't use any unsupported features, this can speed it up. See here: https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html
In addition, you can use the experimental target='parallel' feature like this:
@numba.vectorize(target='parallel')
You can easily vectorize your function:
import numpy as np

def fast_multinormpdf(x, mu, var):
    mu = np.asarray(mu)
    var = np.asarray(var)
    k = x.shape[-1]                    # dimensionality = length of each row
    det = np.linalg.det(var)
    inv = np.linalg.inv(var)
    denominator = np.sqrt(((2*np.pi)**k)*det)
    numerator = np.dot((x - mu), inv)  # (x - mu) times inv, for every row at once
    numerator = np.sum((x - mu) * numerator, axis=-1)  # row-wise quadratic form
    numerator = np.exp(-0.5 * numerator)
    return numerator/denominator

arr = np.array([[42, 234, 124],
                [123, 222, 112],
                [42, 213, 11],
                [42, 213, 11]])
mu = [0, 0, 1]
var = [[1, 100, 100],
       [100, 1, 100],
       [100, 100, 1]]
slow_out = np.array([multinormpdf(i, mu, var) for i in arr])
fast_out = fast_multinormpdf(arr, mu, var)
np.allclose(slow_out, fast_out)  # True
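One caveat with evaluating the density directly is that np.exp underflows to 0 for points far from the mean. A common workaround is to compute the log-density instead. The following is only a minimal sketch under the assumption that the covariance is symmetric and positive definite. The name multinorm_logpdf and the sample covariance are hypothetical and do not come from this thread. Note that the example matrices shown in this thread are not positive definite, so np.linalg.cholesky would raise LinAlgError on them.

```python
import numpy as np

def multinorm_logpdf(x, mu, cov):
    """Log-density of a multivariate Gaussian for every row of x.

    Hypothetical helper: assumes cov is symmetric positive definite.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    k = x.shape[-1]
    # cov = L @ L.T; raises np.linalg.LinAlgError if cov is not positive definite
    L = np.linalg.cholesky(cov)
    # Solve L z = (x - mu)^T for all rows at once; z has shape (k, n)
    z = np.linalg.solve(L, (x - mu).T)
    # Squared Mahalanobis distance of each row: |L^-1 (x - mu)|^2
    maha = np.sum(z * z, axis=0)
    # log det(cov) = 2 * sum(log(diag(L)))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (k * np.log(2.0 * np.pi) + log_det + maha)

# Hypothetical positive-definite covariance, for illustration only
sample_cov = np.array([[4.0, 1.0, 0.0],
                       [1.0, 3.0, 1.0],
                       [0.0, 1.0, 2.0]])
sample_points = np.array([[1.0, -1.0, 0.5],
                          [0.0, 0.0, 0.0]])
log_p = multinorm_logpdf(sample_points, np.zeros(3), sample_cov)
p = np.exp(log_p)  # back to densities, if they are large enough to represent
```

Working in log space also avoids forming the explicit inverse, since the triangular factor from the Cholesky decomposition is enough to get the quadratic form.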
fast_multinormpdf is about 1000 times faster than the non-vectorized function:
long_arr = np.tile(arr, (10000, 1))
%timeit np.array([multinormpdf(i, mu, var) for i in long_arr])
# 2.12 s ± 93.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit fast_multinormpdf(long_arr, mu, var)
# 2.56 ms ± 76.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
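For an array as large as the one in the question (43923794 x 3), a fully vectorized call allocates several temporaries of that same shape, each around 1 GB as float64. If memory becomes a concern, the same computation can be done in slices. This is a rough sketch rather than a tuned implementation. The name multinormpdf_chunked and the chunk_size default are assumptions made for illustration.

```python
import numpy as np

def multinormpdf_chunked(x, mu, var, chunk_size=1_000_000):
    """Evaluate the Gaussian density row by row, in slices of chunk_size rows.

    Hypothetical helper: same formula as the vectorized version, but the
    temporaries never exceed chunk_size rows.
    """
    x = np.asarray(x)
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    k = x.shape[-1]
    # These depend only on var, so compute them once outside the loop
    inv = np.linalg.inv(var)
    denominator = np.sqrt(((2 * np.pi) ** k) * np.linalg.det(var))
    out = np.empty(x.shape[0], dtype=float)
    for start in range(0, x.shape[0], chunk_size):
        diff = x[start:start + chunk_size] - mu
        # Row-wise quadratic form diff_i^T inv diff_i
        quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
        out[start:start + chunk_size] = np.exp(-0.5 * quad) / denominator
    return out
```

A call such as multinormpdf_chunked(arr, mu, var) should return the same values as the single vectorized call; a smaller chunk_size trades some speed for a lower peak memory footprint.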