Efficient way to create the probability distribution of a list of numbers with numpy

Question

Here is an example of what I am trying to do. Suppose the following numpy array:

A = np.array([3, 0, 1, 5, 7]) # in practice, this array is a huge array of float numbers: A.shape[0] >= 1000000

I need the fastest way to get the following result:

result = []

for a in A:
    result.append( 1 / np.exp(A - a).sum() )

result = np.array(result)

print(result)

>>> [1.58297157e-02 7.88115138e-04 2.14231906e-03 1.16966657e-01 8.64273193e-01]

Option 1 (faster than the previous code):

result = 1 / np.exp(A - A[:,None]).sum(axis=1)

print(result)

>>> [1.58297157e-02 7.88115138e-04 2.14231906e-03 1.16966657e-01 8.64273193e-01]

Is there a faster way to get `result`?

Edit: yes, scipy.special.softmax did the trick.

Answer 1

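Instead of normalizing separately for each value (which effectively sums all the values again for every element), just take the exponentials once and normalize a single time at the end. So: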

raw = np.exp(A)
result = raw / sum(raw)

(In my testing, the built-in sum was more than 2.5 times as fast as np.sum for summing a small array. I did not test with larger arrays.)
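
The single-pass approach works because 1 / sum_j exp(A_j - a_i) equals exp(a_i) / sum_j exp(A_j), so the denominator only has to be computed once. Below is a minimal sketch of that idea, checked against the loop from the question. The helper name `softmax_1d` is hypothetical, and subtracting the maximum before exponentiating is an addition that is not part of this answer: it leaves the ratio unchanged and is assumed here only to keep np.exp from overflowing on large inputs.

```python
import numpy as np

def softmax_1d(values):
    """Hypothetical helper: exp(a_i) / sum_j exp(a_j) for a 1-D array."""
    values = np.asarray(values, dtype=float)
    # Assumption (not in the answer): shift by the maximum so np.exp cannot overflow.
    # The shift cancels in the ratio, so the result is unchanged.
    shifted = np.exp(values - values.max())
    # Normalize once at the end instead of once per element.
    return shifted / shifted.sum()

A = np.array([3, 0, 1, 5, 7])

# Reference: the O(n^2) loop from the question.
expected = np.array([1 / np.exp(A - a).sum() for a in A])

result = softmax_1d(A)
print(np.allclose(result, expected))
```

This version needs O(n) time and memory, whereas Option 1 in the question builds an n-by-n intermediate array.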

Answer 2

Yes: scipy.special.softmax did the trick

from scipy.special import softmax

result = softmax(A)

Thanks to @j1-lee and @Karl Knechtel
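
As a quick sanity check, here is a small sketch comparing scipy.special.softmax with the loop from the question on the example array. The variable names `loop_result` and `fast_result` are only illustrative.

```python
import numpy as np
from scipy.special import softmax

A = np.array([3, 0, 1, 5, 7], dtype=float)

# Reference: the per-element loop from the question.
loop_result = np.array([1 / np.exp(A - a).sum() for a in A])

# One vectorized call over the whole array.
fast_result = softmax(A)

print(np.allclose(fast_result, loop_result))
```

For a 2-D input, softmax also accepts an `axis` argument to normalize along a single dimension.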
