how to calculate entropy from np histogram

I have a histogram example:

mu1, sigma1 = 10, 10
s1 = np.random.normal(mu1, sigma1, 100000)

and calculated:

hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
for i in hist1[0]:
    ent = -sum(i * log(abs(i)))
print (ent)

Now I want to find the entropy from a given histogram array, but since np.histogram returns two arrays, I'm having trouble calculating the entropy. How can I access the first array returned by np.histogram and compute the entropy from it? Even if my code above were correct, I still get a math domain error for the entropy. :(

**EDIT:** How do I find the entropy when mu = 0 and log(0) produces a math domain error?


So the actual code I'm trying to write is:

mu1, sigma1 = 0, 1
mu2, sigma2 = 10, 1
s1 = np.random.normal(mu1, sigma1, 100000)
s2 = np.random.normal(mu2, sigma2, 100000)

hist1 = np.histogram(s1, bins=100, range=(-20,20), density=True)
data1 = hist1[0]
ent1 = -(data1*np.log(np.abs(data1))).sum() 

hist2 = np.histogram(s2, bins=100, range=(-20,20), density=True)
data2 = hist2[0]
ent2 = -(data2*np.log(np.abs(data2))).sum() 

So far, the first example ent1 yields nan, and the second ent2 yields a math domain error :(

You can calculate the entropy using vectorized code:

import numpy as np

mu1 = 10
sigma1 = 10

s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
data = hist1[0]
ent = -(data*np.log(np.abs(data))).sum()
# output: 7.1802159512213191

But if you prefer to use a for loop, you can write it like this:

import numpy as np
import math

mu1 = 10
sigma1 = 10

s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
ent = 0
for i in hist1[0]:
    ent -= i * math.log(abs(i))
print(ent)
# output: 7.1802159512213191

Use np.ma.log to avoid inf and nan errors. np.ma is NumPy's masked array module: np.ma.log masks the entries where the logarithm is undefined (such as empty bins) instead of producing -inf or nan.
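A minimal sketch of how this works — the bin values below are made up for illustration, with two empty bins that plain np.log would turn into -inf:

```python
import numpy as np

# Toy normalized histogram with empty bins; np.log would yield -inf for the zeros
counts = np.array([0.0, 0.2, 0.5, 0.3, 0.0])

# np.ma.log returns a masked array: the zero bins are masked out, not -inf
masked_logs = np.ma.log(counts)

# Masked entries are ignored by .sum(), so the 0*log(0) terms contribute nothing
ent = -(counts * masked_logs).sum()
print(ent)  # ≈ 1.0297 nats
```

This matches the convention that 0·log(0) = 0 in the entropy sum.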

So, for the ultimate copy-paste experience, I just merged both existing answers (thank you all) into a more comprehensive numpy-native approach. Hope it helps!

import numpy as np

def entropy(hist, bit_instead_of_nat=False):
    """
    Given a list of positive values as a histogram drawn from any information
    source, returns the entropy of its probability mass function. Usage example:
      hist = [513, 487]  # we tossed a coin 1000 times and this is our histogram
      print(entropy(hist, True))  # the result is approximately 1 bit
      hist = [-1, 10, 10]; hist = [0]  # this kind of thing triggers the warning
    """
    h = np.asarray(hist, dtype=np.float64)
    if h.sum() <= 0 or (h < 0).any():
        print("[entropy] WARNING, malformed/empty input %s. Returning None." % str(hist))
        return None
    h = h / h.sum()  # normalize counts into a probability mass function
    log_fn = np.ma.log2 if bit_instead_of_nat else np.ma.log
    return -(h * log_fn(h)).sum()

Note: the probability density function and the probability mass function behave differently on a discrete histogram, depending on the bin size. See the np.histogram docstring:

density : bool, optional

If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function.

Overrides the normed keyword if given.
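In other words, with density=True the first array holds densities, not probabilities, so for a discrete (Shannon) entropy you would multiply each density by its bin width first. A sketch under that reading, reusing the question's setup (the bin edges are the second array np.histogram returns):

```python
import numpy as np

rng = np.random.default_rng(0)
s1 = rng.normal(0, 1, 100000)

counts, edges = np.histogram(s1, bins=100, range=(-20, 20), density=True)
widths = np.diff(edges)  # bin widths (all 0.4 here, since the range is uniform)
pmf = counts * widths    # density * width = probability per bin; pmf sums to 1

# np.ma.log masks the empty bins, so the 0*log(0) terms drop out of the sum
ent = -(pmf * np.ma.log(pmf)).sum()
print(ent)  # discrete entropy in nats
```

Note that this discrete entropy depends on the bin width, which is one reason the same samples can give different entropy values under different binnings.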