sklearn KernelDensity score_samples giving values greater than 0
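I am using sklearn's KernelDensity to estimate a density and then score_samples to evaluate the pdf at some points. However, many of the values returned by score_samples are greater than 0. That should not happen because, according to the documentation, it returns log(density):
[Documentation: The array of log(density) evaluations. These are normalized to be probability densities, so values will be low for high-dimensional data.]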
from sklearn.neighbors import KernelDensity  # the old sklearn.neighbors.kde path is deprecated/removed
import numpy as np

data = np.random.normal(0, 1, [50, 10])  # 50 data points, dimension = 10
data_kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(data)
output = data_kde.score_samples(data)  # log-density at each training point
print(repr(output))

Output:
array([19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645])
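Why are all the scores identical? A minimal sketch, assuming a fixed seed (the seed and the name expected_self_score are hypothetical additions, not from the question): with bandwidth 0.2 in 10 dimensions, the Gaussian kernels of distinct points are essentially zero at each other's locations. Each training point's score is therefore dominated by its own kernel, giving log p(x_i) ≈ -log N - (d/2) log(2π h²). That constant does not depend on x_i, and it is positive whenever the kernel's peak height (2π h²)^(-d/2) exceeds N.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
n, d, h = 50, 10, 0.2
data = rng.normal(0, 1, size=(n, d))

kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(data)
scores = kde.score_samples(data)  # log-density at each training point

# Hypothetical helper value: log-density contributed by a point's own kernel,
# log[(1/n) * (2*pi*h**2) ** (-d/2)], ignoring the (negligible) other kernels
expected_self_score = -np.log(n) - 0.5 * d * np.log(2 * np.pi * h**2)

print(scores[:3])
print(expected_self_score)
print(np.allclose(scores, expected_self_score, atol=1e-4))
```

Scoring points that are not in the training set, or using a larger bandwidth, would give values that vary from point to point.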
Since a density lies in [0, 1], log(density) should lie in (-inf, 0], which is not the case for the value 19.9448 shown above.
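Probability densities do not have to lie in [0, 1]; they are densities, not probabilities. The only requirement is that a density is non-negative and integrates to 1 over the whole space, so it can exceed 1 (and its log can exceed 0) wherever the probability mass is concentrated in a small region. The Wikipedia page on probability density functions gives a good overview.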
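A minimal 1-D sketch of this point (the data, seed, bandwidth and grid are illustrative assumptions): a KDE fitted to tightly concentrated samples has a peak density well above 1, so score_samples returns positive log values there, yet exp(score_samples) still integrates to about 1.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
data = rng.normal(0, 0.1, size=(500, 1))  # 1-D samples with a small spread

kde = KernelDensity(kernel="gaussian", bandwidth=0.05).fit(data)

# Evaluate on a fine grid that covers essentially all of the probability mass
grid = np.linspace(-1, 1, 4001).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))  # score_samples returns log(density)

dx = grid[1, 0] - grid[0, 0]
integral = density.sum() * dx  # Riemann-sum approximation of the integral

print(f"max density:     {density.max():.3f}")          # greater than 1
print(f"max log-density: {np.log(density.max()):.3f}")  # greater than 0
print(f"integral:        {integral:.4f}")               # close to 1
```

The density exceeds 1 only because the mass sits in a narrow interval; the area under the curve is what stays at 1.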