Python/Scipy kde fit, scaling

Question

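I have a series in Python and I would like to fit a density to its histogram. Question: is there a slick way to use the values from np.histogram() to achieve this? (See the update below.)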

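My current problem is that the kde fit I perform has (seemingly) unwanted kinks, as shown in the second plot. I would like the kde fit to be monotonically decreasing, following the histogram, which is the first plot. My current code is below. Thanks in advance.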

import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde as kde

# df is a pandas DataFrame and var a column name, both defined elsewhere
df[var].hist()
plt.show()  # shows the original histogram (pandas default: bins=10)

density = kde(df[var])
xs = np.arange(0, df[var].max(), 0.1)
ys = density(xs)
plt.plot(xs, ys)  # a pdf with kinks
plt.show()
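The "scaling" part of the problem comes from the two plots living on different scales: hist() draws counts per bin, while the kde returns a probability density with total area 1. A minimal sketch, assuming a hypothetical exponential sample in place of df[var], overlays the two by multiplying the density by n × bin width (the total area of an equal-width count histogram):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde

# Hypothetical stand-in for df[var]: a monotonically decreasing (exponential) sample
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)

n_bins = 10
counts, edges = np.histogram(x, bins=n_bins)
bin_width = edges[1] - edges[0]  # np.histogram with an int gives equal-width bins

density = gaussian_kde(x)
xs = np.linspace(0, x.max(), 200)
# Rescale the pdf (area 1) to the count histogram's area (n * bin_width)
scaled = density(xs) * len(x) * bin_width

fig, ax = plt.subplots()
ax.hist(x, bins=edges, alpha=0.5, label="counts")
ax.plot(xs, scaled, label="kde * n * bin width")
ax.legend(loc="best")
# fig.savefig("kde_vs_hist.png")  # or plt.show() in an interactive session
```

Alternatively, passing density=True to hist() puts the bars on the pdf scale instead, so no rescaling of the kde is needed.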

Alternatively, is there a slicker way to use

count, div = np.histogram(df[var])

and then scale the count array so that kde() can be applied to it?
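One possible way to feed np.histogram() output into a kde is SciPy's weights argument to gaussian_kde (available since SciPy 1.2): place one point at each bin center and weight it by the bin's count, letting gaussian_kde do the normalization. This is a sketch under stated assumptions, not a drop-in replacement: it treats every observation in a bin as if it sat at the bin center, the data are a hypothetical exponential sample, and the explicit bw_method is a choice made here so both estimates use the same Scott factor.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical stand-in for df[var]: a monotonically decreasing (exponential) sample
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1000)

count, div = np.histogram(x, bins=50)
centers = 0.5 * (div[:-1] + div[1:])  # one representative point per bin

# Use the raw-data Scott factor for both estimates; otherwise gaussian_kde would
# derive the factor from the (smaller) effective sample size implied by the weights.
scott = len(x) ** (-1 / 5)

# Each bin center is weighted by its count; gaussian_kde normalizes the weights itself
binned_kde = gaussian_kde(centers, weights=count, bw_method=scott)
raw_kde = gaussian_kde(x, bw_method=scott)

xs = np.linspace(0, x.max(), 200)
max_diff = np.max(np.abs(binned_kde(xs) - raw_kde(xs)))
```

With bins much narrower than the kernel bandwidth the two curves should nearly coincide; with very coarse bins the binned estimate inherits the histogram's coarseness.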

Update

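Following cel's comment below (it should have been obvious, but I missed it!), I was implicitly under-binning in this case by using the default parameters of pandas.Series.hist() (df[var] is a Series), which uses bins=10. In the updated plot I used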

df[var].hist(bins=100)
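The effect of the default binning can be seen by drawing both versions side by side. A minimal sketch, assuming a hypothetical exponential sample wrapped in a pandas Series in place of df[var]:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
from matplotlib import pyplot as plt

# Hypothetical stand-in for df[var]
rng = np.random.default_rng(2)
s = pd.Series(rng.exponential(scale=2.0, size=1000))

fig, (ax_coarse, ax_fine) = plt.subplots(1, 2, figsize=(10, 4))
s.hist(ax=ax_coarse)            # pandas default: bins=10
s.hist(bins=100, ax=ax_fine)    # finer binning, as in the update
ax_coarse.set_title("default bins=10")
ax_fine.set_title("bins=100")
```

Each bar is drawn as a separate patch, so the left panel holds 10 bars and the right one 100; the finer histogram shows structure that the 10-bin version averages away.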

I'll leave this post up in case others find it useful, but I won't mind if it gets removed as 'too localized' etc.

Answer 1

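If you increase the bandwidth with the bw_method parameter, the kde will look smoother (a scalar bw_method is used directly as the bandwidth factor). This example comes from Justin Peel's answer; the code has been modified to use bw_method: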

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
density1 = gaussian_kde(data)
bandwidth = 1.5
density2 = gaussian_kde(data, bw_method=bandwidth)
xs = np.linspace(0,8,200)
plt.plot(xs,density1(xs), label='bw_method=None')
plt.plot(xs,density2(xs), label='bw_method={}'.format(bandwidth))
plt.legend(loc='best')
plt.show()

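This plots the default-bandwidth KDE (bw_method=None, i.e. Scott's rule) alongside the smoother bw_method=1.5 KDE.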
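To see how much wider bw_method=1.5 is than the default, one can inspect the fitted objects: gaussian_kde exposes the bandwidth factor as .factor and the kernel covariance as .covariance, which in 1-D equals factor² times the sample variance. A short sketch using the same data:

```python
import numpy as np
from scipy.stats import gaussian_kde

data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
n = len(data)  # 29 observations

default_kde = gaussian_kde(data)               # Scott's rule: n ** (-1/5) in 1-D
wide_kde = gaussian_kde(data, bw_method=1.5)   # scalar is used as the factor itself

# Kernel standard deviation = factor * sample standard deviation (ddof=1)
for name, k in (("default", default_kde), ("bw_method=1.5", wide_kde)):
    kernel_std = np.sqrt(k.covariance[0, 0])
    print(f"{name}: factor={k.factor:.3f}, kernel std={kernel_std:.3f}")
```

With 29 points, Scott's factor is about 0.51, so bw_method=1.5 makes each Gaussian kernel roughly three times wider, which is what removes the bumps.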

Answer 2

The problem was under-binning, as cel pointed out (see the comments above). Set bins=100 explicitly in pd.DataFrame.hist(); the default is bins=10.

See also: http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width
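Rather than guessing a bin count, NumPy can compute one from a named rule via np.histogram_bin_edges (NumPy 1.15+), including Sturges' and Freedman-Diaconis' rules of the kind covered in that Wikipedia section. A sketch, assuming a hypothetical exponential sample in place of df[var]; the resulting count can then be passed as bins= to hist():

```python
import numpy as np

# Hypothetical stand-in for df[var]
rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=1000)

# Number of bins from fixed counts vs. a few of NumPy's built-in rules
for rule in (10, 100, "sturges", "fd", "auto"):
    edges = np.histogram_bin_edges(x, bins=rule)
    print(f"{rule!s:>8}: {len(edges) - 1} bins")
```

For 1000 points Sturges' rule gives ceil(log2(1000) + 1) = 11 bins, barely more than the default 10, while the Freedman-Diaconis rule adapts to the spread of the data and usually gives more bins on large samples.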
