Seaborn KDE plot gives unexpected results

I created a simple seaborn KDE plot and am wondering whether this is a bug.

My code is:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

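# seaborn >= 0.11 parameter names; older versions used shade=True and bw=0.01
# (cmap has no visible effect on a univariate KDE, so color is used instead)
sns.kdeplot(np.array([1, 2]), color="red", fill=True, bw_method=0.01)
sns.kdeplot(np.array([2.4, 2.5]), color="blue", fill=True, bw_method=0.01)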
plt.show()

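The red and blue curves each show the KDE of 2 points. When the points are close together, the density is much narrower than when they are farther apart. I find this very counterintuitive, at least in the visible range, and wonder whether it is a bug. I also couldn't find a resource describing how the density is computed from a given set of points. Any help is appreciated.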

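bw_method= (called bw= in older versions) is passed directly to scipy.stats.gaussian_kde. The docs there say "If a scalar, this will be used directly as kde.factor", and the explanation of kde.factor says "The square of kde.factor multiplies the covariance matrix of the data in the kde estimation." So it is a scaling factor relative to the spread of the data, not a bandwidth in data units: in one dimension, the standard deviation of each Gaussian kernel is bw_method times the sample standard deviation of the data. For [1, 2] the sample standard deviation is about 0.71, giving kernels about 0.0071 wide; for [2.4, 2.5] it is about 0.071, giving kernels about 0.00071 wide. The second pair therefore gets curves ten times narrower (and ten times taller), which is exactly what you see. If still more details are needed, you could dive into scipy's source code, or into the research papers referenced in the docs.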
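To see the scaling directly, one can fit scipy.stats.gaussian_kde to the two datasets from the question and read back the kernel width. This is a minimal sketch, assuming a SciPy version where kde.factor and kde.covariance are public attributes (they are documented ones); the variable names are just for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

# The two datasets from the question
red = np.array([1.0, 2.0])
blue = np.array([2.4, 2.5])

kernel_sd = {}
for name, data in [("red", red), ("blue", blue)]:
    # A scalar bw_method is used directly as kde.factor
    kde = gaussian_kde(data, bw_method=0.01)
    # kde.covariance = data covariance * factor**2, so its square root is
    # the standard deviation of each Gaussian kernel, in data units
    kernel_sd[name] = float(np.sqrt(kde.covariance[0, 0]))
    print(f"{name}: factor={kde.factor}, kernel std={kernel_sd[name]:.6f}")

# The close pair's kernels come out ten times narrower
print("red / blue width ratio:", kernel_sd["red"] / kernel_sd["blue"])
```

The kernel width equals 0.01 * np.std(data, ddof=1) in both cases, which is why the same bw_method gives very different curves for the two pairs.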

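If you really want to counteract the scaling, you can divide it out: sns.kdeplot(np.array(data), ..., bw_method=0.01/np.std(data, ddof=1)). Note ddof=1: gaussian_kde uses the sample (unbiased) covariance, while np.std defaults to ddof=0, which for two points would make the kernels about 1.41 times wider than intended.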
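A quick check of dividing out the standard deviation, again done with scipy.stats.gaussian_kde directly (the object seaborn fits under the hood). This is a sketch under that assumption; target_bw is a hypothetical name for the kernel width you want in data units.

```python
import numpy as np
from scipy.stats import gaussian_kde

target_bw = 0.01  # desired kernel standard deviation, in data units

widths = []
for data in (np.array([1.0, 2.0]), np.array([2.4, 2.5])):
    # gaussian_kde multiplies the factor by the sample std (ddof=1),
    # so dividing by that std leaves exactly target_bw
    kde = gaussian_kde(data, bw_method=target_bw / np.std(data, ddof=1))
    widths.append(float(np.sqrt(kde.covariance[0, 0])))
    print(data, widths[-1])
```

Both datasets now get kernels of the same width, so the two pairs would be drawn with identically shaped bumps. The same bw_method expression can be passed to sns.kdeplot, as long as bw_adjust is left at its default of 1.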

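Or you could create your own version of a Gaussian KDE that uses the bandwidth in data coordinates. It simply sums one Gaussian curve per data point and normalizes by dividing by the number of curves (so that the total area under the curve is 1).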

Here is some example code that draws KDE curves for 1, 2 and 20 input points:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

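def gauss(x, mu=0.0, sigma=1.0):
    """Normal probability density with mean mu and standard deviation sigma."""
    return np.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * np.sqrt(2 * np.pi))

def kde(xs, data, sigma=1.0):
    """Gaussian KDE with a fixed bandwidth sigma in data units.

    Broadcasting builds a (len(xs), len(data)) grid with one Gaussian per
    data point; summing over the data axis and dividing by len(data) keeps
    the total area under the curve equal to 1.
    """
    return gauss(xs.reshape(-1, 1), data.reshape(1, -1), sigma).sum(axis=1) / len(data)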

sns.set_theme()  # seaborn's default style (sns.set() in older versions)
sigma = 0.03
xs = np.linspace(0, 4, 300)
fig, ax = plt.subplots(figsize=(12, 5))

data1 = np.array([1, 2])
kde1 = kde(xs, data1, sigma=sigma)
ax.plot(xs, kde1, color='crimson', label=f'2 points, distance 1, σ={sigma}')
ax.fill_between(xs, kde1, color='crimson', alpha=0.3)

data2 = np.array([2.4, 2.5])
kde2 = kde(xs, data2, sigma=sigma)
ax.plot(xs, kde2, color='dodgerblue', label=f'2 points, distance 0.1, σ={sigma}')
ax.fill_between(xs, kde2, color='dodgerblue', alpha=0.3)

data3 = np.array([3])
kde3 = kde(xs, data3, sigma=sigma)
ax.plot(xs, kde3, color='limegreen', label=f'1 point, σ={sigma}')
ax.fill_between(xs, kde3, color='limegreen', alpha=0.3)

data4 = np.random.normal(0.01, 0.1, 20).cumsum() + 1.1
kde4 = kde(xs, data4, sigma=sigma)
ax.plot(xs, kde4, color='purple', label=f'20 points, σ={sigma}')
ax.fill_between(xs, kde4, color='purple', alpha=0.3)

ax.margins(x=0)  # remove superfluous whitespace left and right
ax.set_ylim(bottom=0)  # let the plot "sit" onto y=0
ax.legend()
plt.show()
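As a sanity check on the hand-written kde() above, one can compare it with scipy.stats.gaussian_kde after dividing out the data's sample standard deviation as described earlier, and confirm that the area under the curve is 1. This is a sketch assuming SciPy is installed; the grid size and the name scipy_kde are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def gauss(x, mu=0.0, sigma=1.0):
    # Normal density with mean mu and standard deviation sigma
    return np.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * np.sqrt(2 * np.pi))

def kde(xs, data, sigma=1.0):
    # One Gaussian per data point, averaged so the total area is 1
    return gauss(xs.reshape(-1, 1), data.reshape(1, -1), sigma).sum(axis=1) / len(data)

sigma = 0.03
data = np.array([2.4, 2.5])
xs = np.linspace(0, 4, 4001)  # spacing 0.001, much finer than sigma

mine = kde(xs, data, sigma=sigma)

# SciPy's estimator with its bandwidth factor rescaled to sigma in data units
scipy_kde = gaussian_kde(data, bw_method=sigma / np.std(data, ddof=1))
reference = scipy_kde(xs)

# Riemann-sum approximation of the area under the hand-written curve
area = mine.sum() * (xs[1] - xs[0])

print("max abs difference:", np.max(np.abs(mine - reference)))
print("area under curve:", area)
```

The two curves should agree to floating-point precision, which shows that the only real difference between the approaches is whether the bandwidth is given in data units or as a multiple of the data's standard deviation.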