Seaborn KDE plots unexpected results
I created a simple seaborn kde plot and wonder whether this is a bug.
My code is:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.kdeplot(np.array([1,2]), cmap="Reds", shade=True, bw=0.01)
sns.kdeplot(np.array([2.4,2.5]), cmap="Blues", shade=True, bw=0.01)
plt.show()
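The blue and the red curve each show the kde of 2 points. When the points are close together, the density peaks are much narrower than when the points are farther apart. I find this very counterintuitive, at least as far as can be seen in the plot, and I wonder whether it is a bug. I also could not find a resource that describes how the density is calculated from a given set of points. Any help is appreciated.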
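bw_method= (called bw= in older versions) is passed directly to scipy.stats.gaussian_kde. The docs there say "If a scalar, this will be used directly as kde.factor". The explanation of kde.factor says "The square of kde.factor multiplies the covariance matrix of the data in the kde estimation." So it is a scaling factor, not a width in data units: the width of each Gaussian kernel is bw_method times the standard deviation of the data. Points that lie 0.1 apart have a ten times smaller standard deviation than points that lie 1 apart, so with the same bw_method their kernels are ten times narrower. This is the documented behavior, not a bug. If more details are needed, you could dive into scipy's source code, or into the research papers referenced in the docs.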
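To make the scaling concrete, here is a small sketch that calls scipy.stats.gaussian_kde directly with the two data sets from the question. The helper name kernel_sigma is made up for this illustration; it reads the covariance attribute that scipy stores after applying the factor.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kernel_sigma(data, factor):
    """Standard deviation of each Gaussian kernel that scipy ends up using.

    Hypothetical helper for illustration; `factor` is passed as a scalar bw_method.
    """
    kde = gaussian_kde(data, bw_method=factor)
    # kde.covariance is factor**2 times the sample covariance of the data
    return float(np.sqrt(kde.covariance[0, 0]))

far = np.array([1.0, 2.0])    # points 1 apart
near = np.array([2.4, 2.5])   # points 0.1 apart

sigma_far = kernel_sigma(far, 0.01)
sigma_near = kernel_sigma(near, 0.01)

# Same bw_method, but the kernel widths differ by the ratio of the data spreads
print("kernel sigma, far points: ", sigma_far)
print("kernel sigma, near points:", sigma_near)
print("ratio:", sigma_far / sigma_near)
```

The ratio comes out at roughly 10, which matches the difference in peak width seen in the plot.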
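If you really want to counter the scaling, you can divide it out: sns.kdeplot(np.array(data), ..., bw_method=0.01/np.std(data, ddof=1)). The ddof=1 is there because scipy scales the sample covariance of the data, which is normalized by N-1.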
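As a quick check of the divide-it-out idea, the following sketch computes the factor for both data sets and confirms that the resulting kernel width is the same in data units. It uses scipy.stats.gaussian_kde directly rather than seaborn so that no plot window is needed; fixed_width_factor is a hypothetical helper name, and ddof=1 is used on the assumption that the data is unweighted, in which case scipy normalizes the covariance by N-1.

```python
import numpy as np
from scipy.stats import gaussian_kde

target_sigma = 0.01  # desired kernel width in data units

def fixed_width_factor(data, sigma):
    """Hypothetical helper: scalar bw_method that yields kernels of width `sigma`."""
    # scipy multiplies the sample standard deviation (ddof=1) by the factor
    return sigma / np.std(data, ddof=1)

widths = []
for data in (np.array([1.0, 2.0]), np.array([2.4, 2.5])):
    kde = gaussian_kde(data, bw_method=fixed_width_factor(data, target_sigma))
    widths.append(float(np.sqrt(kde.covariance[0, 0])))

# Both entries should be (numerically) equal to target_sigma
print(widths)
```

The same factor can then be passed as bw_method= to sns.kdeplot for each data set.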
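Or you can create your own version of a gaussian kde that uses a bandwidth in data coordinates. It just sums some gauss curves, one centered on each data point, and normalizes by dividing by the number of curves (the total area under the curve should be 1).
Here is some example code, with kde curves for 1, 2 or 20 input points: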
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
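def gauss(x, mu=0.0, sigma=1.0):
    # Gaussian (normal) probability density with mean mu and standard deviation sigma
    return np.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * np.sqrt(2 * np.pi))

def kde(xs, data, sigma=1.0):
    # one Gaussian per data point (broadcast to shape len(xs) x len(data)), averaged over the points
    return gauss(xs.reshape(-1, 1), data.reshape(1, -1), sigma).sum(axis=1) / len(data)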
sns.set()
sigma = 0.03
xs = np.linspace(0, 4, 300)
fig, ax = plt.subplots(figsize=(12, 5))
data1 = np.array([1, 2])
kde1 = kde(xs, data1, sigma=sigma)
ax.plot(xs, kde1, color='crimson', label=f'dist of 1, σ={sigma}')
ax.fill_between(xs, kde1, color='crimson', alpha=0.3)
data2 = np.array([2.4, 2.5])
kde2 = kde(xs, data2, sigma=sigma)
ax.plot(xs, kde2, color='dodgerblue', label=f'dist of 0.1, σ={sigma}')
ax.fill_between(xs, kde2, color='dodgerblue', alpha=0.3)
data3 = np.array([3])
kde3 = kde(xs, data3, sigma=sigma)
ax.plot(xs, kde3, color='limegreen', label=f'1 point, σ={sigma}')
ax.fill_between(xs, kde3, color='limegreen', alpha=0.3)
data4 = np.random.normal(0.01, 0.1, 20).cumsum() + 1.1
kde4 = kde(xs, data4, sigma=sigma)
ax.plot(xs, kde4, color='purple', label=f'20 points, σ={sigma}')
ax.fill_between(xs, kde4, color='purple', alpha=0.3)
ax.margins(x=0) # remove superfluous whitespace left and right
ax.set_ylim(ymin=0) # let the plot "sit" onto y=0
ax.legend()
plt.show()
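The normalization claim (total area of 1) can be checked numerically without plotting. The sketch below repeats the two helper functions so that it runs on its own, and approximates the integral with a simple Riemann sum on a fine grid; the grid size of 4001 points is an arbitrary choice for this illustration.

```python
import numpy as np

def gauss(x, mu=0.0, sigma=1.0):
    # Gaussian probability density with mean mu and standard deviation sigma
    return np.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * np.sqrt(2 * np.pi))

def kde(xs, data, sigma=1.0):
    # average of one Gaussian per data point, evaluated on the grid xs
    return gauss(xs.reshape(-1, 1), data.reshape(1, -1), sigma).sum(axis=1) / len(data)

sigma = 0.03
xs = np.linspace(0, 4, 4001)  # fine, evenly spaced grid covering all data points
dx = xs[1] - xs[0]

datasets = {
    "dist of 1": np.array([1.0, 2.0]),
    "dist of 0.1": np.array([2.4, 2.5]),
    "1 point": np.array([3.0]),
}

areas = {}
for name, data in datasets.items():
    density = kde(xs, data, sigma=sigma)
    areas[name] = float(density.sum() * dx)  # Riemann-sum approximation of the area

for name, area in areas.items():
    print(f"{name}: area = {area:.6f}")
```

Each area should come out very close to 1, independent of how far apart the input points are, because the kernel width here is fixed in data coordinates.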