Python - integrate 2D Kernel Density Estimation within contour lines

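I want to draw a contour plot of a kernel density estimate in which the KDE is integrated over each filled contour region.

As an example, suppose I compute the KDE of some 2D data: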

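import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

# 100 samples from a 2D normal; the covariance matrix must be symmetric
# positive semi-definite, so the off-diagonal entries are set equal here
data = np.random.multivariate_normal((0, 0), [[1, 0.5], [0.5, 0.7]], 100)
x = data[:, 0]
y = data[:, 1]
xmin, xmax = x.min(), x.max()
ymin, ymax = y.min(), y.max()
# evaluate the KDE on a regular 100 x 100 grid
xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([xx.ravel(), yy.ravel()])
values = np.vstack([x, y])
kernel = st.gaussian_kde(values)
f = np.reshape(kernel(positions).T, xx.shape)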


fig = plt.figure()
ax = fig.gca()
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
cfset = ax.contourf(xx, yy, f, cmap='Blues')
cset = ax.contour(xx, yy, f, colors='k')


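Please note that the following is only correct if your contours are 'monotonic', i.e. inside a contour line you only find pixel values higher than the corresponding contour level. Also note that if your density is multimodal, the corresponding areas in separate peaks are lumped together.

If this is true/acceptable, your problem can be solved by sorting the pixels by value.

I don't know which heuristic your plotting program uses to choose its contour levels, but assuming you have them stored (in ascending order, say) in a variable named 'levels', you could try something like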

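# 'levels' holds the contour levels of f in ascending order; taking them
# from the plot above via cset.levels is an assumption, not a requirement
levels = np.asarray(cset.levels)
ff = f.ravel()
order = np.argsort(ff)
fsorted = ff[order]
# keep only levels inside the data range so the index lookup below stays in bounds
levels = levels[(levels > fsorted[0]) & (levels < fsorted[-1])]
F = np.cumsum(fsorted)
# depending on how your density is normalised, the next line may be superfluous
# also note that this is only correct for equal bins
# and, finally, to be unimpeachably rigorous, this disregards the probability
# mass outside the field of view, so it calculates probability conditional
# on being in the field of view
F /= F[-1]
boundaries = fsorted.searchsorted(levels)
new_levels = F[boundaries]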
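
As a sanity check on the sort-and-cumsum idea, here is a minimal sketch that uses only NumPy and an analytic standard 2D Gaussian instead of a KDE. The helper name `enclosed_mass` and the grid are my own choices for illustration, not part of the code above. It returns the fraction of gridded mass in pixels at or above each level, which is one minus the cumulative fraction below it. For a standard 2D Gaussian the contour at radius r encloses 1 - exp(-r^2 / 2), so the results should come out close to that.

```python
import numpy as np

def enclosed_mass(f, levels):
    """Fraction of the gridded mass in pixels whose density is >= each level.

    Assumes equally sized pixels and ignores mass outside the grid.
    """
    fsorted = np.sort(f.ravel())                  # ascending densities
    F = np.cumsum(fsorted)
    F /= F[-1]                                    # cumulative fraction of mass
    # idx = number of pixels strictly below each level
    idx = np.searchsorted(fsorted, np.asarray(levels), side="left")
    below = np.where(idx > 0, F[np.maximum(idx - 1, 0)], 0.0)
    return 1.0 - below

# Standard 2D Gaussian on a regular grid (hypothetical test density)
x = np.linspace(-5, 5, 201)
xx, yy = np.meshgrid(x, x)
f = np.exp(-0.5 * (xx**2 + yy**2)) / (2 * np.pi)

# Density levels of the contours at radius 2 and radius 1 (ascending order)
radii = np.array([2.0, 1.0])
levels = f.max() * np.exp(-0.5 * radii**2)
masses = enclosed_mass(f, levels)
print(masses)   # compare with 1 - exp(-radii**2 / 2)
```

Note that this gives the mass inside each contour, whereas `new_levels` above is the cumulative mass below it; which of the two you want as a label is a matter of taste.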

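Now, to be able to use this, your plotting program must allow you to freely choose the contour labels, or at least the levels at which contours are placed. In the latter case, assuming there is a kwarg 'levels':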

# make a copy to avoid problems with in-place shuffling
# i.e. overwriting positions whose original values are still to be read out
F[order] = F.copy()
F = F.reshape(f.shape)
cset = ax.contour(xx, yy, F, levels=new_levels, colors='k')
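
To see what the unshuffling step does, here is a small sketch on a toy 2 x 2 grid (the numbers are made up for illustration). It writes into a fresh array rather than in place, so no copy is needed. The result holds, for each pixel, the cumulative fraction of mass in pixels at or below that pixel's density, and its ordering matches that of f, which is why contours of the new image trace the same lines as contours of f.

```python
import numpy as np

f = np.array([[0.1, 0.4],
              [0.3, 0.2]])          # toy "density" on a 2 x 2 grid
ff = f.ravel()
order = np.argsort(ff)              # pixel indices from lowest to highest density
F_sorted = np.cumsum(ff[order])
F_sorted /= F_sorted[-1]            # cumulative fraction, in sorted order

F_image = np.empty_like(F_sorted)
F_image[order] = F_sorted           # scatter each value back to its own pixel
F_image = F_image.reshape(f.shape)
print(F_image)                      # approximately [[0.1, 1.0], [0.6, 0.3]]
```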


Finally, if one wants to really have the probability within each filled area, this is a workaround that works:

cb = fig.colorbar(cfset, ax=ax)
values = cb.values.copy()
values[1:] -= values[:-1].copy()
cb.set_ticklabels(values)

– Laura
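
The subtraction in that comment turns cumulative values into per-band values. A minimal sketch of the same arithmetic with NumPy alone, using made-up cumulative fractions (not output from the KDE above):

```python
import numpy as np

# Hypothetical cumulative fractions at four ascending contour levels
cumulative = np.array([0.1, 0.3, 0.6, 1.0])
# Mass in each band is the difference between consecutive cumulative values;
# prepend=0.0 keeps the first entry unchanged, as the in-place version does
per_band = np.diff(cumulative, prepend=0.0)
print(per_band)
```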