Simulate from kernel density estimator with variable underlying grid

Question

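I have a dataset that I use to create an empirical probability distribution by estimating a kernel density. Right now I am using R's kde2d from the MASS package. After estimating the probability distribution, I use sample to draw from slices of the 2D distribution along the x-axis, passing the density values of the slice as the prob argument. Example code is shown below.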

library(MASS)
set.seed(123)
x = rnorm(100, 1, 0.1)
set.seed(456)
y = rnorm(100, 1, 0.5)
den <- kde2d(x, y, n = 50, lims = c(-2, 2, -2, 2))
# to plot this 2d kde (persp() is part of base graphics, no extra package needed):
# persp(den)
conditional_probabilty_density = list(x = den$y, y = den$z[40, ])
#to plot the slice:
#plot(conditional_probabilty_density)
simulated_sample = sample(conditional_probabilty_density$x, size = 10, replace = TRUE, prob = conditional_probabilty_density$y)

den looks like this: [image]

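My data has a few regions with a lot of variation that need a fine grid granularity. Other regions contain essentially no data points, and nothing happens there. I would be fine if I could simply set the n parameter of kde2d to a very high number so that my data is well resolved everywhere. Alas, this is not possible due to memory constraints.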

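That is why I thought I could modify the kde2d function to have a non-constant granularity.
Here is the source code of the kde2d function. One can modify the line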

gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])

and put whatever granularity is desired on the y-axis. For example

a <- seq(-1, 0, 0.5)
gy <- c(a, seq.int(0.1, 2, length.out = n[2L]-length(a)))

and the modified kde2d returns the kernel density estimate at the specified locations. This works very well. Suppose I now
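A modification along these lines might look like the following sketch. It is not the MASS source: kde2d_on_grid is a hypothetical helper that follows the same Gaussian product-kernel computation as kde2d but takes the grid vectors gx and gy directly, so they can be unevenly spaced. The grid values used in the example are assumptions chosen only for illustration.

```r
library(MASS)  # used here only for bandwidth.nrd()

# Hypothetical helper (not part of MASS): a Gaussian product-kernel estimate
# evaluated on user-supplied, possibly unevenly spaced, grid vectors gx and gy.
kde2d_on_grid <- function(x, y, gx, gy,
                          h = c(bandwidth.nrd(x), bandwidth.nrd(y))) {
  stopifnot(length(x) == length(y), all(h > 0))
  nx <- length(x)
  h <- h / 4                          # same bandwidth scaling that kde2d applies
  ax <- outer(gx, x, "-") / h[1]      # length(gx) x nx matrix of scaled distances
  ay <- outer(gy, y, "-") / h[2]      # length(gy) x nx matrix of scaled distances
  z <- tcrossprod(dnorm(ax), dnorm(ay)) / (nx * h[1] * h[2])
  list(x = gx, y = gy, z = z)
}

set.seed(123)
x <- rnorm(100, 1, 0.1)
set.seed(456)
y <- rnorm(100, 1, 0.5)

# Assumed grid: coarse below 0, fine above 0.
gy <- c(seq(-2, 0, by = 0.5), seq(0.05, 2, by = 0.05))
den_var <- kde2d_on_grid(x, y, gx = seq(-2, 2, length.out = 50), gy = gy)
dim(den_var$z)  # one row per gx value, one column per gy value
```

Memory use scales with length(gx) * length(gy), so a grid that is fine only where the data vary keeps the z matrix small.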

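The problem is that I can no longer use sample to draw from slices along the x-axis. Because the left part of the distribution has a finer grid, it contains more grid points, and sample therefore picks values from that part with higher probability than the density justifies.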
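A toy example makes the bias concrete. The sketch below assumes a flat density on [0, 1] and a made-up grid that is fine on the left half and coarse on the right half. Neither comes from the actual data. With a flat density, both halves should be drawn about equally often.

```r
set.seed(1)

# Assumed toy grid: 50 points on [0, 0.49], 6 points on [0.5, 1].
grid <- c(seq(0, 0.49, by = 0.01), seq(0.5, 1, by = 0.1))

# Flat density: every grid point carries the same density value.
dens <- rep(1, length(grid))

draws <- sample(grid, size = 10000, replace = TRUE, prob = dens)

# Share of draws from the left half. It lands far above 0.5 because
# 50 of the 56 grid points lie on the left.
mean(draws < 0.5)
```

sample treats prob as a weight per grid point, not per unit of length, so the number of points in a region matters as much as the density values there.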

What can I do to have a fine grid where I need it, but still sample from the distribution according to its proper density? Thanks a lot.

Answer 1

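Use approx on conditional_probabilty_density with a new n. approx linearly interpolates the slice onto n equally spaced points, so every point represents the same width and the interpolated density values can be passed to sample as prob.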
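A minimal sketch of this suggestion follows. Since the modified kde2d is not shown, the slice here is a stand-in: a normal density evaluated on an unevenly spaced grid. The grid, n = 500 and the sample size are all assumptions. The last lines show an alternative that is not part of the answer: keep the original grid and weight each point by the width of the interval it represents.

```r
set.seed(42)

# Stand-in for a slice of the modified kde2d output (assumed for this sketch):
# a density on an unevenly spaced grid, coarse below 0 and fine above.
gy <- c(seq(-2, 0, by = 0.5), seq(0.05, 3, by = 0.05))
slice <- list(x = gy, y = dnorm(gy, mean = 1, sd = 0.5))

# Interpolate the slice onto n equally spaced points spanning the same range.
uniform_slice <- approx(slice$x, slice$y, n = 500)

# On the uniform grid the density values are valid sampling weights again.
simulated_sample <- sample(uniform_slice$x, size = 10000, replace = TRUE,
                           prob = uniform_slice$y)

# Alternative (an assumption, not from the answer): stay on the original grid
# and multiply each density value by the width of the cell around its point.
edges <- c(gy[1], (head(gy, -1) + tail(gy, -1)) / 2, gy[length(gy)])
w <- diff(edges)
alt_sample <- sample(slice$x, size = 10000, replace = TRUE,
                     prob = slice$y * w)
```

The interpolated vector has only n entries per slice, so it costs far less memory than a uniformly fine 2D grid. Either way the draws are restricted to grid values, so n should be large enough for the resolution needed.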

random

r

kernel-density

probability-density