Sampling from columns of ys stacked over values of x in R (visual provided)

Question

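Background

I have two variables, called x and y (see the R code below the picture). When I run plot(x, y), I get the top-row plot (see below). The y values are stacked on top of each x value.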

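Question

I would like to know why, when I sample from the y values that are stacked on top of each x value (e.g., the y values stacked on top of the x value "0"), I get some sampled y values that fall outside the range of their parent sample (see the bottom-row plot).

Here is my R code: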

  #############  Input Values ###################
  each.sub.pop.n = 150;
  sub.pop.means = 20:10;
  predict.range = 0:10;
  sub.pop.sd = .75;
  n.sample = 2;
  #############################################
  par( mar = c(2, 4.1, 2.1, 2.1) )

  m = matrix( c(1, 2), nrow = 2, ncol = 1 ); layout(m)

  Vec.rnorm <- Vectorize(function(n, mean, sd) rnorm(n, mean, sd), 'mean')

  y <- c( Vec.rnorm(each.sub.pop.n, sub.pop.means, sub.pop.sd) )

  x <- rep(predict.range, each = each.sub.pop.n)

  plot(x, y)


  ## Unsuccessful sampling ##
  x <- rep(predict.range, each = n.sample)

  y <- sample(y , length(x), replace = TRUE)

  plot(x, y)

Answer 1

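It seems to me that your unsuccessful sample is not conditioned on x: sample(y, ...) draws from the whole pooled y vector, so a value that belongs to one x can end up plotted over another x. Below, I split the y data by x and then draw two cases from each group. The result seems to work.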

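# Run this on the original x and y, before the question's code overwrites them.
# Split y by x, then draw n.sample values within each group.
samp <- lapply(split(y, x), function(z) sample(z, n.sample, replace = TRUE))
# The list names are the x values; repeat each one once per draw.
samp <- data.frame(y = unlist(samp),
                   x = as.numeric(rep(names(samp), each = n.sample)))
plot(samp$x, samp$y)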
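
One way to check the difference is to compare each sampled value with the range of the original y values at the same x. The following is only a self-contained sketch: the seed and the names `x.new`, `pooled`, `grouped`, `lo`, `hi` and `in_range` are introduced here for illustration and are not part of the question's code.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

each.sub.pop.n <- 150
sub.pop.means  <- 20:10
predict.range  <- 0:10
sub.pop.sd     <- 0.75
n.sample       <- 2

# Rebuild the population: 150 y values for each of the 11 x values
y <- c(sapply(sub.pop.means, function(m) rnorm(each.sub.pop.n, m, sub.pop.sd)))
x <- rep(predict.range, each = each.sub.pop.n)

# The x values the samples will be plotted over
x.new <- rep(predict.range, each = n.sample)

# Pooled sampling: draws from all of y, ignoring which x each value belongs to
pooled <- sample(y, length(x.new), replace = TRUE)

# Group-wise sampling: draws only from the y values that sit over each x
grouped <- unlist(lapply(split(y, x),
                         function(z) sample(z, n.sample, replace = TRUE)))

# Range of the original y values at the x each sampled value is plotted over
lo <- tapply(y, x, min)[as.character(x.new)]
hi <- tapply(y, x, max)[as.character(x.new)]
in_range <- function(v) v >= lo & v <= hi

all(in_range(grouped))  # TRUE by construction
mean(in_range(pooled))  # share of pooled draws that happen to land in range
```

The group-wise draws can never leave the range of their own group, whereas the pooled draws only stay in range by chance.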

Answer 2

You can use stratified sampling, which the strata function in the sampling package implements:

  par( mar = c(2, 4.1, 2.1, 2.1) )
  m = matrix( c(1, 2), nrow = 2, ncol = 1 ); layout(m)
  Vec.rnorm <- Vectorize(function(n, mean, sd) rnorm(n, mean, sd), 'mean')
  y <- c( Vec.rnorm(each.sub.pop.n, sub.pop.means, sub.pop.sd) )
  x <- rep(predict.range, each = each.sub.pop.n)
  plot(x, y)

  library(sampling)
  df <- data.frame(x,y)
  set.seed(123)
  stratif_sampl <- strata(df,"x",rep(2,11))
  idx <- stratif_sampl$ID_unit
  plot(x[idx], y[idx])
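
If you would rather not depend on an extra package, a similar stratified draw can be sketched in base R by sampling row indices within each x group. This is an alternative written for illustration, not a description of what `strata` does internally; `idx.base` is a name introduced here, and the data are rebuilt (with an arbitrary seed) so that the block runs on its own.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

each.sub.pop.n <- 150
predict.range  <- 0:10
n.sample       <- 2

x <- rep(predict.range, each = each.sub.pop.n)
y <- rnorm(length(x), mean = rep(20:10, each = each.sub.pop.n), sd = 0.75)

# Split the row indices by x, then draw n.sample indices without replacement
# inside each group. Every group holds 150 indices here, so sample() never
# receives a length-one vector (which it would interpret as 1:n).
idx.base <- unlist(lapply(split(seq_along(x), x),
                          function(i) sample(i, n.sample)))

table(x[idx.base])              # number of sampled rows per x value
plot(x[idx.base], y[idx.base])  # each point stays over its own x
```

Sampling indices rather than y values keeps each x paired with its own y, so the same `idx.base` can be used to subset any other column of the data.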
