How to simulate data and visualize it in a single R function

Question

I'm using replicate to simulate distributions in R and to visualize how they change with different parameters (for example, rbinom(100, 1, 0.5) versus rbinom(100, 1, 0.01)).

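I'd like to do all of this inside one function: 1. simulate the replicates, 2. set up the plot dimensions and parameters, and 3. loop over the replicates and draw a density curve for each.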

Run as separate pieces, this code works fine:

n <- 100
d <- as.data.frame(
    replicate(n, 
              expr = rbinom(n, 1, 0.5), 
              simplify = F)
)
colnames(d) <- 1:n
plot( NULL, xlim = c( min(d)-0.5, max(d)+0.5), ylim = c(0,2)) 
for(i in 1:n) lines( density( d[,i]) )

But when the same steps are wrapped in a function, only one density curve shows up:

plotcurves <- function(n, distr, ymax) {

    d <- as.data.frame(
        replicate(n, 
                  expr = distr, 
                  simplify = F)
    )
    colnames(d) <- 1:n
    plot( NULL, xlim = c( min(d)-0.5, max(d)+0.5), ylim = c(0,ymax)) 
    for(i in 1:n) lines( density( d[,i]) )
}

plotcurves(n = 100, distr = rbinom(100, 1, 0.5), ymax = 2)

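The fix seems like it should be simple, but I can't find it. What do I need to change to make the code work? Or does a function that already does this exist?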

Answer 1

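The problem is that inside your function, distr is evaluated only once. R evaluates a function argument the first time it is used and then reuses that value, so replicate never re-runs the rbinom call. You can see this with a variant of the function that simply returns the data frame d instead of plotting it: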

show_d <- function(n, distr, ymax) 
{
    d <- as.data.frame(
        replicate(n, 
                  expr = distr, 
                  simplify = F)
    )
  return(d)
}

show_d(n = 3, distr = rbinom(5, 1, 0.5), ymax = 2)
#>   c.1L..0L..1L..1L..1L. c.1L..0L..1L..1L..1L..1 c.1L..0L..1L..1L..1L..2
#> 1                     1                       1                       1
#> 2                     0                       0                       0
#> 3                     1                       1                       1
#> 4                     1                       1                       1
#> 5                     1                       1                       1

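You'll see that all the columns are identical. The call to rbinom is evaluated first and only then handed to replicate, which is the same as calling replicate(3, c(1, 0, 1, 1, 1)). So you are drawing all of the lines; they are just identical and lie on top of one another.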
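The once-only evaluation comes from R's lazy evaluation: an argument is a promise that is forced the first time it is used and then cached. Below is a minimal sketch of the effect without any plotting; the helper names once and fresh are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: `x` is a promise, forced on first use and then reused,
# so all three replicates return the same number.
once <- function(x) replicate(3, x)

# Hypothetical helper: capture the unevaluated expression with substitute()
# and re-evaluate it in the caller's environment on every replicate.
fresh <- function(x) {
    e   <- substitute(x)   # e.g. the call runif(1), not its value
    env <- parent.frame()  # environment the caller's expression belongs to
    replicate(3, eval(e, env))
}

set.seed(1)
a <- once(runif(1))
b <- fresh(runif(1))
print(a)  # three identical values
print(b)  # three different values
```

The second helper is the same idea the fix below relies on: hand replicate an expression to run, not a value that has already been computed.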

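Inside the function, you need to make sure replicate receives distr as an unevaluated call and runs it again on every replicate, rather than receiving a vector that has already been computed. One way is to use match.call(), take its third element (the expression supplied for the second argument, distr), and evaluate that expression with eval() inside replicate: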

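plotcurves <- function(n, distr, ymax) {
    # match.call()[[3]] is the unevaluated expression given for distr,
    # e.g. rbinom(100, 1, 0.5); [[1]] is the function name, [[2]] is n
    mc <- match.call()[[3]]
    d <- as.data.frame(
        replicate(n, 
                  expr = eval(mc),   # re-run the call on every replicate
                  simplify = FALSE)
    )
    colnames(d) <- 1:n
    plot( NULL, xlim = c( min(d)-0.5, max(d)+0.5), ylim = c(0,ymax)) 
    for(i in 1:n) lines( density( d[,i]) )
}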

plotcurves(n = 100, distr = rbinom(100, 1, 0.5), ymax = 2)
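Because the captured call is evaluated from inside plotcurves, it may not find variables that exist only in the calling function, for example the p of an anonymous function passed to lapply(). Below is a sketch of a variant for that case, assuming you want to loop over parameter values. It captures the expression with substitute() and evaluates it in parent.frame(). The name plotcurves2, the extra main argument and the invisible return of d are hypothetical additions, not part of the answer above. The usage compares p = 0.5 and p = 0.01, as in the question.

```r
plotcurves2 <- function(n, distr, ymax, main = "") {
    sim_call <- substitute(distr)  # unevaluated expression, e.g. rbinom(100, 1, p)
    env      <- parent.frame()     # caller's environment, where p is defined
    # Evaluate the expression afresh for each of the n replicates
    d <- as.data.frame(
        replicate(n, eval(sim_call, env), simplify = FALSE)
    )
    colnames(d) <- seq_len(n)
    plot(NULL, xlim = c(min(d) - 0.5, max(d) + 0.5), ylim = c(0, ymax),
         xlab = "value", ylab = "density", main = main)
    for (i in seq_len(n)) lines(density(d[, i]))
    invisible(d)                   # return the simulated data for inspection
}

# Compare two success probabilities side by side
op  <- par(mfrow = c(1, 2))
res <- lapply(c(0.5, 0.01), function(p)
    plotcurves2(n = 100, distr = rbinom(100, 1, p), ymax = 2,
                main = paste("p =", p)))
par(op)
```

A different design avoids capturing expressions altogether: pass a generator function such as function() rbinom(100, 1, 0.5) and call it inside replicate. That approach is not the one used in the answer above.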

