Simulate many datasets in tidyr

Question

I would like to end up with a tidy data structure like this:

N    | r     | data     | stat
---------------------------------
10   | 0.2   | <tibble> | 0.5
20   | 0.3   | <tibble> | 0.86
...

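data is generated from the parameters in the first two columns, and stat is computed on data. Given the first two columns, how do I add the datasets?

As a minimal example, here is a function that generates two correlated columns: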

correlated_data = function(N, r) {
  MASS::mvrnorm(N, mu=c(0, 4), Sigma=matrix(c(1, r, r, 1), ncol=2))
}
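
As a quick check of what this function returns, the sketch below calls it once and wraps the result in a tibble, since the target structure above shows a tibble in the data column. The wrapper name `correlated_tibble` and the column names `x` and `y` are assumptions made for illustration; they are not part of the original question.

```r
library(tibble)

# Function from the question: draws N rows from a bivariate normal
# with unit variances and correlation r.
correlated_data <- function(N, r) {
  MASS::mvrnorm(N, mu = c(0, 4), Sigma = matrix(c(1, r, r, 1), ncol = 2))
}

# Hypothetical wrapper: same draw, returned as a tibble with named columns.
# Assumes N >= 2, because mvrnorm(1, ...) returns a plain vector, not a matrix.
correlated_tibble <- function(N, r) {
  m <- correlated_data(N, r)
  tibble(x = m[, 1], y = m[, 2])
}

set.seed(1)
d <- correlated_tibble(N = 10, r = 0.3)
dim(d)  # number of rows and columns of one simulated dataset
```

Whether to store matrices or tibbles in the list column is a matter of taste; `cor.test()` works on either once the two columns are extracted.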

To run this for every combination of N and r, I start with

# Make parameter combinations
expand.grid(N=c(10,20,30), r=c(0, 0.1, 0.3)) %>%
  group_by(N, r) %>%
  expand(set=1:100) %>%  # create 100 of each combination

  # HERE! How to add a N x 2 tibble to each row?
  rowwise() %>%
  mutate(data=correlated_data(N, r)) %>%

  # Compute summary stats on each (for illustration only; not tested)
  mutate(   
     stats = map(data, ~cor.test(.x[, 1], .x[, 2])),  # Correlation on each
     tidy_stats = map(stats, tidy))  # using broom package

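In practice I have more parameters (N, r, distribution) and I will compute more summaries. I am also open to other workflows if they are better.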

Answer 1

For two variables, use:

map2(N, r, correlated_data)

For more variables, use:

pmap(list(N, r), correlated_data)
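
To illustrate how `pmap()` extends to a third parameter, here is a minimal sketch. The function `correlated_data3` and its extra argument `mu2` (the mean of the second column) are hypothetical and stand in for whatever additional parameter the real simulation needs. Naming the list elements lets `pmap()` match them to the function's arguments by name.

```r
library(purrr)
library(tibble)
library(dplyr)

# Hypothetical three-parameter variant of correlated_data():
# mu2 is the mean of the second column.
correlated_data3 <- function(N, r, mu2) {
  MASS::mvrnorm(N, mu = c(0, mu2), Sigma = matrix(c(1, r, r, 1), ncol = 2))
}

# One row per parameter combination
params <- tibble(N = c(10, 20), r = c(0.1, 0.3), mu2 = c(4, 8))

# pmap() walks the three columns in parallel, calling the function once per row
sims <- params %>%
  mutate(data = pmap(list(N = N, r = r, mu2 = mu2), correlated_data3))

map_int(sims$data, nrow)  # number of rows in each simulated matrix
```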

So the full pipeline from the question becomes:

library(tidyverse)
library(broom)

expand.grid(N=c(10, 20, 30), r=c(0, 0.1, 0.3)) %>%
  group_by(N, r) %>%
  expand(set=1:100) %>%  # create 100 of each combination

  # Add an N x 2 matrix of simulated data to each row
  mutate(
    data = map2(N, r, correlated_data),
    stats = map(data, ~cor.test(.[, 1], .[, 2])),  # correlation test on each
    tidy_stats = map(stats, tidy)  # using broom package
  ) %>%

  unnest(tidy_stats)
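
Since the question mentions computing more summaries, the following sketch shows one possible way to aggregate the unnested results per parameter combination. It is an illustration under assumptions, not part of the original answer: it swaps in `tidyr::expand_grid()` for the `expand.grid()` + `group_by()` + `expand()` steps, uses only 20 replicates to keep it quick, and the summary column names `mean_estimate` and `prop_significant` are made up here.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Function from the question
correlated_data <- function(N, r) {
  MASS::mvrnorm(N, mu = c(0, 4), Sigma = matrix(c(1, r, r, 1), ncol = 2))
}

set.seed(42)

# One row per (N, r, replicate); 20 replicates is an arbitrary choice
results <- expand_grid(N = c(10, 20, 30), r = c(0, 0.1, 0.3), set = 1:20) %>%
  mutate(
    data = map2(N, r, correlated_data),
    stats = map(data, ~ cor.test(.x[, 1], .x[, 2])),
    tidy_stats = map(stats, tidy)
  ) %>%
  unnest(tidy_stats)

# Aggregate over replicates within each parameter combination
summary_tbl <- results %>%
  group_by(N, r) %>%
  summarise(
    mean_estimate = mean(estimate),           # average sample correlation
    prop_significant = mean(p.value < 0.05),  # share of tests with p < 0.05
    .groups = "drop"
  )

summary_tbl
```

The columns `estimate` and `p.value` come from `broom::tidy()` applied to the `cor.test()` result.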
