tidyverse:用嵌套因子模拟随机样本

tidyverse: Simulating random sample with nested factor

我想用嵌套因子模拟随机样本。因子 Dept 有两个水平 AB。级别 A 有两个嵌套级别 A1A2。级别 B 具有三个嵌套级别 B1B2B3。想要使用一些 R 代码模拟从 2022-01-012022-01-31 的随机样本。下面给出了部分所需的输出(从 2022-01-012022-01-02 仅供参考)。

library(tibble)

set.seed(12345)
df1 <-
  tibble(
    Date   = c(rep("2022-01-01", 5), rep("2022-01-02", 4), rep("2022-01-03", 4))
  , Dept   = c("A", "A", "B", "B", "B", "A", "B", "B", "B", "A", "A", "B", "B")
  , Prog   = c("A1", "A2", "B1", "B2", "B3", "A1", "B1", "B2", "B3", "A1", "A2", "B2", "B3")
  , Amount = runif(n = 13, min = 50000, max = 100000) 
  )

df1
#> # A tibble: 13 x 4
#>    Date       Dept  Prog  Amount
#>    <chr>      <chr> <chr>  <dbl>
#>  1 2022-01-01 A     A1    86045.
#>  2 2022-01-01 A     A2    93789.
#>  3 2022-01-01 B     B1    88049.
#>  4 2022-01-01 B     B2    94306.
#>  5 2022-01-01 B     B3    72824.
#>  6 2022-01-02 A     A1    58319.
#>  7 2022-01-02 B     B1    66255.
#>  8 2022-01-02 B     B2    75461.
#>  9 2022-01-02 B     B3    86385.
#> 10 2022-01-03 A     A1    99487.
#> 11 2022-01-03 A     A2    51727.
#> 12 2022-01-03 B     B2    57619.
#> 13 2022-01-03 B     B3    86784.

如果我们想随机抽样,使用 crossing 创建扩展数据,然后 filter/slice 到 return 每个 'date'

随机行
library(dplyr)
library(tidyr)
library(stringr)
crossing(Date = seq(as.Date("2022-01-01"), as.Date("2022-01-31"), 
   by = "1 day"), Dept = c("A", "B"), Prog = 1:3) %>%
   mutate(Prog = str_c(Dept, Prog)) %>%
  filter(Prog != "A3") %>% 
  mutate(Amount = runif(n = n(), min = 50000, max = 100000)) %>% 
  group_by(Date) %>% 
  slice(seq_len(sample(row_number(), 1)))  %>%
  ungroup

-输出

# A tibble: 102 × 4
   Date       Dept  Prog  Amount
   <date>     <chr> <chr>  <dbl>
 1 2022-01-01 A     A1    83964.
 2 2022-01-01 A     A2    93428.
 3 2022-01-01 B     B1    85187.
 4 2022-01-01 B     B2    79144.
 5 2022-01-01 B     B3    65784.
 6 2022-01-02 A     A1    86014.
 7 2022-01-03 A     A1    76060.
 8 2022-01-03 A     A2    56412.
 9 2022-01-03 B     B1    87365.
10 2022-01-03 B     B2    66169.
# … with 92 more rows