How can I expand grouped data in R using dplyr?

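Is there a more efficient and more concise way to get the result of the code below? Because of the nature of my task, I can't use base or tidyr functions: the dplyr code has to be translated to SQL and executed on the database.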

library(dplyr)
library(dbplyr)
library(RSQLite)
library(DBI)

# Create example data set
id <- c("a", "b", "c")
df <- data.frame(id)

# Treat it as a database table (RSQLite takes the database via `dbname`)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, df, "data", temporary = FALSE)

# Expand the data set with a variable "quarter" taking the values 1 to n
n <- 4

data <- tbl(con, "data") %>%
    mutate(quarter = 1)

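for (i in 2:n) {
    # union() (not union_all()) removes duplicates, which matters here: from the
    # second pass on, mutate() relabels every row already stacked, so each id
    # gets the new quarter more than once before the union
    data <- data %>%
        mutate(quarter = i) %>%
        union(data) %>%
        show_query()
}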

data <- collect(data)

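My real-world goal is to query a list of IDs and expand it into a data set with a variable "quarter". I want to use that list as the base and then join more information onto it step by step.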

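It sounds like you want the Cartesian product of id = c('a', 'b', 'c') and quarters = c(1, 2, 3, 4), which gives you every (id, quarter) pair, 3 × 4 = 12 in total: id_quarter = c(('a',1), ('a',2), ('a',3), ..., ('c',4))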
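To make the target shape concrete, here is a purely local illustration on in-memory tibbles (not on the database), assuming dplyr 1.1.0 or later, where `cross_join()` is available. The names `ids`, `quarters` and `id_quarter` are chosen just for this sketch.

```r
library(dplyr)  # cross_join() assumes dplyr >= 1.1.0

ids      <- tibble(id = c("a", "b", "c"))
quarters <- tibble(quarter = 1:4)

# Every id paired with every quarter: 3 x 4 = 12 rows, no key column needed
id_quarter <- cross_join(ids, quarters)
print(id_quarter)
```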

On the database, the Cartesian product can be built by joining on a dummy (constant) variable, as follows:

id <- c("a", "b", "c")
df <- data.frame(id)
quarter <- c(1, 2, 3, 4)
df_q <- data.frame(quarter)

# Treat it as a database table (RSQLite takes the database via `dbname`)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, df, "data", temporary = FALSE)
copy_to(con, df_q, "quarter", temporary = FALSE)

# create placeholder column
data <- tbl(con, "data") %>%
    mutate(dummy_placeholder = 1)
quarters <- tbl(con, "quarter") %>%
    mutate(dummy_placeholder = 1)

# join on the constant column (every row matches every row), drop it, and collect
result <- data %>%
    inner_join(quarters, by = "dummy_placeholder") %>%
    select(-dummy_placeholder) %>%
    collect()
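A possibly more concise variant, assuming dplyr 1.1.0 or later and a dbplyr release that translates `cross_join()` to SQL (2.3.0 or later, to our knowledge): `cross_join()` states the Cartesian product directly, so the dummy column is not needed. The table names mirror the example above; the exact SQL depends on the backend, so it is worth checking with `show_query()`.

```r
library(dplyr)
library(dbplyr)
library(DBI)
library(RSQLite)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, data.frame(id = c("a", "b", "c")), "data", temporary = FALSE)
copy_to(con, data.frame(quarter = 1:4), "quarter", temporary = FALSE)

ids      <- tbl(con, "data")
quarters <- tbl(con, "quarter")

# cross_join() on two lazy tables should be rendered as a CROSS JOIN,
# so every id is paired with every quarter inside the database
# (assumes dbplyr >= 2.3.0)
expanded <- ids %>%
    cross_join(quarters) %>%
    arrange(id, quarter)

show_query(expanded)
result <- collect(expanded)

DBI::dbDisconnect(con)
```

On older dplyr/dbplyr versions, the dummy-column join above remains a portable fallback.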