How can I expand grouped data in R using dplyr?
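Is there a more efficient and concise way to get the result of the code below? Because of the nature of my task, I cannot use base or tidyr functions: the dplyr code has to be translated to SQL and executed on a database.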
library(dplyr)
library(dbplyr)
library(RSQLite)
library(DBI)
# Create example data set
id <- c("a", "b", "c")
df <- data.frame(id)
# Treat it as a database table (RSQLite takes the location as `dbname`)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, df, "data", temporary = FALSE)
# Expand to data set and create a variable for four quarters
n <- 4
data <- tbl(con, "data") %>%
  mutate(quarter = 1)
for (i in 2:n) {
  data <- data %>%
    mutate(quarter = i) %>%
    union(data, data) %>%
    show_query()
}
data <- collect(data)
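My real-world goal is to query a list of IDs and expand it into a data set with a "quarter" variable. I want to use that list as a base and gradually add more information to it later.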
It sounds like you want the Cartesian product of id = c('a', 'b', 'c') and quarters = c(1, 2, 3, 4), which would give you id_quarter = c(('a',1), ('a',2), ('a',3), ..., ('c',4)).
This can be done with a join on a dummy variable, as follows:
id <- c("a", "b", "c")
df <- data.frame(id)
quarter <- c(1, 2, 3, 4)
df_q <- data.frame(quarter)
# Treat it as a database table (RSQLite takes the location as `dbname`)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, df, "data", temporary = FALSE)
copy_to(con, df_q, "quarter", temporary = FALSE)
# Create a constant placeholder column on both tables to join on
data <- tbl(con, "data") %>%
  mutate(dummy_placeholder = 1)
quarters <- tbl(con, "quarter") %>%
  mutate(dummy_placeholder = 1)
# Join the two lazy tables (not the local `quarter` vector) and collect
result <- data %>%
  inner_join(quarters, by = "dummy_placeholder") %>%
  collect()
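As a possible alternative to the placeholder column, here is a minimal sketch using `cross_join()`. It assumes dplyr 1.1.0 or later and a dbplyr version recent enough to translate `cross_join()` for lazy tables; on older versions this call is not available and the dummy-variable join above is the safer route. The table names match the example above, and the setup is repeated only so the snippet runs on its own.

```r
library(dplyr)
library(dbplyr)
library(DBI)
library(RSQLite)

# Same toy tables as above, in a fresh in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, data.frame(id = c("a", "b", "c")), "data", temporary = FALSE)
copy_to(con, data.frame(quarter = c(1, 2, 3, 4)), "quarter", temporary = FALSE)

# Assumption: cross_join() is supported by the installed dplyr/dbplyr.
# It pairs every id with every quarter, so no placeholder column is needed.
expanded <- tbl(con, "data") %>%
  cross_join(tbl(con, "quarter"))

# Inspect the generated SQL before running it on the database
show_query(expanded)

result <- collect(expanded)
DBI::dbDisconnect(con)
```

The result has only the `id` and `quarter` columns, so there is no `dummy_placeholder` to drop afterwards.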