How to efficiently apply multiple functions simultaneously to the same dataframe and save the results as a list of dataframes?
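I want to apply several different functions to the same dataframe at once and put the results into a list of dataframes. For example, I might arrange one column and save the output as a new dataframe, or filter some data and save that as another new dataframe (and so on). I feel like there must be a simple way to do this with purrr or apply, but I'm not sure how to go about it. So I'm wondering whether there is a way to supply a list of functions and get back a list of dataframes. Here are some example functions that I apply to mtcars: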
library(tidyverse)
filter_df <- function(x, word) {
  x %>%
    tibble::rownames_to_column("ID") %>%
    filter(str_detect(ID, word))
}
a <- filter_df(mtcars, "Merc")
mean_n_df <- function(x, grp, mean2) {
  x %>%
    group_by({{grp}}) %>%
    summarise(mean = mean({{mean2}}), n = n())
}
b <- mean_n_df(mtcars, grp = cyl, mean2 = wt)
rating <- function(x, a, b, c) {
  x %>%
    rowwise %>%
    mutate(rating = ({{a}}*2) + ({{b}}-5) * abs({{c}} - 30))
}
c <- rating(mtcars, a = cyl, b = drat, c = qsec)
pct <- function(data, var, round = 4){
  var_expr <- rlang::enquo(var)
  colnm_expr <- paste(rlang::get_expr(var_expr), "pct", sep = "_")
  data %>%
    # note: %>% binds more tightly than /, so round() is applied to sum(), not to the ratio
    mutate(!! colnm_expr := !!var_expr/sum(!!var_expr) %>%
             round(round))
}
d <- pct(mtcars, mpg)
I know I can run the code above and then just bind each dataframe into a list:
df_list <- list(mtcars, a, b, c, d)
str(df_list, 1)
# List of 5
# $ :'data.frame': 32 obs. of 11 variables:
# $ :'data.frame': 7 obs. of 12 variables:
# $ : tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
# $ : rowwise_df [32 × 12] (S3: rowwise_df/tbl_df/tbl/data.frame)
# ..- attr(*, "groups")= tibble [32 × 1] (S3: tbl_df/tbl/data.frame)
# $ :'data.frame': 32 obs. of 12 variables:
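This seems fairly bespoke, because each function needs different arguments. Still, I would use Map (or purrr::map2 or purrr::pmap) and pass in both the functions and their arguments. With the same functions: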
filter_df <- function(x, word) {
  x %>%
    tibble::rownames_to_column("ID") %>%
    filter(str_detect(ID, word))
}
mean_n_df <- function(x, grp, mean2) {
  x %>%
    group_by({{grp}}) %>%
    summarise(mean = mean({{mean2}}), n = n())
}
rating <- function(x, a, b, c) {
  x %>%
    rowwise %>%
    mutate(rating = ({{a}}*2) + ({{b}}-5) * abs({{c}} - 30))
}
pct <- function(data, var, round = 4){
  var_expr <- rlang::enquo(var)
  colnm_expr <- paste(rlang::get_expr(var_expr), "pct", sep = "_")
  data %>%
    mutate(!! colnm_expr := !!var_expr/sum(!!var_expr) %>%
             round(round))
}
Calling them:
out <- Map(
  # each function receives mtcars plus its own argument list, spliced in by do.call()
  function(fun, args) do.call(fun, c(list(mtcars), args)),
  list(filter_df, mean_n_df, rating, pct),
  list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
       list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
       list(quo(mpg)))
)
lapply(out, head, 3)
# [[1]]
# ID mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Merc 240D 24.4 4 146.7 62 3.69 3.19 20.0 1 0 4 2
# 2 Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
# 3 Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4
# [[2]]
# # A tibble: 3 x 3
# cyl mean n
# <dbl> <dbl> <int>
# 1 4 2.29 11
# 2 6 3.12 7
# 3 8 4.00 14
# [[3]]
# # A tibble: 3 x 12
# # Rowwise:
# mpg cyl disp hp drat wt qsec vs am gear carb rating
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 -2.89
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 -2.28
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 -5.10
# [[4]]
# mpg cyl disp hp drat wt qsec vs am gear carb mpg_pct
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.03266449
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.03266449
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 0.03546430
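The sketch below shows the same Map() + do.call() pairing in isolation. It uses only base R functions (head, tail, subset), so no quoting is involved. The names funs, args and out_base are illustrative and are not part of the answer. The i-th function is called as fun(mtcars, <i-th argument list>), and do.call() turns the named list into named arguments.

```r
# Illustrative inputs: one argument list per function, matched by position
funs <- list(head, tail, subset)
args <- list(
  list(n = 2),                    # head(mtcars, n = 2)
  list(n = 3),                    # tail(mtcars, n = 3)
  list(select = c("mpg", "cyl"))  # subset(mtcars, select = c("mpg", "cyl"))
)

# Map() walks funs and args in parallel; do.call() splices each list into the call
out_base <- Map(function(fun, a) do.call(fun, c(list(mtcars), a)), funs, args)

vapply(out_base, nrow, integer(1))
```

This returns the row counts 2, 3 and 32. The third element keeps only the mpg and cyl columns.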
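A few things:
Because you demonstrated passing unevaluated symbols (grp = cyl), we have to quo() them first. Otherwise they are evaluated before they ever reach the function.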
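The sketch below shows why the quoting matters. It uses a hypothetical count_by() helper written with {{ }}, in the same style as the question's functions. With a bare symbol the call fails. With a quosure it works.

```r
library(dplyr)

# Hypothetical helper, written with {{ }} like the question's functions
count_by <- function(x, grp) {
  x %>% count({{ grp }})
}

# Bare `cyl`: list() evaluates it before count_by() ever runs,
# so R looks for an object named `cyl` and stops with an error.
res_bare <- tryCatch(
  do.call(count_by, list(mtcars, grp = cyl)),
  error = function(e) conditionMessage(e)
)

# quo(cyl) keeps the expression unevaluated; {{ grp }} then resolves it
# inside the data, where `cyl` is a column.
res_quo <- do.call(count_by, list(mtcars, grp = quo(cyl)))
res_quo
```

res_quo holds the per-cylinder counts (11, 7 and 14 cars), the same n values shown for mean_n_df in the output above.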
You can generalize this to arbitrary data by not hard-coding mtcars in the anonymous function passed to Map:
out <- Map(
  function(x, fun, args) do.call(fun, c(list(x), args)),
  list(mtcars),
  list(filter_df, mean_n_df, rating, pct),
  list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
       list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
       list(quo(mpg)))
)
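The list(.) around mtcars is intentional. To Map it has length 1, so it is recycled across the other arguments, which each have length 4. Without list, Map would fail: the first function would see the first column (as a vector), the second function the second column, and so on. You may also get the warning longer argument not a multiple of length of shorter. (I really wish misaligned recycling in R failed harder than this.)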
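The toy illustration below shows the recycling point on a hypothetical two-column data frame df. Passed bare, it is iterated column by column. Wrapped in list(), the whole data frame is recycled for every function. With mismatched lengths, mapply() (which Map() calls) only warns.

```r
# Hypothetical toy data frame with two columns
df <- data.frame(a = 1:3, b = 4:6)

# Bare data frame: Map() treats it as a list of columns,
# so the calls are sum(df$a) and max(df$b)
by_column <- Map(function(x, f) f(x), df, list(sum, max))

# Wrapped in list(): length 1, recycled, so each function gets the whole data frame
whole_df <- Map(function(x, f) f(x), list(df), list(nrow, ncol))

# 2 columns vs 3 functions: recycling goes ahead, with only a warning
w <- tryCatch(
  Map(function(x, f) f(x), df, list(sum, max, min)),
  warning = function(w) conditionMessage(w)
)
```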
This generalization lets you apply the same series of functions to several datasets:
out2 <- lapply(list(mtcars[1:10,], mtcars[11:32,]), function(XYZ) {
  Map(
    function(x, fun, args) do.call(fun, c(list(x), args)),
    list(XYZ),
    list(filter_df, mean_n_df, rating, pct),
    list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
         list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
         list(quo(mpg)))
  )
})
(I'm not sure whether you intended to go on to applying a list of functions to a list of datasets...)
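If the functions need no extra arguments, applying a list of functions to a list of datasets comes down to a nested lapply(). The sketch below uses hypothetical names (datasets, funs, res). The result is indexed as res[[dataset]][[function]].

```r
# Hypothetical named inputs: two slices of mtcars and two summary functions
datasets <- list(first = mtcars[1:10, ], rest = mtcars[11:32, ])
funs <- list(
  n_rows   = nrow,
  mean_mpg = function(d) mean(d$mpg)
)

# Outer lapply() walks the datasets; inner lapply() walks the functions
res <- lapply(datasets, function(d) lapply(funs, function(f) f(d)))

res$first$n_rows  # [1] 10
res$rest$n_rows   # [1] 22
```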
Use invoke together with map2 from purrr:
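library(purrr)
# note: invoke() is superseded in recent purrr releases;
# do.call(.x, c(list(mtcars), as.list(.y))) is an equivalent base R call here
df_list2 <- c(list(mtcars), map2(list(filter_df, mean_n_df, rating, pct),
                                 list("Merc", expression(grp = cyl, mean2 = wt),
                                      expression(a = cyl, b = drat, c = qsec), quote(mpg)),
                                 ~ invoke(.x, c(list(mtcars), as.list(.y)))))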
Checking:
all.equal(df_list2, df_list, check.attributes = FALSE)
# [1] TRUE
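purrr's invoke() has been superseded, and newer purrr releases deprecate it. The sketch below applies the same map2() idea with base do.call() instead. It also uses rlang::exprs() in place of expression(). exprs() returns a named list of unevaluated symbols, which can be spliced directly into the argument list. mean_by() and count_by() are hypothetical helpers in the style of the question's functions. The sketch assumes dplyr and purrr are installed.

```r
library(dplyr)
library(purrr)

# Hypothetical helpers using {{ }}, like the question's functions
mean_by  <- function(x, grp, var) x %>% group_by({{ grp }}) %>% summarise(mean = mean({{ var }}))
count_by <- function(x, grp) x %>% count({{ grp }})

# Each function is paired with a named list of unevaluated symbols;
# do.call() builds e.g. mean_by(<mtcars>, grp = cyl, var = wt) and evaluates it
res <- map2(
  list(mean_by, count_by),
  list(rlang::exprs(grp = cyl, var = wt), rlang::exprs(grp = gear)),
  ~ do.call(.x, c(list(mtcars), .y))
)
```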