How to efficiently apply multiple functions simultaneously to the same dataframe and save the results as a list of dataframes?

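I want to apply several different functions to the same dataframe and then put the results into a list of dataframes. For example, I could arrange a column and save the output as a new dataframe, or I could filter some data and save that as another new dataframe (and so on). I feel like there must be a simple way to do this with purrr or apply, but I'm not sure how to proceed. So I'm wondering whether there is a way to supply a list of functions and get back a list of dataframes. Here are some example functions that I apply to mtcars:
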
library(tidyverse)

filter_df <- function(x, word) {
  x %>% 
    tibble::rownames_to_column("ID") %>% 
    filter(str_detect(ID, word))
}
a <- filter_df(mtcars, "Merc")


mean_n_df <- function(x, grp, mean2) {
  x %>%
    group_by({{grp}}) %>%
    summarise(mean = mean({{mean2}}), n = n())
}
b <- mean_n_df(mtcars, grp = cyl, mean2 = wt)



rating <- function(x, a, b, c) {
  x %>% 
    rowwise %>% 
    mutate(rating = ({{a}}*2) + ({{b}}-5) * abs({{c}} - 30))
  
}
c <- rating(mtcars, a = cyl, b = drat, c = qsec)



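pct <- function(data, var, round = 4){
  var_expr <- rlang::enquo(var)
  colnm_expr <- paste(rlang::get_expr(var_expr), "pct", sep = "_")
  
  # Note: %>% binds tighter than /, so round() is applied to sum(...) only,
  # not to the ratio; wrap the division in parentheses to round the percentage.
  data %>%
    mutate(!! colnm_expr := !!var_expr/sum(!!var_expr) %>%
             round(round))
}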
d <- pct(mtcars, mpg)

I know that I could just run the code above and then bind each dataframe into a list.

df_list <- list(mtcars, a, b, c, d)

str(df_list, max.level = 1)

List of 5
 $ :'data.frame':   32 obs. of  11 variables:
 $ :'data.frame':   7 obs. of  12 variables:
 $ : tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
 $ : rowwise_df [32 × 12] (S3: rowwise_df/tbl_df/tbl/data.frame)
  ..- attr(*, "groups")= tibble [32 × 1] (S3: tbl_df/tbl/data.frame)
 $ :'data.frame':   32 obs. of  12 variables:

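This seems a bit bespoke (since each function requires different arguments), but I'd use Map (or purrr::map2 or purrr::pmap), passing the functions and their arguments to it: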

filter_df <- function(x, word) {
  x %>% 
    tibble::rownames_to_column("ID") %>% 
    filter(str_detect(ID, word))
}
mean_n_df <- function(x, grp, mean2) {
  x %>%
    group_by({{grp}}) %>%
    summarise(mean = mean({{mean2}}), n = n())
}
rating <- function(x, a, b, c) {
  x %>% 
    rowwise %>% 
    mutate(rating = ({{a}}*2) + ({{b}}-5) * abs({{c}} - 30))
}
pct <- function(data, var, round = 4){
  var_expr <- rlang::enquo(var)
  colnm_expr <- paste(rlang::get_expr(var_expr), "pct", sep = "_")
  data %>%
    mutate(!! colnm_expr := !!var_expr/sum(!!var_expr) %>%
             round(round))
}

The call:

out <- Map(
  function(fun, args) do.call(fun, c(list(mtcars), args)),
  list(filter_df, mean_n_df, rating, pct),
  list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
       list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
       list(quo(mpg)))
)

lapply(out, head, 3)
# [[1]]
#          ID  mpg cyl  disp  hp drat   wt qsec vs am gear carb
# 1 Merc 240D 24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
# 2  Merc 230 22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
# 3  Merc 280 19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
# [[2]]
# # A tibble: 3 x 3
#     cyl  mean     n
#   <dbl> <dbl> <int>
# 1     4  2.29    11
# 2     6  3.12     7
# 3     8  4.00    14
# [[3]]
# # A tibble: 3 x 12
# # Rowwise: 
#     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb rating
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
# 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4  -2.89
# 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4  -2.28
# 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1  -5.10
# [[4]]
#                mpg cyl disp  hp drat    wt  qsec vs am gear carb    mpg_pct
# Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 0.03266449
# Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 0.03266449
# Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 0.03546430
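
The same pairing of functions and argument lists can be written with named lists, so that the results come back named. The following is only a minimal sketch, not code from this answer: keep_rows and col_means are hypothetical stand-ins for the functions above, and the purrr::map2 call is one possible purrr spelling of the Map call.

```r
library(purrr)

# Hypothetical stand-ins for the real functions, written in base R.
keep_rows <- function(x, n) head(x, n)
col_means <- function(x, cols) as.data.frame(t(colMeans(x[cols])))

# One named entry per function, with a matching list of extra arguments.
funs <- list(first = keep_rows, means = col_means)
args <- list(first = list(n = 3), means = list(cols = c("mpg", "wt")))

# Prepend the data frame to each argument list and call the function.
call_on_mtcars <- function(fun, a) do.call(fun, c(list(mtcars), a))

res_base  <- Map(call_on_mtcars, funs, args)   # base R
res_purrr <- map2(funs, args, call_on_mtcars)  # purrr equivalent

names(res_base)
```

Both results take their names from the first list, so the individual dataframes can be picked out as res_base$first or res_base$means.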

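A few things:

  • Because you demonstrated using unevaluated symbols (grp=cyl), we have to quote them first, otherwise they would be evaluated before they reach the function.

  • You can generalize this to arbitrary data by not hard-coding it in the Map anon-func, with: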

    out <- Map(
      function(x, fun, args) do.call(fun, c(list(x), args)),
      list(mtcars),
      list(filter_df, mean_n_df, rating, pct),
      list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
           list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
           list(quo(mpg)))
    )
    

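    The list(.) around mtcars is intentional: Map sees it as length 1, so it is recycled against the other arguments (each of length 4). Without list, Map would fail because the first function would see the first column (as a vector), the second function would see the second column, and so on (and/or it would warn with longer argument not a multiple of length of shorter ... I really wish misaligned recycling in R would fail harder than this).

    This generalization allows this series of functions to be applied to multiple datasets: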

    out2 <- lapply(list(mtcars[1:10,], mtcars[11:32,]), function(XYZ) {
      Map(
        function(x, fun, args) do.call(fun, c(list(x), args)),
        list(XYZ),
        list(filter_df, mean_n_df, rating, pct),
        list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
             list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
             list(quo(mpg)))
      )
    })
    

    Not sure if you intended to start applying a list of functions to a list of datasets ...
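
The recycling behaviour described above can be checked on a small scale. This is a rough base R sketch; the anonymous functions and the choice of nrow and ncol are made up purely for illustration.

```r
# Wrapped in list(): a length-1 argument that Map() recycles, so every
# function receives the whole data frame.
whole <- Map(function(x, fun) fun(x), list(mtcars), list(nrow, ncol))
unlist(whole)

# Not wrapped: Map() walks over the 11 columns of mtcars instead and recycles
# the two functions (the recycling warning is silenced here), so each call
# only ever sees a single column vector.
by_column <- suppressWarnings(
  Map(function(x, fun) class(x), mtcars, list(nrow, ncol))
)
length(by_column)
```

In the second call the result has one element per column rather than one per function, which is why the data frame has to be wrapped in list().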

Using invoke in combination with map2 from purrr:

library(purrr)
df_list2 <- c(list(mtcars), map2(list(filter_df, mean_n_df, rating, pct), 
   list("Merc", expression(grp = cyl, mean2 = wt), 
       expression(a = cyl, b= drat, c = qsec), quote(mpg)), 
     ~ invoke(.x, c(list(mtcars), as.list(.y)))))

Checking the result:

all.equal(df_list2, df_list, check.attributes = FALSE)
[1] TRUE
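
Why expression() and quote() are needed in the argument lists can be seen in a small base R sketch. Here col_mean is a hypothetical helper, and do.call stands in for invoke on the assumption that both forward an argument list to the function in much the same way.

```r
# Hypothetical helper: averages a column given as a bare name, captured
# with substitute() and evaluated inside the data frame.
col_mean <- function(df, col) {
  col_expr <- substitute(col)
  mean(eval(col_expr, df, parent.frame()))
}

# Direct call with a bare column name.
direct <- col_mean(mtcars, wt)

# Building the argument list with a bare name fails, because R looks for an
# object called `wt` before col_mean() is ever called.
unquoted <- tryCatch(
  do.call(col_mean, list(mtcars, col = wt)),
  error = function(e) "error"
)

# Quoting keeps the name unevaluated until it reaches the function.
quoted <- do.call(col_mean, list(mtcars, col = quote(wt)))

# expression() holds several named, unevaluated arguments at once;
# as.list() turns it into a named list of symbols.
from_expr <- do.call(col_mean, c(list(mtcars), as.list(expression(col = wt))))
```

The quoted and expression() variants give the same value as the direct call, while the unquoted one stops with an error, provided no object named wt exists in the calling environment.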