How can I better refactor the following code in R using purrr?

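My function parse_columns takes four arguments:

  1. A data.frame / tibble: df
  2. A character vector denoting a subset of the columns in the tibble: vars
  3. A pattern: pattern
  4. The name of the new output column to create in the same tibble: out_name

For each row of the input tibble (df), it counts how many of the columns in argument 2 (vars) match argument 3 (pattern), and stores that count in a new column (out_name) of the tibble.

The function: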

library(dplyr)
library(stringr)

parse_columns <- function(df, vars, pattern, out_name){
  df <- df %>% 
    rowwise() %>% 
    mutate(x = sum(across(all_of(vars), .fns = ~ as.numeric(str_detect(., pattern))
                          )
                   )
           )
  names(df)[names(df) == "x"] <- out_name
  return(df)  
}

I call the function (at least) four times:

tidy <- parse_columns(tidy, additional_vars, "w", "available_w")
tidy <- parse_columns(tidy, additional_vars, "x", "available_x")
tidy <- parse_columns(tidy, additional_vars, "y", "available_y")
tidy <- parse_columns(tidy, additional_vars, "z", "available_z")

My question is: how can I refactor the four lines above using purrr (perhaps with purrr::pmap())?

Edit #1: Thanks to @NelsonGon for the comment about using map2().

I tried the following:

library(stringi)
arg1 <- c("w", "x", "y", "z")
arg2 <- "available_" %s+% arg1
tidy %>% map2(arg1, arg2, .f = parse_columns(.,
                                             vars = additional_vars,
                                             pattern = arg1,
                                             out_name = arg2
                                             )
              )

But I get this error (raised inside the function):

Error: Can't convert a `rowwise_df/tbl_df/tbl/data.frame` object to function
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In names(df)[names(df) == "x"] <- out_name :
  number of items to replace is not a multiple of replacement length

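Edit #2: @RonakShah, the tidy df contains PII, but the gist is to sum all instances of pattern across the selected vars (row-wise) and return the same tibble with out_name added as a new variable. So, using the following: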

tidy <- tibble(
  a = str_to_lower(LETTERS),
  b = str_to_lower(LETTERS),
  c = str_to_lower(LETTERS),
  d = rnorm(26)
)
additional_vars <- c("a", "b", "c")

tidy <- parse_columns(tidy, additional_vars, "w", "available_w")
tidy <- parse_columns(tidy, additional_vars, "x", "available_x")
tidy <- parse_columns(tidy, additional_vars, "y", "available_y")
tidy <- parse_columns(tidy, additional_vars, "z", "available_z")

print(tail(tidy))

# A tibble: 6 x 8
# Rowwise: 
  a     b     c          d available_w available_x available_y available_z
  <chr> <chr> <chr>  <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
1 u     u     u      0.692           0           0           0           0
2 v     v     v      1.05            0           0           0           0
3 w     w     w      0.544           3           0           0           0
4 x     x     x     -1.93            0           3           0           0
5 y     y     y      0.943           0           0           3           0
6 z     z     z      0.992           0           0           0           3

You can try this:

library(dplyr)
library(purrr)
library(stringr)  # needed for str_detect()

parse_columns <- function(df, vars, pattern, out_name){
  df %>% transmute(!!out_name := rowSums(across(all_of(vars), 
                                 .fns = ~ str_detect(., pattern))))
}
cols <- c("w", "x", "y", "z")
result_cols <- paste("available", cols, sep = "_")

tidy %>%
  bind_cols(map2_dfc(cols, result_cols, 
            ~parse_columns(tidy, additional_vars, .x, .y)))

# A tibble: 26 x 8
#   a     b     c           d available_w available_x available_y available_z
#   <chr> <chr> <chr>   <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
# 1 a     a     a      0.0538           0           0           0           0
# 2 b     b     b      1.61             0           0           0           0
# 3 c     c     c     -0.0172           0           0           0           0
# 4 d     d     d     -0.543            0           0           0           0
# 5 e     e     e      1.98             0           0           0           0
# 6 f     f     f     -1.37             0           0           0           0
# 7 g     g     g      0.425            0           0           0           0
# 8 h     h     h     -0.976            0           0           0           0
# 9 i     i     i      1.19             0           0           0           0
#10 j     j     j      0.441            0           0           0           0
# … with 16 more rows
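
As a side note, the map2() attempt in the question fails because piping tidy into map2() makes the data frame the first vector argument, and `.f` receives the result of calling parse_columns() (a data frame) rather than a function. If you would rather keep a parse_columns() that returns the whole tibble with the new column attached, one alternative is to thread the data frame through successive calls with purrr::reduce2(). The following is only a sketch under some assumptions: it uses the example data from the question, it redefines parse_columns() with mutate() and rowSums() instead of rowwise(), and the name `result` is just an illustrative choice.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Assumed variant of parse_columns(): keeps all existing columns and
# appends the per-row count of columns in `vars` that match `pattern`.
parse_columns <- function(df, vars, pattern, out_name) {
  df %>%
    mutate(!!out_name := rowSums(across(all_of(vars), ~ str_detect(.x, pattern))))
}

# Example data from the question (`letters` is the built-in lower-case alphabet).
tidy <- tibble(a = letters, b = letters, c = letters, d = rnorm(26))
additional_vars <- c("a", "b", "c")

cols <- c("w", "x", "y", "z")
result_cols <- paste("available", cols, sep = "_")

# reduce2() starts from `.init` (the original tibble) and passes the
# accumulated data frame, one pattern, and one output name to each call.
result <- reduce2(
  cols, result_cols,
  function(df, pattern, out_name) {
    parse_columns(df, additional_vars, pattern, out_name)
  },
  .init = tidy
)

print(tail(result))
```

Because each step receives the output of the previous one, no bind_cols() is needed afterwards, and the function can stay in its original "data frame in, data frame out" shape.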