How can I better refactor the following code in R using purrr?
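My function parse_columns takes four arguments:
- A data.frame / tibble: df
- A character vector naming a subset of columns in the tibble: vars
- A pattern: pattern
- The name of the new output column to create in the same tibble: out_name
It counts, row by row, the instances of argument 3 (pattern) in the columns given by argument 2 (vars) of the input tibble (df), and creates a new column (out_name) in that tibble.
The function: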
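library(dplyr)
library(stringr)
parse_columns <- function(df, vars, pattern, out_name){
  df <- df %>%
    rowwise() %>%
    # count, per row, how many of the selected columns match the pattern
    mutate(x = sum(across(all_of(vars), .fns = ~ as.numeric(str_detect(., pattern)))))
  # rename the temporary column "x" to the requested output name
  names(df)[names(df) == "x"] <- out_name
  return(df)
}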
I call the function (at least) four times:
tidy <- parse_columns(tidy, additional_vars, "w", "available_w")
tidy <- parse_columns(tidy, additional_vars, "x", "available_x")
tidy <- parse_columns(tidy, additional_vars, "y", "available_y")
tidy <- parse_columns(tidy, additional_vars, "z", "available_z")
My question is: how can I refactor the 4 lines above using purrr (possibly with purrr::pmap())?
Edit #1:
Thanks to @NelsonGon for the comment about using map2().
I tried the following:
library(stringi)
arg1 <- c("w", "x", "y", "z")
arg2 <- "available_" %s+% arg1
tidy %>% map2(arg1, arg2, .f = parse_columns(.,
                                             vars = additional_vars,
                                             pattern = arg1,
                                             out_name = arg2
                                             )
              )
But I get this error (inside the function):
Error: Can't convert a `rowwise_df/tbl_df/tbl/data.frame` object to function
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In names(df)[names(df) == "x"] <- out_name :
number of items to replace is not a multiple of replacement length
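Edit #2:
@RonakShah, the tidy df contains PII, but the gist is to sum (row-wise) all instances of pattern across the selected vars and return the same tibble with out_name added as a new variable. So, using the following: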
tidy <- tibble(
  a = str_to_lower(LETTERS),
  b = str_to_lower(LETTERS),
  c = str_to_lower(LETTERS),
  d = rnorm(26)
)
additional_vars <- c("a", "b", "c")
tidy <- parse_columns(tidy, additional_vars, "w", "available_w")
tidy <- parse_columns(tidy, additional_vars, "x", "available_x")
tidy <- parse_columns(tidy, additional_vars, "y", "available_y")
tidy <- parse_columns(tidy, additional_vars, "z", "available_z")
print(tail(tidy))
# A tibble: 6 x 8
# Rowwise:
a b c d available_w available_x available_y available_z
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 u u u 0.692 0 0 0 0
2 v v v 1.05 0 0 0 0
3 w w w 0.544 3 0 0 0
4 x x x -1.93 0 3 0 0
5 y y y 0.943 0 0 3 0
6 z z z 0.992 0 0 0 3
You can try this:
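library(dplyr)
library(purrr)
library(stringr)  # for str_detect()
# Return only the new count column; rowSums() replaces rowwise() + sum()
parse_columns <- function(df, vars, pattern, out_name){
  df %>% transmute(!!out_name := rowSums(across(all_of(vars),
                                                .fns = ~ str_detect(., pattern))))
}
cols <- c("w", "x", "y", "z")
result_cols <- paste("available", cols, sep = "_")
# Build one column per pattern, then bind them onto the original tibble
tidy %>%
  bind_cols(map2_dfc(cols, result_cols,
                     ~parse_columns(tidy, additional_vars, .x, .y)))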
# A tibble: 26 x 8
# a b c d available_w available_x available_y available_z
# <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 a a a 0.0538 0 0 0 0
# 2 b b b 1.61 0 0 0 0
# 3 c c c -0.0172 0 0 0 0
# 4 d d d -0.543 0 0 0 0
# 5 e e e 1.98 0 0 0 0
# 6 f f f -1.37 0 0 0 0
# 7 g g g 0.425 0 0 0 0
# 8 h h h -0.976 0 0 0 0
# 9 i i i 1.19 0 0 0 0
#10 j j j 0.441 0 0 0 0
# … with 16 more rows
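As a side note, the attempt in Edit #1 appears to fail because `.f = parse_columns(...)` calls the function straight away and hands map2() its result (a rowwise tibble) rather than a function, which matches the "Can't convert ... to function" message. If you would rather keep a function that returns the whole tibble, as in the question, one possible alternative is purrr::reduce2(), which passes the growing tibble from one call to the next. The sketch below is only an illustration under assumptions: the helper name count_pattern is hypothetical (not from the question or the answer above), and it uses mutate() with rowSums() instead of rowwise().

```r
library(dplyr)
library(stringr)
library(purrr)

# Hypothetical helper (name not from the original post): adds one count
# column and keeps all existing columns, so successive calls can be chained.
count_pattern <- function(df, vars, pattern, out_name) {
  df %>%
    mutate(!!out_name := rowSums(across(all_of(vars), ~ str_detect(.x, pattern))))
}

# Same toy data as in the question
tidy <- tibble(
  a = letters,
  b = letters,
  c = letters,
  d = rnorm(26)
)
additional_vars <- c("a", "b", "c")

cols <- c("w", "x", "y", "z")
result_cols <- paste("available", cols, sep = "_")

# reduce2() starts from .init (the tibble) and calls the function once per
# (pattern, output name) pair, feeding each result into the next call.
result <- reduce2(
  cols, result_cols,
  function(df, pattern, out_name) {
    count_pattern(df, additional_vars, pattern, out_name)
  },
  .init = tidy
)

print(tail(result))
```

Compared with map2_dfc() plus bind_cols(), this keeps a single "data frame in, data frame out" function, at the cost of applying the patterns sequentially rather than independently.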