How to avoid repeating code in dplyr::mutate() call with multiple arguments?
Question
I am moving from base R to dplyr.
I want to shorten the following code so that it follows the DRY (Don't Repeat Yourself) principle:
library(dplyr)
mtcars %>% mutate(w = rowMeans(select(., mpg:disp), na.rm = TRUE),
                  x = rowMeans(select(., hp:wt), na.rm = TRUE),
                  y = rowMeans(select(., qsec:am), na.rm = TRUE),
                  z = rowMeans(select(., gear:carb), na.rm = TRUE))
or
mtcars %>% rowwise() %>% mutate(w = mean(mpg:disp, na.rm = TRUE),
                                x = mean(hp:wt, na.rm = TRUE),
                                y = mean(qsec:am, na.rm = TRUE),
                                z = mean(gear:carb, na.rm = TRUE))
# Note: this one produced an error with my own data
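Why the rowwise() attempt misbehaves: inside mutate(), `mpg:disp` is not a tidyselect column range but base R's sequence operator applied to the current row's values. A minimal sketch on mtcars, assuming dplyr is installed; `bad` and `bad_mean` are hypothetical names used only for illustration. The link to the error on the asker's own data is an assumption, since that data is not shown.

```r
library(dplyr)

# Under rowwise(), `mpg` and `disp` are the scalar values of the current row,
# so `mpg:disp` builds the sequence mpg, mpg + 1, ..., up to disp.
# (`bad` and `bad_mean` are hypothetical names for this illustration.)
bad <- mtcars %>%
  rowwise() %>%
  mutate(bad_mean = mean(mpg:disp, na.rm = TRUE)) %>%
  ungroup()

# Row 1 has mpg = 21 and disp = 160, so bad_mean is mean(21:160) = 90.5,
# not the mean of mpg, cyl and disp (about 62.33).
bad$bad_mean[1]
mean(c(mtcars$mpg[1], mtcars$cyl[1], mtcars$disp[1]))

# `:` stops with "NA/NaN argument" when an endpoint is missing, which is
# one possible (assumed) cause of the error on other data.
try(NA:5)
```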
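Goal
The goal is to compute the means of several scales in the data frame with a single call. As you can see, rowMeans, select and the na.rm argument are repeated many times (and my real data has several more variables than this example).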
Attempts
I have been trying to come up with an across() solution:
mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{col}"))
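but it does not give the correct result, because I don't know how to specify a different set of columns for each of w, x, y and z. Using c_across as in the documentation example, we are back to repeating code: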
mtcars %>% rowwise() %>% mutate(w = mean(c_across(mpg:disp), na.rm = TRUE),
                                x = mean(c_across(hp:wt), na.rm = TRUE),
                                y = mean(c_across(qsec:am), na.rm = TRUE),
                                z = mean(c_across(gear:carb), na.rm = TRUE))
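The first across() attempt gives the wrong answer because across() works column by column: it applies mean() to each selected column separately, so each new column holds that column's overall mean recycled down every row, not a row mean over a group of columns. A quick check on mtcars, assuming dplyr 1.0 or later; `res` is a hypothetical name.

```r
library(dplyr)

# across() calls mean() once per selected column, over all 32 rows,
# so every row of mean_mpg holds the same value: the column mean of mpg.
# (`res` is a hypothetical name for this check.)
res <- mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{col}"))

unique(res$mean_mpg)
mean(mtcars$mpg)
```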
I am tempted to fall back on lapply or a custom function, but I feel that would defeat the purpose of adopting dplyr and the new across() function.
Edit: to clarify, I want to avoid writing rowMeans, select and na.rm = TRUE over and over.
We don't need rowwise; instead, use select together with the vectorized rowMeans. To make this easier, we can create a function:
library(dplyr)

# Row means over a tidyselect range of columns, ignoring NAs
f1 <- function(dat, nm1) {
  dat %>%
    select({{nm1}}) %>%
    rowMeans(na.rm = TRUE)
}
mtcars %>% mutate(w = f1(dat = ., nm1 = mpg:disp),
                  x = f1(dat = ., nm1 = hp:wt),
                  y = f1(dat = ., nm1 = qsec:am),
                  z = f1(dat = ., nm1 = gear:carb))
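The remaining repetition in the f1() calls can be removed by listing the scales once, as a named list of unevaluated column ranges, and mapping over it. A sketch, assuming dplyr, purrr and rlang are installed; `scales`, `row_mean_of()`, `new_cols` and `result` are hypothetical names introduced here, not part of any package.

```r
library(dplyr)
library(purrr)

# Hypothetical: each scale written once, new column name = unevaluated range
scales <- rlang::exprs(w = mpg:disp, x = hp:wt, y = qsec:am, z = gear:carb)

# Hypothetical helper: rowMeans(), select() and na.rm = TRUE appear only here
row_mean_of <- function(dat, cols) {
  dat %>%
    select(!!cols) %>%      # inject the stored range, e.g. mpg:disp
    rowMeans(na.rm = TRUE)
}

# One numeric vector per scale; map() keeps the names w, x, y, z
new_cols <- map(scales, ~ row_mean_of(mtcars, .x))

# Splice the named list into mutate() as w = ..., x = ..., and so on
result <- mtcars %>% mutate(!!!new_cols)
```

Adding another scale then means adding one entry to `scales`.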
Using a custom function (organized differently to cut down on repeated code):
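library(dplyr)

# Adds one column, new_col, holding the mean of cols_to_mut;
# expects a rowwise() data frame so that c_across() sees one row at a time
mm <- function(data, new_col, cols_to_mut) {
  data %>%
    mutate(
      {{ new_col }} := mean(c_across({{ cols_to_mut }}), na.rm = TRUE)
    )
}

mtcars %>%
  rowwise() %>%
  mm(w, mpg:disp) %>%
  mm(x, hp:wt) %>%
  mm(y, qsec:am) %>%
  mm(z, gear:carb) %>%
  ungroup()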
Consider using purrr::reduce2 to avoid the repetition:
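library(dplyr)
library(purrr)
mtcars %>%
  reduce2(
    c("w", "x", "y", "z"),                            # ..2: new column names
    c("mpg:disp", "hp:wt", "qsec:am", "gear:carb"),   # ..3: column ranges as strings
    # ..1 is the accumulated data frame; each step adds one row-mean column
    ~ ..1 %>%
      rowwise() %>%
      mutate(!!..2 := mean(c_across(!!rlang::parse_expr(..3)), na.rm = TRUE)),
    .init = .)   # `.` is mtcars here, so it becomes the starting accumulator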
# A tibble: 32 x 15
# Rowwise:
mpg cyl disp hp drat wt qsec vs am gear carb w x y z
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 62.3 38.8 5.82 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 62.3 38.9 6.01 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 44.9 33.1 6.87 2.5
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 95.1 38.8 6.81 2
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 129. 60.5 5.67 2.5
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 83.0 37.1 7.07 2
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 127. 83.9 5.28 3.5
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 58.4 23.0 7 3
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 55.9 34.0 7.97 3
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 64.3 43.5 6.43 4
# ... with 22 more rows
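The reduce2() version stores the column ranges as strings and re-parses them with rlang::parse_expr(). A variant that keeps them as expressions and uses the vectorized rowMeans() instead of rowwise() plus c_across() might look like the sketch below. It assumes dplyr, purrr and rlang are installed; `scales` and `out` are hypothetical names.

```r
library(dplyr)
library(purrr)

# Hypothetical: named list of unevaluated column ranges, one per scale
scales <- rlang::exprs(w = mpg:disp, x = hp:wt, y = qsec:am, z = gear:carb)

# reduce2() walks the names and the ranges in parallel, threading the
# data frame through as the accumulator `acc`.
out <- reduce2(
  names(scales), scales,
  function(acc, nm, cols) {
    # Row means over the selected columns, computed for all rows at once
    means <- rowMeans(select(acc, !!cols), na.rm = TRUE)
    # `!!nm :=` turns the string "w" (etc.) into the new column's name
    mutate(acc, !!nm := means)
  },
  .init = mtcars
)
```

Because no rowwise() grouping is added, no final ungroup() is needed.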