In a named argument to dplyr::funs, can I reference the names of other arguments?
Consider the following:
library(tidyverse)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)
df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - mean(.)) / sd(.)))
Is there a way to avoid calling mean and sd twice by referencing the avg and dev columns? What I have in mind is something like:
df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - avg) / dev))
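Obviously this doesn't work, because there are no columns named avg and dev, only x_avg, x_dev, y_avg, y_dev, and so on.
Is there a good way, inside funs, to use rlang tools to build those column references programmatically, so that I can refer to the columns created by the earlier named arguments to funs (when . is x, I would reference x_avg and x_dev to calculate x_scaled, and so on)?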
I think this would be easier if you convert the data to long format:
library(tidyverse)
set.seed(111)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)
df %>%
  gather(key, value) %>%
  group_by(key) %>%
  mutate(avg = mean(value),
         sd = sd(value),
         scaled = (value - avg) / sd)
#> # A tibble: 300 x 5
#> # Groups:   key [3]
#>    key    value     avg    sd scaled
#>    <chr>  <dbl>   <dbl> <dbl>  <dbl>
#>  1 x      0.235 -0.0128  1.07  0.232
#>  2 x     -0.331 -0.0128  1.07 -0.297
#>  3 x     -0.312 -0.0128  1.07 -0.279
#>  4 x     -2.30  -0.0128  1.07 -2.14
#>  5 x     -0.171 -0.0128  1.07 -0.148
#>  6 x      0.140 -0.0128  1.07  0.143
#>  7 x     -1.50  -0.0128  1.07 -1.39
#>  8 x     -1.01  -0.0128  1.07 -0.931
#>  9 x     -0.948 -0.0128  1.07 -0.874
#> 10 x     -0.494 -0.0128  1.07 -0.449
#> # ... with 290 more rows
Created on 2018-11-04 by the reprex package (v0.2.1.9000)
This looks a bit convoluted, but it does work:
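scaled <- function(col_name, x, y) {
  # Capture the unevaluated argument as text (expected to be the column
  # name that stands in for `.`, e.g. "x")
  col_name <- deparse(substitute(col_name))
  # Look up the already-created <col>_avg and <col>_dev columns in the caller
  avg <- eval.parent(as.symbol(paste0(col_name, x)))
  dev <- eval.parent(as.symbol(paste0(col_name, y)))
  (eval.parent(as.symbol(col_name)) - avg) / dev
}
df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = scaled(., "_avg", "_dev")))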
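This might work for you. It avoids typing the expressions twice, although unquoting splices them in, so mean and sd are still evaluated twice:
avg <- quo(mean(.))
dev <- quo(sd(.))
res <- df %>%
  mutate_all(funs(avg = !!avg, dev = !!dev, scaled = (. - !!avg) / !!dev))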
Confirming that it works:
res0 <- df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - mean(.)) / sd(.)))
identical(res, res0)
# [1] TRUE
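If the goal is to compute each statistic exactly once per column, one possible sketch in base R is shown below. It is not taken from any of the answers above: summarise_col is a hypothetical helper name, and a plain data.frame is used instead of a tibble so the snippet has no package dependencies.

```r
# Hypothetical helper (an assumption, not from the original thread):
# computes mean and sd once and reuses them for the scaled values.
summarise_col <- function(v) {
  avg <- mean(v)
  dev <- sd(v)
  # avg and dev are length 1 and get recycled to the length of v
  data.frame(avg = avg, dev = dev, scaled = (v - avg) / dev)
}

set.seed(111)
df <- data.frame(x = rnorm(100), y = rnorm(100, 10, 2))
df$z <- df$x * df$y

# Build <col>_avg, <col>_dev and <col>_scaled for every column
stats <- lapply(names(df), function(nm) {
  out <- summarise_col(df[[nm]])
  names(out) <- paste(nm, names(out), sep = "_")
  out
})

# Original columns followed by the derived ones
res <- cbind(df, do.call(cbind, stats))
```

The result has the same kind of column names as the mutate_all version (x_avg, x_dev, x_scaled, and so on), but the column order differs, so identical() against res0 above would not be expected to return TRUE.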