Using quotations inside mutate: an alternative to mutate_(.dots = ...)

Question

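I want to apply different functions to the same column in a tibble. These functions are stored as strings in a named character vector. I used to do this with mutate_ and the .dots argument, like this: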

library(dplyr)

myfuns <- c(f1 = "a^2", f2 = "exp(a)", f3 = "sqrt(a)")
tibble(a = 1:3) %>% 
  mutate_(.dots = myfuns)

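This approach still works, but mutate_ is deprecated. I tried to get the same result with mutate and the rlang package, but didn't get very far.

In my real example, myfuns contains about 200 functions, so typing them one by one is not an option.

Thanks in advance.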

Answer 1

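You have only one column, so both approaches below give the same result.

You only need to change your list of functions from strings to formulas.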

library(dplyr)

myfuns <- c(f1 = ~.^2, f2 = ~exp(.), f3 = ~sqrt(.))

tibble(a = 1:3) %>% mutate_at(vars(a), myfuns)

tibble(a = 1:3) %>% mutate_all(myfuns)


# # A tibble: 3 x 4
#       a    f1    f2    f3
#   <int> <dbl> <dbl> <dbl>
# 1     1     1  2.72  1   
# 2     2     4  7.39  1.41
# 3     3     9 20.1   1.73
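
In more recent dplyr releases (1.0.0 and later), mutate_at and mutate_all are themselves superseded by across(). The following is only a sketch of how the same call might look there, assuming a dplyr version that provides across(); the .names = "{.fn}" argument is my choice so that the new columns are named f1, f2, f3 as in the output above.

```r
library(dplyr)

# Same formulas as above, written with the .x pronoun
myfuns <- list(f1 = ~ .x^2, f2 = ~ exp(.x), f3 = ~ sqrt(.x))

# across() applies every function in the list to column a;
# .names = "{.fn}" names each new column after its list entry
result <- tibble(a = 1:3) %>%
  mutate(across(a, myfuns, .names = "{.fn}"))

result
```

Without .names, across() would name the columns a_f1, a_f2, a_f3 instead.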

Answer 2

One way, using parse_expr from rlang:

library(tidyverse)
library(rlang)

tibble(a = 1:3) %>% 
   mutate(ans =  map(myfuns, ~eval(parse_expr(.)))) %>%
   #OR mutate(ans =  map(myfuns, ~eval(parse(text  = .)))) %>%
   unnest() %>%
   group_by(a) %>%
   mutate(temp = row_number()) %>%
   spread(a, ans) %>%
   select(-temp) %>%
   rename_all(~names(myfuns))

# A tibble: 3 x 3
#    f1    f2    f3
#  <dbl> <dbl> <dbl>
#1     1  2.72  1   
#2     4  7.39  1.41
#3     9  20.1  1.73

Answer 3

For simple equations that take a single input, it is sufficient to supply the function itself, e.g.

iris %>% mutate_at(vars(-Species), sqrt)

Or, when using an equation rather than a simple function, via a formula:

iris %>% mutate_at(vars(-Species), ~ . ^ 2)

When using equations that access more than one variable, you need to use rlang quosures instead:

area = quo(Sepal.Length * Sepal.Width)
iris %>% mutate(Sepal.Area = !! area)

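Here, quo creates a “quosure”, i.e. a quoted representation of your equation. This is the same idea as your use of strings, except that, unlike a string, a quosure is properly scoped, is directly usable by dplyr, and is conceptually cleaner: it is like any other R expression, except that it has not been evaluated yet. The difference is as follows: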

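1 + 2 is an expression with value 3.
quo(1 + 2) is an unevaluated expression with value 1 + 2 that evaluates to 3, but it needs to be evaluated explicitly. So how do we evaluate an unevaluated expression? Well…:

Then !! (pronounced “bang bang”) unquotes the previously quoted expression, i.e. evaluates it, inside the context of mutate. This is important, because Sepal.Length and Sepal.Width are only known inside the mutate call, not outside of it.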
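
To make "needs to be evaluated explicitly" concrete, here is a small sketch using rlang::eval_tidy. That function is not mentioned in the answer above; I am using it only to illustrate roughly what mutate does for you when it meets an unquoted quosure, so treat it as an illustration rather than a description of dplyr's internals.

```r
library(rlang)

q <- quo(1 + 2)
print(q)                 # shows the quosure itself, not a number
value <- eval_tidy(q)    # explicit evaluation of the quoted expression

area <- quo(Sepal.Length * Sepal.Width)

# Sepal.Length and Sepal.Width are not variables in the calling environment,
# so the quosure can only be evaluated with a data frame supplying the columns
area_values <- eval_tidy(area, data = iris)
head(area_values, 3)
```

Calling eval_tidy(area) without data = iris would fail, because the column names are unknown outside the data frame. That is the same reason the unquoting has to happen inside mutate.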

In all the cases above, the expressions can be inside a list, too. The only difference is that for lists you need to use !!! instead of !!:

funs = list(
    Sepal.Area = quo(Sepal.Length * Sepal.Width),
    Sepal.Ratio = quo(Sepal.Length / Sepal.Width)
)

iris %>% mutate(!!! funs)

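The !!! operation is known as “unquote-splice”. The idea is that it “splices” the list elements of its argument into the parent call. That is, it appears to modify the call as if it contained the list elements verbatim as arguments (this only works in functions that support it, such as mutate).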
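
As a quick check of the "as if it contained the list elements verbatim" description, the sketch below compares the spliced call with the same call typed out by hand. The names spliced and written_out are mine, used only for this comparison.

```r
library(dplyr)
library(rlang)

funs <- list(
  Sepal.Area  = quo(Sepal.Length * Sepal.Width),
  Sepal.Ratio = quo(Sepal.Length / Sepal.Width)
)

# Splicing the named list into mutate ...
spliced <- iris %>% mutate(!!!funs)

# ... should behave like typing each element as a named argument
written_out <- iris %>% mutate(
  Sepal.Area  = Sepal.Length * Sepal.Width,
  Sepal.Ratio = Sepal.Length / Sepal.Width
)

all.equal(spliced, written_out)
```

The list names become the new column names, which is why a named list (or a named vector of parsed expressions) is all that mutate needs.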

Answer 4

You can also try a purrr approach:

library(tidyverse)

# `df` is the example data defined under "Some data" below
# define the functions
f1 <- function(a) a^2
f2 <- function(a, b) a + b
f3 <- function(b) sqrt(b)

# put all functions in one list
tibble(funs=list(f1, f2, f3)) %>%
  # give each function a name 
  mutate(fun_id=paste0("f", row_number())) %>% 
  # add to each row/function the matching column profile
  # first extract the column names you specified in each function 
  #mutate(columns=funs %>% 
  #         toString() %>% 
  #         str_extract_all(., "function \(.*?\)", simplify = T) %>% 
  #         str_extract_all(., "(?<=\().+?(?=\))", simplify = T) %>%
  #         gsub(" ", "", .) %>% 
  #         str_split(., ",")) %>%
  # with the help of Konrad we can use fn_fmls_names
  mutate(columns=map(funs, ~ rlang::fn_fmls_names(.)))  %>% 
  # select the columns and add to our tibble/data.frame  
  mutate(params=map(columns, ~select(df, .))) %>% 
  # invoke the functions
  mutate(results = invoke_map(.f = funs, .x = params)) %>% 
  # transform  to desired output
  unnest(results) %>% 
  group_by(fun_id) %>% 
  mutate(n=row_number()) %>% 
  spread(fun_id, results) %>% 
  left_join(mutate(df, n=row_number()), .) %>% 
  select(-n)
# Joining, by = "n"
# # A tibble: 5 x 5
#       a     b    f1    f2    f3
#   <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     2     1     4     3  1   
# 2     4     1    16     5  1   
# 3     5     2    25     7  1.41
# 4     7     2    49     9  1.41
# 5     8     2    64    10  1.41

Some data:

df <- data_frame(
  a = c(2, 4, 5, 7, 8),
  b = c(1, 1, 2, 2, 2))

Answer 5

Convert your strings to expressions

myexprs <- purrr::map( myfuns, rlang::parse_expr )

then pass those expressions to regular mutate using quasiquotation:

tibble(a = 1:3) %>% mutate( !!!myexprs )
# # A tibble: 3 x 4
#       a    f1    f2    f3
#   <int> <dbl> <dbl> <dbl>
# 1     1     1  2.72  1   
# 2     2     4  7.39  1.41
# 3     3     9 20.1   1.73

Note that this also works with strings / expressions involving multiple columns.
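
A minimal sketch of the multi-column case follows. The vector myfuns2, its entries, and the two-column tibble are made up for illustration, and I use base lapply in place of purrr::map only to keep the dependencies small.

```r
library(dplyr)
library(rlang)

# Hypothetical strings that refer to two columns, a and b
myfuns2 <- c(total = "a + b", ratio = "a / b")

# lapply keeps the names, so each expression carries its column name
myexprs2 <- lapply(myfuns2, parse_expr)

out <- tibble(a = c(2, 4), b = c(1, 2)) %>%
  mutate(!!!myexprs2)

out
```

Since the strings are parsed and then evaluated as code, this pattern is only appropriate when the strings come from a source you trust.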

Answer 6

A base R alternative:

myfuns <- c(f1 = "a^2", f2 = "exp(a)", f3 = "sqrt(a)")
df <- data.frame(a = 1:3)
df[names(myfuns)] <- lapply(myfuns , function(x) eval(parse(text= x), envir = df))
df
#>   a f1        f2       f3
#> 1 1  1  2.718282 1.000000
#> 2 2  4  7.389056 1.414214
#> 3 3  9 20.085537 1.732051

Created on 2019-07-08 by the reprex package (v0.3.0)
