Using dplyr verbs in a function with column labels as character vectors

Question

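I want to create a function that takes a data frame and a character vector of column names as input, and uses the tidyverse quoting functions internally in a safe way.

I believe I have a working example of what I want to do. I would like to know whether there is a more elegant solution, or whether I am thinking about this problem the wrong way (maybe I shouldn't be doing this at all?). As far as I can tell, to avoid variable scoping problems I need to wrap each column name in .data[[]] and turn it into an expression before unquoting it inside the tidyverse NSE verbs.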

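Regarding earlier questions: this answer is along the right lines, but I want to abstract the code into a function. A GitHub issue asks about this, but as far as I can tell using rlang::syms will not work, because combining the column label with .data makes it an expression rather than a symbol. Other answers solve the problem, but as far as I can tell they either do not account for the subtle bug where variables can leak in from the environment when they do not exist as column labels in the data frame, or they do not work when the input is a vector of labels.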
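To make the leak concrete, here is a minimal sketch on a toy data frame. The names `df`, `b`, `leaky` and `strict` are made up for illustration and are not part of the original post. It contrasts a bare symbol, which falls back to the calling environment when the column is missing, with the `.data` pronoun, which only looks inside the data frame.

```r
library(dplyr)

df <- tibble::tibble(a = 1:3)
b <- c(10, 20, 30)  # lives in the calling environment, not in df

# A bare symbol silently falls back to the environment when the column is missing
leaky <- mutate(df, out = a * b)

# The .data pronoun only looks inside the data frame, so a missing column is an error
strict <- tryCatch(
  mutate(df, out = a * .data[["b"]]),
  error = function(e) "error"
)
```

Here `leaky` is computed without complaint from the global `b`, while `strict` ends up holding the string "error" because the lookup through `.data` fails.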

# Setup
suppressWarnings(suppressMessages(library("dplyr")))
suppressWarnings(suppressMessages(library("rlang")))

# define iris with and without Sepal.Width column
iris <- tibble::as_tibble(iris)
df_with_missing <- iris %>% select(-Sepal.Width)
# This should not be findable by my function
Sepal.Width <- iris$Sepal.Width * -1

################
# Now let's try a function for which we programmatically define the column labels
programmatic_mutate_y <- function(df, col_names, safe = FALSE) {
  # Add .data[[]] to the col_names to make evaluation safer
  col_exprs <- rlang::parse_exprs(
    purrr::map_chr(
      col_names,
      ~ glue::glue(stringr::str_c('.data[["{.x}"]]'))
    )
  )

  output <- dplyr::mutate(df, product = purrr::pmap_dbl(list(!!!col_exprs), ~ prod(...)))
  output
}
################
# The desired output
testthat::expect_error(programmatic_mutate_y(df_with_missing, c("Sepal.Width", "Sepal.Length")))
programmatic_mutate_y(iris, c("Sepal.Width", "Sepal.Length"))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species product
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa     17.8
#>  2          4.9         3            1.4         0.2 setosa     14.7
#>  3          4.7         3.2          1.3         0.2 setosa     15.0
#>  4          4.6         3.1          1.5         0.2 setosa     14.3
#>  5          5           3.6          1.4         0.2 setosa     18  
#>  6          5.4         3.9          1.7         0.4 setosa     21.1
#>  7          4.6         3.4          1.4         0.3 setosa     15.6
#>  8          5           3.4          1.5         0.2 setosa     17  
#>  9          4.4         2.9          1.4         0.2 setosa     12.8
#> 10          4.9         3.1          1.5         0.1 setosa     15.2
#> # … with 140 more rows

Created on 2019-08-09 by the reprex package (v0.3.0)
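One possible simplification of the function above, offered as a sketch rather than a definitive answer: the `.data[["name"]]` calls can be built directly with `rlang::expr()` and `!!`, which avoids pasting strings together and parsing them. The helper name `mutate_product` is hypothetical.

```r
library(dplyr)
library(rlang)

# Hypothetical helper name, not from the original post
mutate_product <- function(df, col_names) {
  # Build one `.data[["name"]]` call per label without parsing strings
  col_exprs <- lapply(col_names, function(nm) expr(.data[[!!nm]]))
  # Splice the calls into list() and take the row-wise product
  mutate(df, product = purrr::pmap_dbl(list(!!!col_exprs), prod))
}

iris_tbl <- tibble::as_tibble(iris)
res <- mutate_product(iris_tbl, c("Sepal.Width", "Sepal.Length"))
```

Because each label is injected as a string into `.data[[ ]]`, labels containing quotes or other special characters do not need escaping, which string pasting would require.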

Answer 1

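I think you are overcomplicating things. With the _at variants, you can use strings as arguments in almost every dplyr function. purrr::pmap_dbl() is used to map the calculation over rows.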

programmatic_mutate_y_v1 <- function(df, col_names, safe = FALSE) {
  df["product"] <- purrr::pmap_dbl(dplyr::select_at(df, col_names), prod)
  return(df)
}

programmatic_mutate_y_v1(iris, c("Sepal.Width", "Sepal.Length"))

# A tibble: 150 x 6
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species product
          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
 1          5.1         3.5          1.4         0.2 setosa     17.8
 2          4.9         3            1.4         0.2 setosa     14.7
 3          4.7         3.2          1.3         0.2 setosa     15.0
 4          4.6         3.1          1.5         0.2 setosa     14.3
 5          5           3.6          1.4         0.2 setosa     18  
 6          5.4         3.9          1.7         0.4 setosa     21.1
 7          4.6         3.4          1.4         0.3 setosa     15.6
 8          5           3.4          1.5         0.2 setosa     17  
 9          4.4         2.9          1.4         0.2 setosa     12.8
10          4.9         3.1          1.5         0.1 setosa     15.2
# ... with 140 more rows
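In dplyr 1.0.0 and later the `_at` variants are superseded. A sketch of the same idea with `select()` and `all_of()` might look like the following; the function name `programmatic_mutate_y_v1b` is hypothetical. `all_of()` is strict about missing labels, so a column that is absent from the data frame raises an error instead of being looked up elsewhere.

```r
library(dplyr)

# Hypothetical name: a variant of the answer's function using all_of()
programmatic_mutate_y_v1b <- function(df, col_names) {
  # all_of() errors if any requested label is missing from df
  cols <- select(df, all_of(col_names))
  # A data frame is a list of columns, so pmap_dbl() walks it row by row
  df[["product"]] <- purrr::pmap_dbl(cols, prod)
  df
}

d <- tibble::tibble(x = c(1, 2), y = c(3, 4))
out <- programmatic_mutate_y_v1b(d, c("x", "y"))
```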

Answer 2

We can turn col_names into an expression with parse_expr and paste, and then unquote it when evaluating inside mutate:

library(dplyr)
library(rlang)

programmatic_mutate_y <- function(df, col_names){
  mutate(df, product = !!parse_expr(paste(col_names, collapse = "*")))
}

Output:

> parse_expr(paste(c("Sepal.Width", "Sepal.Length"), collapse = "*"))
Sepal.Width * Sepal.Length

> programmatic_mutate_y(df_with_missing, c("Sepal.Width", "Sepal.Length"))
Error: object 'Sepal.Width' not found

> programmatic_mutate_y(iris, c("Sepal.Width", "Sepal.Length"))
# A tibble: 150 x 6
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species product
          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
 1          5.1         3.5          1.4         0.2 setosa     17.8
 2          4.9         3            1.4         0.2 setosa     14.7
 3          4.7         3.2          1.3         0.2 setosa     15.0
 4          4.6         3.1          1.5         0.2 setosa     14.3
 5          5           3.6          1.4         0.2 setosa     18  
 6          5.4         3.9          1.7         0.4 setosa     21.1
 7          4.6         3.4          1.4         0.3 setosa     15.6
 8          5           3.4          1.5         0.2 setosa     17  
 9          4.4         2.9          1.4         0.2 setosa     12.8
10          4.9         3.1          1.5         0.1 setosa     15.2
# ... with 140 more rows
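One caveat, as far as I can tell: the error shown above depends on no object called Sepal.Width existing in the calling environment. With the setup from the question, where a global Sepal.Width vector is defined, the bare symbols produced by parse_expr would resolve to that vector and the call would succeed silently. A sketch of a stricter variant, assuming the same multiplication is wanted, builds the product from `.data[["name"]]` calls instead; the function name `programmatic_mutate_y_strict` is hypothetical.

```r
library(dplyr)
library(rlang)

# Hypothetical name: multiplies the named columns, looking them up only in df
programmatic_mutate_y_strict <- function(df, col_names) {
  # One `.data[["name"]]` call per label
  col_exprs <- lapply(col_names, function(nm) expr(.data[[!!nm]]))
  # Fold the calls into a single expression: .data[["a"]] * .data[["b"]] * ...
  product_expr <- Reduce(function(lhs, rhs) call2("*", lhs, rhs), col_exprs)
  mutate(df, product = !!product_expr)
}

d <- tibble::tibble(x = c(1, 2), y = c(3, 4))
ok <- programmatic_mutate_y_strict(d, c("x", "y"))

# A global `y` must not be picked up when the column is missing
y <- c(100, 100)
strict_result <- tryCatch(
  programmatic_mutate_y_strict(d["x"], c("x", "y")),
  error = function(e) "error"
)
```

Unlike the row-wise pmap_dbl() approach, this keeps the multiplication vectorized over whole columns.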
