Error when using dplyr inside of a function

Question

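I am trying to put together a function that creates a subset from my original data frame and then uses dplyr's select and mutate to give me the number of large/small entries, based on the sum of the width and length of sepals/petals.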

filter <- function (spp, LENGTH, WIDTH) {
  d <- subset (iris, subset=iris$Species == spp) # This part seems to work just fine
  large <- d %>%                       
    select (LENGTH, WIDTH) %>%   # This is where the problem arises.
    mutate (sum = LENGTH + WIDTH) 
  big_samples <- which(large$sum > 4)
 return (length(big_samples)) 
}

Basically, I want the function to return the number of large flowers. However, when I run the function I get the following error:

filter("virginica", "Sepal.Length", "Sepal.Width")

 Error: All select() inputs must resolve to integer column positions.
The following do not:
*  LENGTH
*  WIDTH

What am I doing wrong?

Answer 1

You are running into NSE/SE problems; see the vignette for more info.

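Briefly, dplyr uses non-standard evaluation (NSE) of names, and passing column names into a function breaks it unless you use the standard evaluation (SE) version.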
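To make the NSE point concrete, here is a minimal sketch (not part of the original answer; `col_a` and `col_b` are illustrative names). Inside mutate(), bare names are looked up as columns of the data frame, while a variable that holds a string is evaluated as an ordinary character value, so the addition fails.

```r
library(dplyr)

# Illustrative variables holding column names as strings (assumed names)
col_a <- "Sepal.Length"
col_b <- "Sepal.Width"

# NSE: the bare names are resolved as columns of iris, so this works
ok <- iris %>% mutate(sum = Sepal.Length + Sepal.Width)

# The strings are NOT resolved as columns: `+` is applied to two
# character values, which is an error in R. tryCatch() records that.
res <- tryCatch(
  iris %>% mutate(sum = col_a + col_b),
  error = function(e) "error"
)

is.numeric(ok$sum)      # TRUE: the column arithmetic succeeded
identical(res, "error") # TRUE: the string version failed
```

This is the same situation as LENGTH and WIDTH in the question's function: they arrive as strings, so dplyr has no column expression to work with.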

The SE versions of the dplyr functions end in _. You can see that select_ works nicely with your original arguments.

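However, things get more complicated when functions are involved. We can use lazyeval::interp to convert most function arguments into column names; see the change from mutate to mutate_ in your function below, and more generally the help page: ?lazyeval::interp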

Try:

filter <- function (spp, LENGTH, WIDTH) {
    d <- subset (iris, subset=iris$Species == spp) 
    large <- d %>%                       
        select_(LENGTH, WIDTH) %>%  
        mutate_(sum = lazyeval::interp(~X + Y, X = as.name(LENGTH), Y = as.name(WIDTH))) 
    big_samples <- which(large$sum > 4)
    return (length(big_samples)) 
}

Answer 2

Update: As of dplyr 0.7.0 you can use tidy eval to accomplish this.

See http://dplyr.tidyverse.org/articles/programming.html for more details.

filter_big <- function(spp, LENGTH, WIDTH) {
  LENGTH <- enquo(LENGTH)                    # Create quosure
  WIDTH  <- enquo(WIDTH)                     # Create quosure

  iris %>% 
    filter(Species == spp) %>% 
    select(!!LENGTH, !!WIDTH) %>%            # Use !! to unquote the quosure
    mutate(sum = (!!LENGTH) + (!!WIDTH)) %>% # Use !! to unquote the quosure
    filter(sum > 4) %>% 
    nrow()
}

filter_big("virginica", Sepal.Length, Sepal.Width)
#> [1] 50
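For readers new to quosures, a small sketch of what enquo() and !! do may help. This is not from the original answer; `show_col` and `col_mean` are hypothetical helpers written only for illustration. enquo() captures the expression the caller typed instead of evaluating it, and !! splices that captured expression back into a dplyr verb.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: report which column expression the caller supplied
show_col <- function(col) {
  col <- enquo(col)  # capture the expression instead of evaluating it
  as_label(col)      # convert the captured expression to a string
}

show_col(Sepal.Length)
#> [1] "Sepal.Length"

# Hypothetical helper: splice the captured expression back in with !!
col_mean <- function(df, col) {
  col <- enquo(col)
  df %>%
    summarise(mean = mean(!!col)) %>%
    pull(mean)
}

# Gives the same value as mean(iris$Sepal.Length)
col_mean(iris, Sepal.Length)
```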

Answer 3

If quosures and quasiquotation are too much for you, use either .data[[ ]] or rlang's {{ }} (curly curly) instead. See Hadley Wickham's 5min video on tidy evaluation and (maybe) the Tidy evaluation section of Hadley's Advanced R book for more information.

library(rlang)
library(dplyr)

filter_data <- function(df, spp, LENGTH, WIDTH) {
  res <- df %>% 
    filter(Species == spp) %>% 
    select(.data[[LENGTH]], .data[[WIDTH]]) %>%        
    mutate(sum = .data[[LENGTH]] + .data[[WIDTH]]) %>% 
    filter(sum > 4) %>% 
    nrow()
  return(res)
}

filter_data(iris, "virginica", "Sepal.Length", "Sepal.Width")
#> [1] 50


filter_rlang <- function(df, spp, LENGTH, WIDTH) {
  res <- df %>% 
    filter(Species == spp) %>% 
    select({{LENGTH}}, {{WIDTH}}) %>%        
    mutate(sum = {{LENGTH}} + {{WIDTH}}) %>% 
    filter(sum > 4) %>% 
    nrow()
  return(res)
}

filter_rlang(iris, "virginica", Sepal.Length, Sepal.Width)
#> [1] 50

Created on 2019-11-10 by the reprex package (v0.3.0)
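A note for newer package versions: using .data[[ ]] inside select() was deprecated in tidyselect 1.2.0 in favour of all_of(), while .data[[ ]] remains fine inside mutate() and filter(). The variant below is a sketch under that assumption (dplyr 1.0.0 or later, where all_of() is available); the name `filter_all_of` is hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical variant of filter_data() for string column names
filter_all_of <- function(df, spp, LENGTH, WIDTH) {
  df %>%
    filter(Species == spp) %>%
    select(all_of(c(LENGTH, WIDTH))) %>%                # select by character vector
    mutate(sum = .data[[LENGTH]] + .data[[WIDTH]]) %>%  # .data pronoun in data-masking
    filter(sum > 4) %>%
    nrow()
}

filter_all_of(iris, "virginica", "Sepal.Length", "Sepal.Width")
#> [1] 50
```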
