Error when using NSE (in dplyr): object 'value' not found

Question

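I'm trying to get familiar with using NSE in my code where it is warranted. Let's say I have several pairs of columns and want to generate a new string variable for each pair indicating whether the values in that pair are the same.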

library(tidyverse)
library(magrittr)

df <- tibble(one.x = c(1,2,3,4),
             one.y = c(2,2,4,3),
             two.x = c(5,6,7,8),
             two.y = c(6,7,7,9),
             # not used but also in df
             extra = c(5,5,5,5))

I'm trying to write code that does the same thing as the code below:

df.mod <- df %>%
  # is one.x the same as one.y?
  mutate(one.x_suffix = case_when( 
    one.x == one.y ~ "same",
    TRUE ~ "different")) %>%
  # is two.x the same as two.y?
  mutate(two.x_suffix = case_when(
    two.x == two.y ~ "same",
    TRUE ~ "different"))

df.mod
#> # A tibble: 4 x 6
#>   one.x one.y two.x two.y one.x_suffix two.x_suffix
#>   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
#> 1    1.    2.    5.    6. different    different   
#> 2    2.    2.    6.    7. same         different   
#> 3    3.    4.    7.    7. different    same        
#> 4    4.    3.    8.    9. different    different

In my real data I have an arbitrary number of such pairs (e.g. three.x and three.y, ...), so I want to write a more general procedure using mutate_at.

My strategy is to pass the ".x" variables as .vars and then, on one side of the equality test inside case_when, gsub "y" for "x", like so:

df.mod <- df %>%
  mutate_at(vars(one.x, two.x),
            funs(suffix = case_when(
              . == !!sym(gsub("x", "y", deparse(substitute(.)))) ~ "same",
              TRUE ~ "different")))
#> Error in mutate_impl(.data, dots): Evaluation error: object 'value' not found.

This is where I get the error. The gsub part appears to work fine:

df.debug <- df %>%
  mutate_at(vars(one.x, two.x),
            funs(suffix = gsub("x", "y", deparse(substitute(.)))))
df.debug
#> # A tibble: 4 x 6
#>   one.x one.y two.x two.y one.x_suffix two.x_suffix
#>   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
#> 1    1.    2.    5.    6. one.y        two.y       
#> 2    2.    2.    6.    7. one.y        two.y       
#> 3    3.    4.    7.    7. one.y        two.y       
#> 4    4.    3.    8.    9. one.y        two.y

It's the !!sym() operation that causes the error here. What have I done wrong?

Created on 2018-11-07 by the reprex package (v0.2.1)

Answer 1

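Here is an option with map. We split the dataset into pairs of 'x' and 'y' columns based on the substring of the column names, loop over the resulting list of datasets with map, use transmute to create the new 'suffix' column by comparing the columns of each dataset row by row, bind the list of datasets into a single dataset, and then bind that to the original dataset (bind_cols).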

library(tidyverse)
df %>% 
    select(matches("\\.x|\\.y")) %>%
    split.default(str_remove(names(.), "\\..*")) %>%
    map( ~ .x %>%
                 transmute(!! paste0(names(.)[1], "_suffix") := 
                      reduce(., ~ c("different", "same")[(.x == .y) + 1]))) %>%
    bind_cols %>%
    bind_cols(df, .)
# A tibble: 4 x 7
#  one.x one.y two.x two.y extra one.x_suffix two.x_suffix
#   <dbl> <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
#1     1     2     5     6     5 different    different   
#2     2     2     6     7     5 same         different   
#3     3     4     7     7     5 different    same        
#4     4     3     8     9     5 different    different
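
To make the split step concrete, here is a minimal sketch on a toy data frame (the names `toy`, `prefix` and `pairs` are made up for illustration and are not part of the answer). It assumes only base R and shows that split.default groups columns, not rows, by the part of the name before the first dot.

```r
# Toy data with two pairs of columns (hypothetical, for illustration only)
toy <- data.frame(one.x = c(1, 2), one.y = c(2, 2),
                  two.x = c(5, 6), two.y = c(6, 6))

# Strip everything from the first dot onward to get the pair name
prefix <- sub("\\..*", "", names(toy))

# split.default treats the data frame as a list of columns,
# so each element is a data frame holding one pair
pairs <- split.default(toy, prefix)

names(pairs)       # "one" "two"
names(pairs$one)   # "one.x" "one.y"
```

Each element of `pairs` is what the map step in the answer receives as `.x`.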

Another option is to create an expression as a string and then parse it:

library(rlang)
expr1 <- paste(grep("\\.x", names(df), value = TRUE), 
      grep("\\.y", names(df), value = TRUE), sep="==", collapse=";")
df %>% 
    mutate(!!!rlang::parse_exprs(expr1)) %>%
    rename_at(vars(matches("==")), ~ paste0(str_remove(.x, "\\s.*"), "_suffix"))
# A tibble: 4 x 7
#  one.x one.y two.x two.y extra one.x_suffix two.x_suffix
#  <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>        <lgl>       
#1     1     2     5     6     5 FALSE        FALSE       
#2     2     2     6     7     5 TRUE         FALSE       
#3     3     4     7     7     5 FALSE        TRUE        
#4     4     3     8     9     5 FALSE        FALSE

NOTE: The result can be converted to 'same'/'different' as in the first solution. However, it is better to leave it as logical columns.
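
For anyone who wants to see the intermediate values, the sketch below rebuilds the string from the question's column names, parses it, and shows one way to turn a logical result into labels. The vector names `nms`, `is_same` and `lab` are hypothetical, and `is_same` is a hand-written stand-in for a real comparison result.

```r
# Column names as in the question's df
nms <- c("one.x", "one.y", "two.x", "two.y", "extra")

# One comparison per pair, joined by ";" so it parses as separate expressions
expr1 <- paste(grep("\\.x", nms, value = TRUE),
               grep("\\.y", nms, value = TRUE),
               sep = "==", collapse = ";")
expr1   # "one.x==one.y;two.x==two.y"

# parse_exprs() turns the string into a list of unevaluated calls
exprs <- rlang::parse_exprs(expr1)
length(exprs)   # 2

# Converting a logical result to labels: FALSE + 1 = 1, TRUE + 1 = 2
is_same <- c(FALSE, TRUE, FALSE, FALSE)
lab <- c("different", "same")[is_same + 1]
lab   # "different" "same" "different" "different"
```

The indexing trick is the same one used inside reduce in the first solution.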

Answer 2

The problem is not with !!sym, as the following example shows:

df %>% mutate_at( vars(one.x, two.x),
                  funs(suffix = case_when(
                    . == !!sym("one.y") ~ "same",
                    TRUE ~ "different")))
# # A tibble: 4 x 6
#   one.x one.y two.x two.y one.x_suffix two.x_suffix
#   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
# 1     1     2     5     6 different    different   
# 2     2     2     6     7 same         different   
# 3     3     4     7     7 different    different   
# 4     4     3     8     9 different    different

The problem is with trying to unquote substitute(.) inside case_when:

df %>% mutate_at( vars(one.x, two.x),
                  funs(suffix = case_when(
                    . == !!substitute(.) ~ "same",
                    TRUE ~ "different")))
# Error in mutate_impl(.data, dots) : 
#   Evaluation error: object 'value' not found.

The reason is operator precedence. From the help page for !!:

The !! operator unquotes its argument. It gets evaluated immediately in the surrounding context.

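In the example above, the context for !!substitute(.) is the formula, which itself sits inside case_when. As a result, the expression is immediately replaced by value, which is defined inside case_when and has no meaning inside your data frame.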

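You want to keep expressions alongside their environments, which is what quosures are for. By replacing substitute with rlang::enquo, you capture the expression that gave rise to . together with its defining environment (your data frame). To keep things tidy, let's move your gsub manipulation into a separate function: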

x2y <- function(.x)
{
  ## Capture the expression and its environment
  qq <- enquo(.x)

  ## Retrieve the expression and deparse it
  txt <- rlang::get_expr(qq) %>% rlang::expr_deparse()

  ## Replace x with y, as before
  txty <- gsub("x", "y", txt)

  ## Put the new expression back into the quosure
  rlang::set_expr( qq, sym(txty) )
}

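You can now use the new x2y function directly in your code. With quosures there is no need for unquoting, because the expressions already carry their environments with them; you can simply evaluate them using rlang::eval_tidy: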

df %>% mutate_at(vars(one.x, two.x),
                 funs(suffix = case_when(
                   . == rlang::eval_tidy(x2y(.)) ~ "same",
                   TRUE ~ "different" )))
# # A tibble: 4 x 6
#   one.x one.y two.x two.y one.x_suffix two.x_suffix
#   <dbl> <dbl> <dbl> <dbl> <chr>        <chr>       
# 1     1     2     5     6 different    different   
# 2     2     2     6     7 same         different   
# 3     3     4     7     7 different    same        
# 4     4     3     8     9 different    different
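
It can help to look at what a function like x2y returns when called on its own. The sketch below is an assumption-laden variant rather than the answer's code: `x2y_anchored` and `toy` are hypothetical names, and the only change is that the replacement is anchored to a trailing ".x", since a plain gsub("x", "y", ...) would also rewrite an "x" elsewhere in a column name.

```r
library(rlang)

# Hypothetical variant of x2y that only rewrites a trailing ".x"
x2y_anchored <- function(.x) {
  qq  <- enquo(.x)                      # expression plus its environment
  txt <- expr_deparse(get_expr(qq))     # e.g. "six.x"
  set_expr(qq, sym(sub("\\.x$", ".y", txt)))
}

# Toy data whose prefix itself contains an "x"
toy <- data.frame(six.x = c(1, 2), six.y = c(1, 3))

gsub("x", "y", "six.x")    # "siy.y": the unanchored version hits the prefix too

q <- x2y_anchored(six.x)
quo_get_expr(q)            # the symbol six.y
eval_tidy(q, data = toy)   # the six.y column of toy
```

Whether the anchored pattern is worth it depends on your column names; for one.x and two.x both versions give the same result.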

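EDIT to address the question in your comment: collapsing all the code into a single line is almost always a Bad Idea™, and I strongly recommend against it. However, since this question is about NSE, I think it's important to understand why simply taking the contents of x2y and pasting them inside case_when causes problems.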

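enquo(), like substitute(), looks in the calling environment of the function and replaces the argument with the expression that was provided to that function. substitute() only goes one environment up (finding value inside case_when when you unquote it), whereas enquo() keeps moving up for as long as the functions in the calling stack correctly handle quasiquotation. (Most dplyr/tidyverse functions do.) So when you call enquo(.x) inside x2y, it moves up through the expressions that were provided to each function in the calling stack and eventually finds one.x.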
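
The "one level up" behaviour can be reproduced outside dplyr. This is a minimal sketch with made-up function names (`inner`, `wrapper`, `inner_q`, `wrapper_q`); it assumes the wrapper forwards its argument with !!enquo(x), which is one way for a function to handle quasiquotation.

```r
# substitute() sees only the expression passed to its own function
inner   <- function(value) substitute(value)
wrapper <- function(x) inner(x)

inner(a + b)     # a + b
wrapper(a + b)   # x: the name used one level up, not the original expression

# enquo() can reach the original expression when the wrapper forwards it
# with quasiquotation (here via !!enquo(x))
inner_q   <- function(value) rlang::enquo(value)
wrapper_q <- function(x) inner_q(!!rlang::enquo(x))

rlang::quo_squash(wrapper_q(a + b))   # a + b
```

Neither a nor b needs to exist here, because the expressions are captured and never evaluated.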

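When you call enquo() inside mutate_at, it is now at the same level as one.x, so it also replaces the argument (one.x in this case) with the expression that defined it (the vector c(1,2,3,4) in this case). This is not what you want. Instead of moving up levels, you now want to stay at the same level as one.x. To do that, use rlang::quo() in place of rlang::enquo():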

library( rlang )   ## To maintain at least a little bit of sanity

df %>% 
 mutate_at(vars(one.x, two.x),
   funs(suffix = case_when(
    . == eval_tidy(set_expr(quo(.), 
                            sym(gsub("x","y", expr_deparse(get_expr(quo(.)))))
                       )
            ) ~ "same",
    TRUE ~ "different" )))
# Now works as expected
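
If the goal is just the pairwise comparison rather than the NSE exercise, the same result can be reached with plain string-based column access. This is a sketch of an alternative, not something taken from the answers above; it assumes every ".x" column has a matching ".y" column, and the loop variables `x_col` and `y_col` are made-up names.

```r
# Same data as in the question, as a plain data frame
df <- data.frame(one.x = c(1, 2, 3, 4), one.y = c(2, 2, 4, 3),
                 two.x = c(5, 6, 7, 8), two.y = c(6, 7, 7, 9),
                 extra = c(5, 5, 5, 5))

# For each ".x" column, look up its ".y" partner by name and compare
for (x_col in grep("\\.x$", names(df), value = TRUE)) {
  y_col <- sub("\\.x$", ".y", x_col)
  df[[paste0(x_col, "_suffix")]] <-
    ifelse(df[[x_col]] == df[[y_col]], "same", "different")
}

df
```

Because the columns are addressed by name as strings, no quoting or unquoting is involved.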
