Non-standard evaluation and quasiquotation in dplyr() not working as (naively) expected

Question

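I am trying to search a database and then label the output with a name derived from the original search ("derived_name" in the reproducible example below). I am using the dplyr pipe %>%, and I am running into trouble with quasiquotation and/or non-standard evaluation. Specifically, when I use count_colname, a character object derived from "derived_name", in the final top_n() call, it fails to subset the data frame.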

library(dplyr)

search_name <- "derived_name"
set.seed(1)
letrs <- letters[rnorm(52, 13.5, 5)]
letrs_count.df <- letrs %>%
    table() %>%
    as.data.frame()
count_colname <- paste0(search_name, "_letr_count")
colnames(letrs_count.df) <- c("letr", count_colname)
letrs_top.df <- letrs_count.df %>%
    top_n(5, count_colname)
identical(letrs_top.df, letrs_count.df)
# [1] TRUE

Based on what I had read, I thought the code above would work. Further reading led me to try top_n_(), which does not seem to exist.

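I am working through vignette("programming"), which is a little over my head. It led me to try the !! sym() syntax, which works, but I do not know why! Help understanding why the code below works would be much appreciated. Thanks.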

colnames(letrs_count.df) <- c("letr", count_colname)
letrs_top.df <- letrs_count.df %>%
    top_n(5, (!! sym(count_colname)))
letrs_top.df
#   letr derived_name_letr_count
# 1    l                       5
# 2    m                       6
# 3    o                       7
# 4    p                       5
# 5    q                       6

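Additional confusing examples, prompted by the question and comments from @lionel and @Tung below. What confuses me here is that the help files say sym() "take strings as input and turn them into symbols" and !! "unquotes its argument". However, in the examples below, sym(count_colname) appears to unquote to derived_name_letr_count. I do not understand why the !! is needed in !! sym(count_colname), since sym(count_colname) and qq_show(!! sym(count_colname)) give the same value.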

library(rlang)  # qq_show() is exported by rlang, not dplyr

count_colname
# [1] "derived_name_letr_count"
sym(count_colname)
# derived_name_letr_count
qq_show(count_colname)
# count_colname
qq_show(sym(count_colname))
# sym(count_colname)
qq_show(!! sym(count_colname))
# derived_name_letr_count
qq_show(!! count_colname)
# "derived_name_letr_count"

Answer 1

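According to the top_n documentation (?top_n), it does not support character/string input, which is why the first example does not work. In your second example, rlang::sym converts the string into a variable name (a symbol), and !! then unquotes it so that it can be evaluated inside top_n. Note: top_n and the other dplyr verbs automatically quote their inputs.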
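To make the difference between a string, a symbol, and the unevaluated call sym(...) more concrete, here is a minimal sketch that evaluates each of them against a data frame with rlang::eval_tidy(). The data frame `df` and the column name `my_count` are made up for illustration and are not part of the question's data.

```r
library(rlang)

# Hypothetical data frame and column name, for illustration only.
df <- data.frame(letr = c("a", "b", "c"), my_count = c(3, 1, 2))
col <- "my_count"

# A string is a constant: evaluated against the data, it is still just the string.
eval_tidy(expr(!!col), data = df)
#> [1] "my_count"

# !! inlines the symbol, so evaluation looks up the column of that name.
eval_tidy(expr(!!sym(col)), data = df)
#> [1] 3 1 2

# Without !!, the quoted expression is the call sym(col); evaluating it
# returns a symbol object, not the column's values.
eval_tidy(expr(sym(col)), data = df)
#> my_count
```

This is roughly why sym(count_colname) on its own is not enough inside top_n: the verb would receive a symbol object as the weight rather than the column that the symbol names.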

Using rlang::qq_show, as suggested by @lionel, we can see that it does not work because there is no count_colname column in letrs_count.df:

library(tidyverse)

set.seed(1)
letrs <- letters[rnorm(52, 13.5, 5)]
letrs_count.df <- letrs %>%
  table() %>%
  as.data.frame()

search_name <- "derived_name"
count_colname <- paste0(search_name, "_letr_count")
colnames(letrs_count.df) <- c("letr", count_colname)
letrs_count.df
#>    letr derived_name_letr_count
#> 1     b                       1
#> 2     c                       1
#> 3     f                       2
...

rlang::qq_show(top_n(letrs_count.df, 5, count_colname))
#> top_n(letrs_count.df, 5, count_colname)

sym and !! together produce the correct column name, which does exist in letrs_count.df:

rlang::qq_show(top_n(letrs_count.df, 5, !! sym(count_colname)))
#> top_n(letrs_count.df, 5, derived_name_letr_count)

letrs_count.df %>%
  top_n(5, !! sym(count_colname))
#>   letr derived_name_letr_count
#> 1    l                       5
#> 2    m                       6
#> 3    o                       7
#> 4    p                       5
#> 5    q                       6

top_n(x, n, wt)

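Arguments:

x: a tbl() to filter.
n: number of rows to return. If x is grouped, this is the number of rows per group. Will include more than n rows if there are ties. If n is positive, selects the top n rows. If negative, selects the bottom n rows.
wt: (Optional). The variable to use for ordering. If not specified, defaults to the last variable in the tbl. This argument is automatically quoted and later evaluated in the context of the data frame. It supports unquoting. See vignette("programming") for an introduction to these concepts.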
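As a rough illustration of what "automatically quoted" and "supports unquoting" mean for wt, the sketch below uses a made-up helper, show_wt(), that captures its argument with enquo() the way a tidy-eval function typically does. The helper and the column name `my_count` are assumptions for illustration, not dplyr's actual implementation.

```r
library(rlang)

# Hypothetical helper: capture the argument without evaluating it,
# then return a text label of what was captured.
show_wt <- function(wt) {
  wt <- enquo(wt)
  as_label(wt)
}

col <- "my_count"

# The argument is quoted as typed, so the function sees the name `col`.
show_wt(col)
#> [1] "col"

# Unquoting with !! substitutes the symbol before the argument is captured.
show_wt(!!sym(col))
#> [1] "my_count"
```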


Answer 2

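So, I have realized that what I was struggling with in this question (and in many others) is not really quasiquotation and/or non-standard evaluation, but rather converting character strings into object names. Here is my new solution: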

letrs_top.df <- letrs_count.df %>%
    top_n(5, get(count_colname))
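For comparison, here is a small sketch that runs the get() approach next to !! sym() and the .data pronoun on a made-up data frame (`counts` and `n_seen` are hypothetical names). One difference worth knowing: get() also searches the calling environment, so if the column were missing it could silently pick up an unrelated object of the same name, whereas .data[[...]] only looks inside the data frame.

```r
library(dplyr)

# Hypothetical data; the column name is held in a string, as in the question.
counts <- data.frame(letr = c("a", "b", "c", "d"), n_seen = c(4, 9, 1, 7))
col <- "n_seen"

via_get  <- counts %>% top_n(2, get(col))       # look up an object by name
via_sym  <- counts %>% top_n(2, !!sym(col))     # string -> symbol, then unquote
via_data <- counts %>% top_n(2, .data[[col]])   # subset the data pronoun by string

# All three should select the same rows (letr "b" and "d").
identical(via_get, via_sym)
identical(via_sym, via_data)
```

If your dplyr version is 1.0.0 or later, note that top_n() has been superseded by slice_max() and slice_min(), which accept the same kinds of expressions in their order_by argument.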


Tags: r, dplyr, non-standard-evaluation, tidyeval, quasiquotes