How to use bracket notation (or an alternative) while programming with dplyr

I am trying to write a function that computes toplines (commonly used for polling data). It needs to include "percent" and "valid percent" columns.

Here is an example:

library(tidyverse)
# prepare some data
d <- gss_cat %>%
  mutate(tvhours2 = tvhours,
         tvhours2 = replace(tvhours2, tvhours > 5 , "6-8"),
         tvhours2 = replace(tvhours2, tvhours > 8 , "9+"),
         # turn NA into an explicit factor level, "(Missing)" by default
         tvhours2 = fct_explicit_na(tvhours2),
         # make a weight variable
         fakeweight = rnorm(n(), mean = 1, sd = .25))

The following function works as far as it goes:

make.topline <- function(variable, data, weight){
  variable <- enquo(variable)
  weight <- enquo(weight)

  table <- data %>%
    # calculate denominator
    mutate(total = sum(!!weight)) %>%
    # calculate proportions
    group_by(!!variable) %>%
    summarise(pct = (sum(!!weight)/first(total))*100,
              n = sum(!!weight))

  table
}
make.topline(variable = tvhours2, data = d, weight = fakeweight)

I'm struggling to implement the valid percent column. Here is the syntax I tried:

make.topline2 <- function(variable, data, weight){
  variable <- enquo(variable)
  weight <- enquo(weight)

  table <- data %>%
    # calculate denominator
    mutate(total = sum(!!weight),
           valid.total = sum(!!weight[!!variable != "(Missing)"])) %>%
    # calculate proportions
    group_by(!!variable) %>%
    summarise(pct = (sum(!!weight)/first(total))*100,
              valid.pct = (sum(!!weight)/first(valid.total))*100,
              n = sum(!!weight))

  table
}

make.topline2(variable = tvhours2, data = d, weight = fakeweight)

This produces the following error:

 Error: Base operators are not defined for quosures.
Do you need to unquote the quosure?

  # Bad:
  myquosure != rhs

  # Good:
  !!myquosure != rhs
Call `rlang::last_error()` to see a backtrace 

I know the problem is in this line, but I don't know how to fix it:

mutate(valid.total = sum(!!weight[!!variable != "(Missing)"]))

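You can put parentheses around !!weight. I believe this ensures that the extraction brackets are applied only after weight has been unquoted, so it is an order-of-operations issue: `[` binds more tightly than `!`, which means !!weight[...] is read as !!(weight[...]) and the still-quoted weight ends up in the comparison.

That line then looks like: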

valid.total = sum((!!weight)[!!variable != "(Missing)"])
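
The grouping can be checked without dplyr at all. The following is a minimal sketch in base R that uses quote() to capture both spellings unevaluated and looks at the outermost call of each; weight and variable are just placeholder symbols here, not real columns.

```r
# Base R only: quote() captures an expression without evaluating it,
# so we can inspect how the parser grouped it.
# `weight` and `variable` are placeholder names for illustration.
without_parens <- quote(!!weight[variable != "(Missing)"])
with_parens    <- quote((!!weight)[variable != "(Missing)"])

# Without parentheses the outermost call is `!`:
# the subset weight[...] is built first, and !! wraps its result.
outer_without <- as.character(without_parens[[1]])

# With parentheses the outermost call is `[`:
# !!weight is handled first, and the subset is applied to it.
outer_with <- as.character(with_parens[[1]])

print(outer_without)  # "!"
print(outer_with)     # "["
```

This only shows how R parses the two expressions; the unquoting itself is still done by rlang inside mutate().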

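Alternatively, you can use the new curly-curly operator ({{), introduced in rlang 0.4.0, which replaces the enquo()/!! combination in relatively simple cases like yours. Your function would then look like this: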

make.topline <- function(variable, data, weight){

    table <- data %>%
        # calculate denominator
        mutate(total = sum({{ weight }}),
               valid.total = sum({{ weight }}[{{ variable }} != "(Missing)"])) %>%
        # calculate proportions
        group_by({{ variable }}) %>%
        summarise(pct = (sum({{ weight }})/first(total))*100,
                  valid.pct = (sum({{ weight }})/first(valid.total))*100,
                  n = sum({{ weight }}))

    table
}

Like the parentheses solution, this runs without error.

make.topline(variable = tvhours2, data = d, weight = fakeweight)

# A tibble: 9 x 4
  tvhours2    pct valid.pct      n
  <fct>     <dbl>     <dbl>  <dbl>
1 0          3.16      5.98   679.
2 1         10.9      20.6   2342.
3 2         14.1      26.6   3022.
4 3          9.10     17.2   1957.
5 4          6.67     12.6   1432.
6 5          3.24      6.13   696.
7 6-8        4.02      7.61   864.
8 9+         1.67      3.16   358.
9 (Missing) 47.2      89.3  10140.
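
To check that the two fixes agree, here is a minimal sketch on a small made-up data set with fixed weights, so the result does not depend on rnorm(). The names toy, answer, w, topline_bang and topline_curly are hypothetical and exist only for this illustration; the function bodies follow the code above.

```r
library(dplyr)

# Hypothetical toy data: three valid answers and one missing, fixed weights.
toy <- tibble(
  answer = factor(c("a", "a", "b", "(Missing)")),
  w      = c(1, 1, 2, 4)
)

# enquo()/!! version, with parentheses so that `[` is applied
# to the unquoted weight column.
topline_bang <- function(data, variable, weight) {
  variable <- enquo(variable)
  weight <- enquo(weight)
  data %>%
    mutate(total = sum(!!weight),
           valid.total = sum((!!weight)[!!variable != "(Missing)"])) %>%
    group_by(!!variable) %>%
    summarise(pct = sum(!!weight) / first(total) * 100,
              valid.pct = sum(!!weight) / first(valid.total) * 100,
              n = sum(!!weight))
}

# Curly-curly version of the same computation.
topline_curly <- function(data, variable, weight) {
  data %>%
    mutate(total = sum({{ weight }}),
           valid.total = sum({{ weight }}[{{ variable }} != "(Missing)"])) %>%
    group_by({{ variable }}) %>%
    summarise(pct = sum({{ weight }}) / first(total) * 100,
              valid.pct = sum({{ weight }}) / first(valid.total) * 100,
              n = sum({{ weight }}))
}

res_bang  <- topline_bang(toy, answer, w)
res_curly <- topline_curly(toy, answer, w)

# Both spellings should give the same table.
same_result <- isTRUE(all.equal(res_bang, res_curly))
print(same_result)
print(res_curly)
```

With these weights the total is 8 and the valid total is 4, so answer "a" (weight 2) should come out at 25 percent and 50 valid percent.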