mutate_ deprecated, easy and understandable alternatives?

Question

I'm trying to write a function that creates a variable. Something like this:

Add_Extreme_Variable <- function(dataframe, variable, variable_name){
    dataframe %>%
    group_by(cod_station, year_station) %>%
    mutate(variable_name= ifelse(variable > quantile(variable, 0.95, na.rm=TRUE),1,0)) %>%
    ungroup() %>%
    return()
}

df <- Add_Extreme_Variable (df, rain, extreme_rain)

df is the data frame I'm working with, rain is a numeric variable in df, and extreme_rain is the name of the variable I want to create.

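If I use mutate_() everything works fine, but the problem is that it is deprecated. However, the solutions I have found on Stack Overflow and in the vignette either don't seem to fit my problem or look far more complicated than what I need. I can't find good examples of how to work with quo(), !! without a space, !! with a space, or how to replace = with :=, and I don't know whether working with them would solve my problem or whether they are even necessary, since the whole point of writing this function is to make the code cleaner. Any suggestions?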

Answer 1

We can use rlang's curly-curly ({{ }}) operator together with enquo() to add the new column while passing unquoted input.

library(dplyr)
library(rlang)

Add_Extreme_Variable <- function(dataframe, variable, variable_name){
   col_name <- enquo(variable_name)

   dataframe %>%
     group_by(cyl, am) %>%
     mutate(!!col_name := as.integer({{variable}} > 
                          quantile({{variable}}, 0.95, na.rm=TRUE))) %>%
     ungroup() 
}

Add_Extreme_Variable(mtcars, mpg, new)

# A tibble: 32 x 12
#     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb   new
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4     0
# 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4     0
# 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     0
# 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1     1
# 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2     0
# 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1     0
# 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4     0
# 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2     1
# 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2     0
#10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4     0
# … with 22 more rows
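
Since the question asks how {{ }}, enquo() and !! relate to each other, here is a minimal sketch of the equivalence: embracing an argument with {{ }} does in one step what enquo() followed by !! does in two. The function names flag_extreme_embrace and flag_extreme_enquo and the column name flag are hypothetical, chosen only for this illustration, and grouping is left out to keep the comparison short.

```r
library(dplyr)
library(rlang)

# Hypothetical helper (illustration only): embrace the argument with {{ }}.
flag_extreme_embrace <- function(data, variable) {
  data %>%
    mutate(flag = as.integer(
      {{ variable }} > quantile({{ variable }}, 0.95, na.rm = TRUE)
    ))
}

# Hypothetical helper (illustration only): capture the argument with enquo(),
# then unquote it with !!. The parentheses around !!var keep the unquoting
# separate from the comparison operator.
flag_extreme_enquo <- function(data, variable) {
  var <- enquo(variable)
  data %>%
    mutate(flag = as.integer(
      (!!var) > quantile(!!var, 0.95, na.rm = TRUE)
    ))
}

# Both versions should produce the same data frame.
identical(
  flag_extreme_embrace(mtcars, mpg),
  flag_extreme_enquo(mtcars, mpg)
)
```

The := operator is only needed when the name on the left-hand side of mutate() is itself computed, as in the answer above; with a fixed name such as flag, a plain = is enough.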

Answer 2

You can use {{ }} (curly curly); see the Tidy evaluation section of Hadley Wickham's Advanced R book. Below is an example using the gapminder dataset.

library(gapminder)
library(rlang)
library(tidyverse)

Add_Extreme_Variable2 <- function(dataframe, group_by_var1, group_by_var2, variable, variable_name) {
  res <- dataframe %>%
    group_by({{group_by_var1}}, {{group_by_var2}}) %>%
    mutate({{variable_name}} := ifelse({{variable}} > quantile({{variable}}, 0.95, na.rm = TRUE), 1, 0)) %>%
    ungroup()
  return(res)
}

df <- Add_Extreme_Variable2(gapminder, continent, year, pop, pop_extreme) %>% 
  arrange(desc(pop_extreme))
df
#> # A tibble: 1,704 x 7
#>    country   continent  year lifeExp      pop gdpPercap pop_extreme
#>    <fct>     <fct>     <int>   <dbl>    <int>     <dbl>       <dbl>
#>  1 Australia Oceania    1952    69.1  8691212    10040.           1
#>  2 Australia Oceania    1957    70.3  9712569    10950.           1
#>  3 Australia Oceania    1962    70.9 10794968    12217.           1
#>  4 Australia Oceania    1967    71.1 11872264    14526.           1
#>  5 Australia Oceania    1972    71.9 13177000    16789.           1
#>  6 Australia Oceania    1977    73.5 14074100    18334.           1
#>  7 Australia Oceania    1982    74.7 15184200    19477.           1
#>  8 Australia Oceania    1987    76.3 16257249    21889.           1
#>  9 Australia Oceania    1992    77.6 17481977    23425.           1
#> 10 Australia Oceania    1997    78.8 18565243    26998.           1
#> # ... with 1,694 more rows

summary(df)
#>         country        continent        year         lifeExp     
#>  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60  
#>  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20  
#>  Algeria    :  12   Asia    :396   Median :1980   Median :60.71  
#>  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47  
#>  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85  
#>  Australia  :  12                  Max.   :2007   Max.   :82.60  
#>  (Other)    :1632                                                
#>       pop              gdpPercap         pop_extreme     
#>  Min.   :6.001e+04   Min.   :   241.2   Min.   :0.00000  
#>  1st Qu.:2.794e+06   1st Qu.:  1202.1   1st Qu.:0.00000  
#>  Median :7.024e+06   Median :  3531.8   Median :0.00000  
#>  Mean   :2.960e+07   Mean   :  7215.3   Mean   :0.07042  
#>  3rd Qu.:1.959e+07   3rd Qu.:  9325.5   3rd Qu.:0.00000  
#>  Max.   :1.319e+09   Max.   :113523.1   Max.   :1.00000  
#>

Created on 2019-11-10 by the reprex package (v0.3.0)
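
Because mutate_() was typically called with column names given as strings, a string-based variant may also be useful. The following is a sketch under assumptions, not part of either answer: the function name add_extreme_variable_chr is hypothetical, the grouping columns are hard-coded for mtcars as in Answer 1, and the input column is looked up through the .data pronoun.

```r
library(dplyr)
library(rlang)

# Hypothetical string-based variant (illustration only):
# `variable` and `variable_name` are character strings.
add_extreme_variable_chr <- function(dataframe, variable, variable_name) {
  dataframe %>%
    group_by(cyl, am) %>%   # grouping hard-coded for mtcars, as in Answer 1
    mutate(!!variable_name := as.integer(
      # .data[[variable]] looks up the column whose name is stored in `variable`
      .data[[variable]] > quantile(.data[[variable]], 0.95, na.rm = TRUE)
    )) %>%
    ungroup()
}

res <- add_extreme_variable_chr(mtcars, "mpg", "extreme_mpg")
head(res)
```

Here !! followed by := sets the new column's name from the string, so no quo() or enquo() is needed when the caller already passes strings.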

