mutate_ deprecated, easy and understandable alternatives?
I'm trying to write a function that creates a variable. Something like this:
Add_Extreme_Variable <- function(dataframe, variable, variable_name){
  dataframe %>%
    group_by(cod_station, year_station) %>%
    mutate(variable_name= ifelse(variable > quantile(variable, 0.95, na.rm=TRUE),1,0)) %>%
    ungroup() %>%
    return()
}
df <- Add_Extreme_Variable(df, rain, extreme_rain)
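df is the data frame I'm working with, rain is a numeric variable in df, and extreme_rain is the name of the variable I want to create.
If I use mutate_() everything works, but the problem is that it is deprecated. However, the solutions I found on Stack Overflow and in the vignette either don't seem to fit my problem or seem far more complicated than I need. I can't find a clear explanation of how to use quo(), !! without a space, !! with a space, or how to replace = with :=, and I don't know whether working with them would solve my problem or is even necessary, since the whole point of writing this function is to make the code cleaner. Any suggestions?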
We can use rlang's curly-curly ({{ }}) operator together with enquo to add the new column while passing unquoted inputs.
library(dplyr)
library(rlang)
Add_Extreme_Variable <- function(dataframe, variable, variable_name){
  col_name <- enquo(variable_name)
  dataframe %>%
    group_by(cyl, am) %>%
    mutate(!!col_name := as.integer({{variable}} >
      quantile({{variable}}, 0.95, na.rm=TRUE))) %>%
    ungroup()
}
Add_Extreme_Variable(mtcars, mpg, new)
# A tibble: 32 x 12
# mpg cyl disp hp drat wt qsec vs am gear carb new
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 0
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 0
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 0
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 1
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 0
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 0
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 0
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 1
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 0
#10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 0
# … with 22 more rows
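Since the question asks how enquo(), !! and {{ }} relate, here is a minimal sketch comparing the two spellings side by side. It assumes rlang 0.4.0 or later (where {{ }} became available); the function names flag_extreme_enquo and flag_extreme_embrace are made up for this illustration. As far as I understand it, {{ x }} is shorthand for capturing an argument with enquo(x) and unquoting it with !!, so both versions should produce the same column.

```r
library(dplyr)
library(rlang)

# Version A (hypothetical name): capture each argument with enquo(),
# then unquote it with !! where it is used.
flag_extreme_enquo <- function(dataframe, variable, variable_name) {
  var <- enquo(variable)
  col_name <- enquo(variable_name)
  dataframe %>%
    group_by(cyl, am) %>%
    # Parentheses around !!var avoid relying on operator-precedence rules.
    mutate(!!col_name := as.integer((!!var) > quantile(!!var, 0.95, na.rm = TRUE))) %>%
    ungroup()
}

# Version B (hypothetical name): the same function written with {{ }} only.
flag_extreme_embrace <- function(dataframe, variable, variable_name) {
  dataframe %>%
    group_by(cyl, am) %>%
    mutate({{ variable_name }} := as.integer({{ variable }} > quantile({{ variable }}, 0.95, na.rm = TRUE))) %>%
    ungroup()
}

a <- flag_extreme_enquo(mtcars, mpg, extreme_mpg)
b <- flag_extreme_embrace(mtcars, mpg, extreme_mpg)

# Compare the new column produced by the two versions.
identical(a$extreme_mpg, b$extreme_mpg)
```

The := is needed in both versions because R's parser does not accept !!col_name or {{ variable_name }} on the left-hand side of a plain =.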
You can use {{ }} (curly curly); see the Tidy evaluation section of Hadley Wickham's Advanced R book. Below is an example using the gapminder dataset.
library(gapminder)
library(rlang)
library(tidyverse)
Add_Extreme_Variable2 <- function(dataframe, group_by_var1, group_by_var2, variable, variable_name) {
  res <- dataframe %>%
    group_by({{group_by_var1}}, {{group_by_var2}}) %>%
    mutate({{variable_name}} := ifelse({{variable}} > quantile({{variable}}, 0.95, na.rm = TRUE), 1, 0)) %>%
    ungroup()
  return(res)
}
df <- Add_Extreme_Variable2(gapminder, continent, year, pop, pop_extreme) %>%
  arrange(desc(pop_extreme))
df
#> # A tibble: 1,704 x 7
#> country continent year lifeExp pop gdpPercap pop_extreme
#> <fct> <fct> <int> <dbl> <int> <dbl> <dbl>
#> 1 Australia Oceania 1952 69.1 8691212 10040. 1
#> 2 Australia Oceania 1957 70.3 9712569 10950. 1
#> 3 Australia Oceania 1962 70.9 10794968 12217. 1
#> 4 Australia Oceania 1967 71.1 11872264 14526. 1
#> 5 Australia Oceania 1972 71.9 13177000 16789. 1
#> 6 Australia Oceania 1977 73.5 14074100 18334. 1
#> 7 Australia Oceania 1982 74.7 15184200 19477. 1
#> 8 Australia Oceania 1987 76.3 16257249 21889. 1
#> 9 Australia Oceania 1992 77.6 17481977 23425. 1
#> 10 Australia Oceania 1997 78.8 18565243 26998. 1
#> # ... with 1,694 more rows
summary(df)
#> country continent year lifeExp
#> Afghanistan: 12 Africa :624 Min. :1952 Min. :23.60
#> Albania : 12 Americas:300 1st Qu.:1966 1st Qu.:48.20
#> Algeria : 12 Asia :396 Median :1980 Median :60.71
#> Angola : 12 Europe :360 Mean :1980 Mean :59.47
#> Argentina : 12 Oceania : 24 3rd Qu.:1993 3rd Qu.:70.85
#> Australia : 12 Max. :2007 Max. :82.60
#> (Other) :1632
#> pop gdpPercap pop_extreme
#> Min. :6.001e+04 Min. : 241.2 Min. :0.00000
#> 1st Qu.:2.794e+06 1st Qu.: 1202.1 1st Qu.:0.00000
#> Median :7.024e+06 Median : 3531.8 Median :0.00000
#> Mean :2.960e+07 Mean : 7215.3 Mean :0.07042
#> 3rd Qu.:1.959e+07 3rd Qu.: 9325.5 3rd Qu.:0.00000
#> Max. :1.319e+09 Max. :113523.1 Max. :1.00000
#>
Created on 2019-11-10 by the reprex package (v0.3.0)
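Because mutate_() took column names as strings, a string-based variant may be closer to a drop-in replacement for some callers. The sketch below is only one possible approach and is not taken from either answer above: it assumes dplyr 1.0.0 or later (for across()) and rlang 0.4.3 or later (for the "{name}" := glue syntax), and the function name add_extreme_from_strings and its arguments are hypothetical.

```r
library(dplyr)

# Hypothetical helper: every column is identified by a character string.
#   groups   - character vector of grouping column names
#   variable - name of the numeric column to test
#   new_name - name of the 0/1 indicator column to create
add_extreme_from_strings <- function(dataframe, groups, variable, new_name) {
  dataframe %>%
    group_by(across(all_of(groups))) %>%
    # .data[[variable]] looks the column up by its string name within each group;
    # "{new_name}" := names the result from the string held in new_name.
    mutate("{new_name}" := as.integer(
      .data[[variable]] > quantile(.data[[variable]], 0.95, na.rm = TRUE)
    )) %>%
    ungroup()
}

out <- add_extreme_from_strings(mtcars, c("cyl", "am"), "mpg", "extreme_mpg")
table(out$extreme_mpg)
```

With this form no quoting or unquoting operators appear in the function body at all, at the cost of callers having to write the column names in quotes.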