How to quote and unquote a variable into a function and iterate over a dataframe
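I am trying to write a function and iterate it over a data frame of values. The goal is to summarize airport delays in 10 groups.
How do I use the value passed to the function as a name? Should the origin column (EWR, LGA, JFK) be saved as a column, or does the grouping variable need to be passed in through the function?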
library(tidyverse)
library(nycflights13)
head(flights)
#> # A tibble: 6 x 19
#>    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
#> 1  2013     1     1      517            515         2      830            819
#> 2  2013     1     1      533            529         4      850            830
#> 3  2013     1     1      542            540         2      923            850
#> 4  2013     1     1      544            545        -1     1004           1022
#> 5  2013     1     1      554            600        -6      812            837
#> 6  2013     1     1      554            558        -4      740            728
#> # ... with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#> #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> #   hour <dbl>, minute <dbl>, time_hour <dttm>
ntile_summary <- function(data, by, var) {
  by <- enquo(by)
  var <- enquo(var)
  data %>%
    mutate(pcts = ntile(!!by, n = 10),
           col_nm = !!by)
  group_by(pcts, col_nm) %>%
    summarize(avg = mean(!!var, na.ram = TRUE))
}
params <- expand_grid(
  flights %>% count(origin) %>% select(origin),
  flights %>% count(day) %>% head(2) %>% select(day)
)
ntile_summary(flights, day, arr_delay)
#> Error in group_by(pcts, col_nm): object 'pcts' not found
purrr::walk(params, ~ntile_summary(flights, !origin, arr_delay))
#> Error in !origin: invalid argument type
Created on 2020-03-15 by the reprex package (v0.3.0)
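The %>% is missing after the mutate() call, so group_by() is evaluated on its own and cannot find pcts. Separately, the argument to mean() is na.rm, not na.ram; the misspelled name is silently ignored, so missing values are not dropped.
ntile_summary <- function(data, by, var) {
  by <- enquo(by)
  var <- enquo(var)
  data %>%
    mutate(pcts = ntile(!!by, n = 10),
           col_nm = !!by) %>%              # pipe added here
    group_by(pcts, col_nm) %>%
    summarize(avg = mean(!!var, na.rm = TRUE))  # na.rm, not na.ram
}
ntile_summary(flights, day, arr_delay)
# Output as originally posted, generated with the misspelled na.ram
# (hence the NA averages):
# A tibble: 40 x 3
# Groups:   pcts [10]
#     pcts col_nm   avg
#    <int>  <int> <dbl>
#  1     1      1 NA
#  2     1      2 NA
#  3     1      3 NA
#  4     1      4 -4.44
#  5     2      4 NA
#  6     2      5 NA
#  7     2      6 NA
#  8     2      7 NA
#  9     3      7 NA
# 10     3      8 NA
# … with 30 more rows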
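The NA averages in the posted output trace back to the na.ram spelling in the question's code. A minimal sketch of that behavior on a toy vector (the vector x is made up for illustration): mean() accepts extra arguments through `...`, so an unknown name raises no error and has no effect.

```r
# Toy vector with one missing value (illustrative only)
x <- c(1, 2, NA)

# Misspelled argument: absorbed by `...` and ignored, so the NA propagates
misspelled <- mean(x, na.ram = TRUE)

# Correct argument: the NA is dropped before averaging
correct <- mean(x, na.rm = TRUE)

print(misspelled)
print(correct)
```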
We can also use the curly-curly operator ({{ }}) instead of enquo + !!
ntile_summary <- function(data, by, var) {
  data %>%
    mutate(col_nm = {{ by }}, pcts = ntile({{ by }}, n = 10)) %>%
    group_by(pcts, col_nm) %>%
    summarize(avg = mean({{ var }}, na.rm = TRUE))  # na.rm, not na.ram
}
ntile_summary(flights, day, arr_delay)
# Output as originally posted, generated with the misspelled na.ram
# (hence the NA averages):
# A tibble: 40 x 3
# Groups:   pcts [10]
#     pcts col_nm   avg
#    <int>  <int> <dbl>
#  1     1      1 NA
#  2     1      2 NA
#  3     1      3 NA
#  4     1      4 -4.44
#  5     2      4 NA
#  6     2      5 NA
#  7     2      6 NA
#  8     2      7 NA
#  9     3      7 NA
# 10     3      8 NA
# … with 30 more rows
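The question also asks about iterating, which the function fixes alone do not cover. One possible approach, sketched here under assumptions, is to pass column names as strings and look them up with the .data pronoun, so that purrr::map() can loop over a character vector. The name ntile_summary_chr and the vector by_cols are hypothetical and not from the original post, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(purrr)
library(nycflights13)

# Hypothetical string-based variant: `by` and `var` are column names as strings
ntile_summary_chr <- function(data, by, var) {
  data %>%
    mutate(pcts = ntile(.data[[by]], n = 10),
           col_nm = .data[[by]]) %>%
    group_by(pcts, col_nm) %>%
    summarize(avg = mean(.data[[var]], na.rm = TRUE), .groups = "drop")
}

# Columns to iterate over (chosen for illustration)
by_cols <- c("day", "month")

# One summary per grouping column, returned as a named list
results <- map(set_names(by_cols),
               function(col) ntile_summary_chr(flights, col, "arr_delay"))

# Optionally stack the results, recording which column each row came from
combined <- bind_rows(results, .id = "by")
```

map() is used rather than walk() because walk() discards what the function returns and is meant for side effects.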