How can I write a tidyverse-friendly function that respects a group_by() made earlier in the pipe?
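I have started writing a function to speed up table generation, but I want the function to respect any grouping choices the user made earlier in the pipe.
Example data: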
df <- data.frame(
  ID      = c("A","B","C","A","C","D","A","C","E","B","C","A"),
  Year    = c(1,1,1,2,2,2,3,3,3,4,4,4),
  Credits = c(1,3,4,5,6,7,2,1,1,6,1,2),
  Major   = c("GS","GS","LA","GS","GS","LA","GS","LA","LA","GS","LA","LA"),
  Status  = c("green","blue","green","blue","green","blue","green","blue","green","blue","green","blue"),
  Group   = c("Art","Music","Science","Art","Music","Science","Art","Music","Science","Art","Music","Science")
)
Below is the function I am working on. It takes a variable that defines the cohort, a credits variable, and a term variable.
library(dplyr)  # provides %>%, group_by(), mutate(), select()

table_headsfte_cohorts <- function(.data, cohortvar, credits, term) {
  cohortvar <- rlang::ensym(cohortvar)
  credits   <- rlang::ensym(credits)
  term      <- rlang::ensym(term)
  .data %>%
    # note: Pidm is not a column in the example data above
    group_by(!!term, Pidm) %>%
    # attempt to keep the incoming groups; this is the part that does not work
    group_by(!!term, !!cohortvar, group_cols()) %>%
    mutate(on3 = 1) %>%
    mutate(`Headcount` = sum(on3),
           `FTE` = round(sum(na.omit(!!credits)) / 15, 1)) %>%
    mutate(Variable = paste0(cohortvar)) %>%
    mutate(Category = !!cohortvar) %>%
    select(-!!cohortvar) %>%
    select(Variable, Category, Headcount, FTE, group_cols())
}
For users who may want additional grouping variables, I would like the final function to allow usage like this:
df2<-df%>%
group_by(Status,Group)%>%
table_headsfte_cohorts(Major,Credits,Year)
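The desired end result is a table that respects and retains the levels of the two grouping variables in the group_by statement above, in addition to the cohortvar and term columns that come from the table_headsfte_cohorts() arguments.
I need to produce the same table for a wide range of grouping variables and for varying numbers of grouping variables, so flexibility would be very helpful.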
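Edit:
The following seems to get close, and at least allows multiple grouping variables. It is not quite what I hoped for, because I would prefer the extra grouping variables to be read from the pipe: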
table_headsfte_cohorts <- function(.data, cohortvar, credits, term, ...) {
  grps      <- enquos(...)
  cohortvar <- rlang::ensym(cohortvar)
  credits   <- rlang::ensym(credits)
  term      <- rlang::ensym(term)
  .data %>%
    group_by(!!term, !!cohortvar, !!!grps) %>%
    mutate(on3 = 1) %>%
    mutate(`Headcount` = sum(on3),
           `FTE` = round(sum(na.omit(!!credits)) / 15, 1)) %>%
    mutate(Variable = paste0(cohortvar)) %>%
    mutate(Category = !!cohortvar) %>%
    select(-!!cohortvar) %>%
    select(Variable, Category, Headcount, FTE, !!!grps)
}
With the above, the following runs successfully and returns a table:
fdfout <- fdf %>%
  table_headsfte_cohorts(Major, Credits, Year)
I can also pass other variables to the function as additional grouping variables:
fdfout_alt <- fdf %>%
  table_headsfte_cohorts(Major, Credits, Year, Status, Group)
This produces the desired result.
Unfortunately, when I use
fdf_no <- fdf %>%
  group_by(Status, Group) %>%
  table_headsfte_cohorts(Major, Credits, Year)
the resulting output may confuse people using my function, because their group_by() line appears to have no effect.
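The behaviour described above is probably a consequence of how group_by() treats existing groups: by default, a new group_by() call replaces whatever grouping is already on the data instead of adding to it. A minimal sketch of that, using a small made-up data frame (`toy` is hypothetical and is not the data from this question) and assuming dplyr 1.0.0 or later for the `.add` argument:

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  Status = c("green", "blue", "green", "blue"),
  Major  = c("GS", "GS", "LA", "LA")
)

# A second group_by() overrides the first one by default
replaced <- toy %>%
  group_by(Status) %>%
  group_by(Major)
group_vars(replaced)
#> [1] "Major"

# With .add = TRUE the new variable is appended to the existing groups
added <- toy %>%
  group_by(Status) %>%
  group_by(Major, .add = TRUE)
group_vars(added)
#> [1] "Status" "Major"
```

This is why a group_by() inside a function silently discards groups set before the function call, unless the function explicitly carries them over.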
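I added a few lines that combine the existing grouping variables and the new grouping variables passed through the dots into one character vector. We can get the existing grouping variables with group_vars. To combine old and new, we have to extract the expressions of the quoted grouping variables with get_expr and convert them to strings. We can then use !!! syms to evaluate them inside group_by, and all_of to select the grouping variables.
Is this what you had in mind?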
table_headsfte_cohorts <- function(.data, cohortvar, credits, term, ...) {
  # new grouping variables from the dots, converted to strings
  new_grps <- enquos(...)
  new_grps <- purrr::map_chr(new_grps, ~ as.character(rlang::get_expr(.x)))
  # grouping variables already set earlier in the pipe
  ex_grps  <- group_vars(.data)
  grp_vars <- c(ex_grps, new_grps)
  cohortvar <- rlang::ensym(cohortvar)
  credits   <- rlang::ensym(credits)
  term      <- rlang::ensym(term)
  .data %>%
    group_by(!!term,
             !!cohortvar,
             !!!syms(grp_vars)) %>%
    mutate(on3 = 1) %>%
    mutate(`Headcount` = sum(on3),
           `FTE` = round(sum(na.omit(!!credits)) / 15, 1)) %>%
    mutate(Variable = paste0(cohortvar)) %>%
    mutate(Category = !!cohortvar) %>%
    select(-!!cohortvar) %>%
    select(Variable, Category, Headcount, FTE, all_of(grp_vars))
}
df %>%
  group_by(Status, Group) %>%
  table_headsfte_cohorts(Major, Credits, Year)
#> Adding missing grouping variables: `Major`
#> Adding missing grouping variables: `Year`, `Major`
#> # A tibble: 12 x 8
#> # Groups:   Year, Major, Status, Group [12]
#>     Year Major Variable Category Headcount   FTE Status Group
#>    <dbl> <chr> <chr>    <chr>        <dbl> <dbl> <chr>  <chr>
#>  1     1 GS    Major    GS               1   0.1 green  Art
#>  2     1 GS    Major    GS               1   0.2 blue   Music
#>  3     1 LA    Major    LA               1   0.3 green  Science
#>  4     2 GS    Major    GS               1   0.3 blue   Art
#>  5     2 GS    Major    GS               1   0.4 green  Music
#>  6     2 LA    Major    LA               1   0.5 blue   Science
#>  7     3 GS    Major    GS               1   0.1 green  Art
#>  8     3 LA    Major    LA               1   0.1 blue   Music
#>  9     3 LA    Major    LA               1   0.1 green  Science
#> 10     4 GS    Major    GS               1   0.4 blue   Art
#> 11     4 LA    Major    LA               1   0.1 green  Music
#> 12     4 LA    Major    LA               1   0.1 blue   Science
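As a possible alternative to merging the group names by hand, group_by() has an `.add` argument that keeps the groups already on the data. The sketch below is not the function from this answer: `table_headsfte_summary` and the `toy` data are hypothetical names made up for illustration, it assumes dplyr 1.0.0 or later, and it uses summarise(), so it returns one row per group instead of one row per input row.

```r
library(dplyr)

# Hypothetical variant: respects groups set earlier in the pipe via .add = TRUE
table_headsfte_summary <- function(.data, cohortvar, credits, term) {
  # capture the cohort column name as a string before entering the pipe
  cohort_name <- rlang::as_label(rlang::enquo(cohortvar))
  .data %>%
    group_by({{ term }}, {{ cohortvar }}, .add = TRUE) %>%
    summarise(
      Variable  = cohort_name,
      Headcount = n(),
      FTE       = round(sum({{ credits }}, na.rm = TRUE) / 15, 1),
      .groups   = "drop"
    ) %>%
    rename(Category = {{ cohortvar }})
}

# Hypothetical toy data, for illustration only
toy <- data.frame(
  Year    = c(1, 1, 1, 2),
  Credits = c(15, 15, 30, 15),
  Major   = c("GS", "GS", "LA", "GS"),
  Status  = c("green", "green", "blue", "green")
)

out <- toy %>%
  group_by(Status) %>%
  table_headsfte_summary(Major, Credits, Year)
out
```

Here the result has one row for each Status, Year, and Major combination, with Status kept as a column because the earlier group_by(Status) was preserved. Whether collapsing to one row per group is acceptable depends on how the table is used downstream.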