Creating a dplyr function that takes a column name as an argument
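I am trying to pass a column name as an argument to a function that uses dplyr verbs internally.
Several questions have already been asked on this topic. I tried the solutions from all of them, and each one throws some error or other.
I used enquo and !! as suggested in earlier answers. I then tried !! as_label to work around the error from that step, and also tried group_by_ instead of group_by. I also tried the curly-curly operator to solve the problem.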
userMaster <- structure(list(user_id = c(1, 2, 3, 4, 5), city = structure(c(5L,
5L, 8L, 9L, 10L), .Label = c("Austin", "Boise", "Boston", "Chicago",
"Dallas", "Denver", "Detroit", "Kansas City", "Las Vegas", "Los Angeles",
"Manhattan", "Miami", "Minneapolis", "New York City", "Oklahoma City",
"Omaha", "Phoenix", "Saint Louis", "San Francisco", "Washington DC"
), class = "factor"), source = structure(c(2L, 2L, 2L, 2L, 2L
), .Label = c("Adwords", "Organic", "Search Ads"), class = "factor")), row.names = c(NA,
5L), class = "data.frame")
userCount <- function(table, metric){
  col_enquo <- enquo(metric)
  summary <- table %>% select(!! (col_enquo), source, user_id) %>%
    group_by_(!! (col_enquo), source) %>% summarise(users = n_distinct(user_id)) %>%
    left_join(table %>% group_by(source) %>%
                summarise(total = n_distinct(user_id))) %>% mutate(users/total)
  return(summary)
}
genderDemo <- userCount(userMaster, city)
I ran into several different errors:
Error: `quos(desire)` must evaluate to column positions or names, not a list
Error in !as_label(col_enquo) : invalid argument type
Error: Quosures can only be unquoted within a quasiquotation context.
# Bad:
list(!!myquosure)
# Good:
dplyr::mutate(data, !!myquosure)
With rlang 0.4.0, we can use {{...}} (the curly-curly operator), which makes the evaluation simpler.
library(rlang) #v 0.4.0
library(dplyr) #v 0.8.3
userCount <- function(tbl, metric){
  tbl %>%
    select({{metric}}, source, user_id) %>%
    group_by({{metric}}, source) %>%
    summarise(users = n_distinct(user_id)) %>%
    left_join(tbl %>%
                group_by(source) %>%
                summarise(total = n_distinct(user_id))) %>%
    mutate(users/total)
}
genderDemo <- userCount(userMaster, desire)
genderDemo
# A tibble: 12 x 5
# Groups: desire [4]
# desire source users total `users/total`
# <fct> <fct> <int> <int> <dbl>
# 1 A a 2 4 0.5
# 2 A b 1 3 0.333
# 3 A c 2 5 0.4
# 4 B a 1 4 0.25
# 5 B b 1 3 0.333
# 6 B c 1 5 0.2
# 7 C a 1 4 0.25
# 8 C b 2 3 0.667
# 9 C c 1 5 0.2
#10 D a 1 4 0.25
#11 D b 1 3 0.333
#12 D c 2 5 0.4
Using the OP's data (stored here as userMaster2):
userCount(userMaster2, city)
#Joining, by = "source"
# A tibble: 4 x 5
# Groups: city [4]
# city source users total `users/total`
# <fct> <fct> <int> <int> <dbl>
#1 Dallas Organic 2 5 0.4
#2 Kansas City Organic 1 5 0.2
#3 Las Vegas Organic 1 5 0.2
#4 Los Angeles Organic 1 5 0.2
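The output above shows a "Joining, by" message and a ratio column literally named `users/total`. As a minimal sketch of one way to tidy that up, the variant below passes `by = "source"` to `left_join()` and names the ratio column explicitly. The function name `user_share`, the column name `share`, and the toy data are hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- data.frame(
  city    = c("Dallas", "Dallas", "Omaha", "Omaha"),
  source  = c("Organic", "Organic", "Organic", "Adwords"),
  user_id = c(1, 2, 3, 4)
)

# Same idea as userCount(), with an explicit join key and a named ratio column
user_share <- function(tbl, metric) {
  # Distinct users per source, computed once
  totals <- tbl %>%
    group_by(source) %>%
    summarise(total = n_distinct(user_id))

  tbl %>%
    group_by({{ metric }}, source) %>%
    summarise(users = n_distinct(user_id)) %>%
    ungroup() %>%
    left_join(totals, by = "source") %>%  # explicit key, so no "Joining, by" message
    mutate(share = users / total)         # named column instead of `users/total`
}

res <- user_share(toy, city)
res
```

Naming the column makes it easier to refer to in later steps, since `share` does not need backticks.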
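Note: the underscore-suffixed verbs (such as group_by_) are deprecated. So inside the function, either use {{ metric }} in group_by, or use group_by(!! enquo(metric)). If col_enquo <- enquo(metric) has already been created, group_by(!! col_enquo) is equivalent.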
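To make the two options in the note concrete, here is a small sketch that writes the same grouping function both ways and checks that the results agree. The function names and the toy data are hypothetical. It assumes dplyr is loaded (it re-exports `enquo()`) and that rlang is at least 0.4.0 so that `{{ }}` is available.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- data.frame(
  grp     = c("a", "a", "b", "b", "b"),
  user_id = c(1, 1, 2, 3, 3)
)

# Variant 1: capture the argument with enquo(), then unquote it with !!
count_users_enquo <- function(tbl, metric) {
  col_enquo <- enquo(metric)
  tbl %>%
    group_by(!!col_enquo) %>%
    summarise(users = n_distinct(user_id))
}

# Variant 2: curly-curly does the capture and the unquote in one step
count_users_curly <- function(tbl, metric) {
  tbl %>%
    group_by({{ metric }}) %>%
    summarise(users = n_distinct(user_id))
}

res_enquo <- count_users_enquo(toy, grp)
res_curly <- count_users_curly(toy, grp)

# Both variants should produce the same table
all.equal(as.data.frame(res_enquo), as.data.frame(res_curly))
```

Note that `enquo()` is applied to the function argument (`metric`), not to an object that already holds a quosure.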
Data
set.seed(24)
userMaster <- data.frame(desire = rep(LETTERS[1:4], each = 5),
user_id = sample(1:5, 20, replace = TRUE),
source = sample(letters[1:3], 20, replace = TRUE))