What is the best way to loop through function arguments?
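This feels like a very simple operation: compute group-wise averages from one data frame and merge them into another, pre-formatted data frame. My UDF already does that, and it is not the part I'm struggling with.
What I want is for my function to iterate over a series of arguments (a list, a vector, etc.).
I'd like to be able to quickly build a list of variables (or a vector; I'm not set on using a list) and pass it as an argument to the function, so that it builds a data frame containing every variable I supplied in that list. My real data set has 50+ variables, and I want to make various new data frames from different combinations of variables. One list might have 5 variables, another might have 25. But I'm open to the idea that I have this wrong conceptually: should I be using a loop, purrr, map, apply, some other package, etc., or should I change how my function is written? What am I missing?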
library(tidyverse)
data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell", "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)
#Here I set up the dataframe which the function will bind to
data_sample_averages <- data_sample %>%
  group_by(Name) %>%
  dplyr::summarise(Defense_Grade_Average = mean(Defense_Grade))
#> `summarise()` ungrouping output (override with `.groups` argument)
#Function which computes average of variable (the only argument) and merges it back to data_sample_averages
get_avg2 <- function(v_name) {
  avg <- "_Average"
  data_1 <- data_sample %>%
    dplyr::group_by(Name) %>%
    dplyr::summarise("{{ v_name }}_{avg}" := mean({{ v_name }}, na.rm = TRUE))
  data_sample_averages <- merge(data_sample_averages, data_1, by = "Name")
  return(data_sample_averages)
}
#This works - it computes the average of Tackle_Grade and binds it to data_sample_averages
#However my real dataframe has 50+ columns and I don't want to copy and paste this line 50 times, changing the argument every time.
data_sample_averages <- get_avg2(Tackle_Grade)
#> `summarise()` ungrouping output (override with `.groups` argument)
#shows you the averages
print(data_sample_averages)
#>              Name Defense_Grade_Average Tackle_Grade__Average
#> 1    Andre Walker              95.33333                    76
#> 2 Dalton Campbell              88.66667                    69
#Neither of these work - this is where I'm stuck
#I want my function to iterate through a list of arguments which are essentially just character strings in order for the UDF to work
variable_list <- list("Defense_Grade", "Tackle_Grade", "Coverage Grade")
data_sample_averages <- lapply(variable_list, get_avg2)
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
data_sample_averages <- purrr::map(variable_list, get_avg2)
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Defense_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> Warning in mean.default(~"Tackle_Grade", na.rm = TRUE): argument is not numeric
#> or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> Warning in mean.default(~"Coverage Grade", na.rm = TRUE): argument is not
#> numeric or logical: returning NA
#> `summarise()` ungrouping output (override with `.groups` argument)
You said you are not set on using a list, so I use a vector.
My solution relies on a function introduced in dplyr 1.0.0: across().
library(tidyverse)
data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell", "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)
# The function
compute_avg <- function(.data, names) {
  .data %>%
    group_by(Name) %>%
    summarise(
      across(
        # Embrace the argument so the unquoted column selection is forwarded to across()
        .cols = {{ names }},
        .fns = ~ mean(.x, na.rm = TRUE),
        .names = "{.col}_Average"
      )
    )
}
compute_avg(.data = data_sample, names = c(Defense_Grade, Tackle_Grade))
#> # A tibble: 2 x 3
#>   Name            Defense_Grade_Average Tackle_Grade_Average
#>   <chr>                           <dbl>                <dbl>
#> 1 Andre Walker                     95.3                   76
#> 2 Dalton Campbell                  88.7                   69
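Since the question builds its variable list from character strings, here is a minimal sketch of the same idea for that case. It assumes dplyr 1.0.0 or later; the names compute_avg_chr, vars and variable_vec are made up for illustration and are not part of the original answer. The tidyselect helper all_of() turns a character vector of column names into a column selection for across().

```r
library(dplyr)

data_sample <- data.frame(
  Name = c("Dalton Campbell", "Dalton Campbell", "Dalton Campbell",
           "Andre Walker", "Andre Walker", "Andre Walker"),
  Defense_Grade = c(88, 86, 92, 94, 97, 95),
  Tackle_Grade = c(66, 69, 72, 74, 76, 78),
  Coverage_Grade = c(44, 43, 44, 76, 73, 78)
)

# Hypothetical helper: `vars` is a character vector of column names
compute_avg_chr <- function(.data, vars) {
  .data %>%
    group_by(Name) %>%
    summarise(
      across(
        .cols = all_of(vars),             # select columns by name (string)
        .fns = ~ mean(.x, na.rm = TRUE),  # group-wise mean, ignoring NAs
        .names = "{.col}_Average"
      ),
      .groups = "drop"                    # return an ungrouped tibble
    )
}

# Any combination of columns can be passed as a plain character vector
variable_vec <- c("Defense_Grade", "Tackle_Grade", "Coverage_Grade")
result <- compute_avg_chr(data_sample, variable_vec)
print(result)
```

With this shape, making a different data frame only means passing a different character vector, so no loop, lapply() or merge() step should be needed. Note that all_of() raises an error if a name does not match a column (for example "Coverage Grade" with a space instead of an underscore).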