Using dplyr within a function: grouping error with function arguments
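Below is a working example of what I want the function to do, followed by the script for the function, with a note where the error occurs.
The error message is:
Error: index out of bounds
I know this usually means R cannot find the variable being called.
Interestingly, in my function example below, if I group only by subgroup_name (which is passed to the function and becomes a column in the newly created data frame), the function regroups by that variable successfully. However, I also want to group by a newly created column called variable (from melt).
I previously used regroup() with similar code, but it has been deprecated. I am trying to use group_by_(), to no avail.
I have read many other posts and answers and have spent several hours on this today, still without success.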
# Initialize example dataset
database <- ggplot2::diamonds
database$diamond <- row.names(database) # needed for melting
subgroup_name <- "cut" # can replace with "color" or "clarity"
subgroup_column <- 2 # can replace with 3 for color, 4 for clarity
# This works, although it would be preferable not to need separate variables for subgroup_name and subgroup_column number
df <- database %>%
  select(diamond, subgroup_column, x, y, z) %>%
  melt(id.vars = c("diamond", subgroup_name)) %>%
  group_by(cut, variable) %>%
  summarise(value = round(mean(value, na.rm = TRUE), 2))
# This does not work, I am expecting the same output as above
subgroup_analysis <- function(database, ...) {
  df <- database %>%
    select(diamond, subgroup_column, x, y, z) %>%
    melt(id.vars = c("diamond", subgroup_name)) %>%
    group_by_(subgroup_name, variable) %>% # problem appears to be with finding "variable"
    summarise(value = round(mean(value, na.rm = TRUE), 2))
  print(df)
}
subgroup_analysis(database, subgroup_column, subgroup_name)
From the NSE vignette:
If you also want to output variables to vary, you need to pass a list of quoted objects to the .dots argument:
Here, variable should be quoted:
subgroup_analysis <- function(database, ...) {
  df <- database %>%
    select(diamond, subgroup_column, x, y, z) %>%
    melt(id.vars = c("diamond", subgroup_name)) %>%
    group_by_(subgroup_name, quote(variable)) %>%
    summarise(value = round(mean(value, na.rm = TRUE), 2))
  print(df)
}
subgroup_analysis(database, subgroup_column, subgroup_name)
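For readers on a current dplyr, where group_by_() is itself deprecated, the same grouping can probably be written without the underscore verbs. The following is only a sketch under assumptions that are not part of the original question: dplyr 1.0 or later, tidyr::pivot_longer() standing in for melt(), and subgroup_name passed explicitly as a string argument rather than picked up from the global environment. The diamond id column is not needed in this variant.

```r
library(dplyr)
library(tidyr)

# Hypothetical rewrite: the grouping column name arrives as a string.
subgroup_analysis <- function(database, subgroup_name) {
  database %>%
    # all_of() selects columns whose names are stored in a character vector
    select(all_of(c(subgroup_name, "x", "y", "z"))) %>%
    # long format: one row per (observation, measurement), like melt()
    pivot_longer(cols = c(x, y, z), names_to = "variable", values_to = "value") %>%
    # across(all_of(...)) groups by the string-named column; variable is a bare name
    group_by(across(all_of(subgroup_name)), variable) %>%
    summarise(value = round(mean(value, na.rm = TRUE), 2), .groups = "drop")
}

result <- subgroup_analysis(ggplot2::diamonds, "cut")
```

Passing "color" or "clarity" instead of "cut" should work the same way, and no separate column index is needed.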
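As @RichardScriven mentioned, if you plan to assign the result to a new variable, you may want to remove the print call at the end and just write df, or not even assign df inside the function at all. Otherwise, the result is printed even when you do x <- subgroup_analysis(...).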
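The difference is easy to see with two toy functions in base R. The names f_print and f_return are made up for this illustration and do not appear in the question.

```r
# Ends with print(): the data frame is printed whenever the function runs
f_print <- function() {
  df <- data.frame(a = 1)
  print(df)
}

# Ends with the value itself: nothing is printed when the result is assigned
f_return <- function() {
  data.frame(a = 1)
}

# capture.output() collects whatever each call writes to the console
printed_lines <- capture.output(x <- f_print())
silent_lines <- capture.output(y <- f_return())
```

Both x and y hold the same data frame, because print() returns its argument invisibly. Only the first call writes anything to the console.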