dplyr string as column reference
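Is it possible to pass a string as a column reference to a dplyr procedure?
Here is an example, using a grouped dataset and a simple function, in which I try to pass a string as a reference to a column. Thanks!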
library(dplyr)  # needed for %>%, group_by() and mutate()
machines <- data.frame(Date=c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"),
                       Model.Num=c("123", "456", "123", "456", "123", "456"),
                       Cost=c(200, 300, 250, 350, 300, 400))
# Attempt: refer to the column through its name given as a string
my.fun <- function(data, colname){
  mutate(data, position=cumsum(as.name(colname)))
}
machines <- machines %>% group_by(Date, Model.Num)
machines <- my.fun(machines, "Cost")
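Here is an option that uses interp() from the lazyeval package, which is installed along with dplyr. Inside your function, you need to use the standard evaluation version of the dplyr function. In this case, that is mutate_().
Note that, because of the way you set up the grouping in machines, the new column position will be identical to the Cost column here: every Date/Model.Num group contains a single row. The second call to my_fun() shows that it works with a different set of grouping variables.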
library(dplyr)
library(lazyeval)
my_fun <- function(data, col) {
  mutate_(data, position = interp(~ cumsum(x), x = as.name(col)))
}
my_fun(machines, "Cost")
# Date Model.Num Cost position
# 1 1/31/2014 123 200 200
# 2 1/31/2014 456 300 300
# 3 2/28/2014 123 250 250
# 4 2/28/2014 456 350 350
# 5 3/31/2014 123 300 300
# 6 3/31/2014 456 400 400
## second example - different grouping
my_fun(group_by(machines, Model.Num), "Cost")
# Date Model.Num Cost position
# 1 1/31/2014 123 200 200
# 2 1/31/2014 456 300 300
# 3 2/28/2014 123 250 450
# 4 2/28/2014 456 350 650
# 5 3/31/2014 123 300 750
# 6 3/31/2014 456 400 1050
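In more recent dplyr releases the underscored verbs such as mutate_() are deprecated, so the same idea is usually written with the .data pronoun instead. The following is only a minimal sketch under that assumption; my_fun_tidy is a hypothetical name and is not part of the answer above.

```r
library(dplyr)

machines <- data.frame(
  Date = c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"),
  Model.Num = c("123", "456", "123", "456", "123", "456"),
  Cost = c(200, 300, 250, 350, 300, 400)
)

# Hypothetical helper: `col` is a string naming the column to accumulate.
my_fun_tidy <- function(data, col) {
  # .data[[col]] looks the column up by name within the current group
  mutate(data, position = cumsum(.data[[col]]))
}

res <- my_fun_tidy(group_by(machines, Model.Num), "Cost")
res$position
# 200 300 450 650 750 1050
```

Because the column is looked up by name rather than pasted into an expression, no extra package is needed and the string is never parsed as code.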
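We can also do this with standard evaluation without the lazyeval package. setNames() lets us attach a string as the variable name: here it names the expression string "position", and the named vector is passed to mutate_() through .dots.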
library(tidyverse)
machines <- data.frame(
  Date = c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"),
  Model.Num = c("123", "456", "123", "456", "123", "456"),
  Cost = c(200, 300, 250, 350, 300, 400)
)
my_fun <- function(data, col) {
  mutate_(data, .dots = setNames(paste0("cumsum(", col, ")"), "position"))
}
my_fun(machines %>% group_by(Date, Model.Num), "Cost")
# Source: local data frame [6 x 4]
# Groups: Date, Model.Num [6]
#
# Date Model.Num Cost position
# <fctr> <fctr> <dbl> <dbl>
# 1 1/31/2014 123 200 200
# 2 1/31/2014 456 300 300
# 3 2/28/2014 123 250 250
# 4 2/28/2014 456 350 350
# 5 3/31/2014 123 300 300
# 6 3/31/2014 456 400 400
my_fun(machines %>% group_by(Model.Num), "Cost")
# Source: local data frame [6 x 4]
# Groups: Model.Num [2]
#
# Date Model.Num Cost position
# <fctr> <fctr> <dbl> <dbl>
# 1 1/31/2014 123 200 200
# 2 1/31/2014 456 300 300
# 3 2/28/2014 123 250 450
# 4 2/28/2014 456 350 650
# 5 3/31/2014 123 300 750
# 6 3/31/2014 456 400 1050
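To see what is actually handed to .dots, it may help to build the named string by itself and evaluate it by hand. This is a base-R sketch of the idea only, not a description of how dplyr processes .dots internally; the small data frame df is made up for illustration.

```r
col <- "Cost"

# A character vector of length one whose name is the new column name
dots <- setNames(paste0("cumsum(", col, ")"), "position")
names(dots)         # "position"
dots[["position"]]  # "cumsum(Cost)"

# Turn the string into an unevaluated call, then evaluate it with the
# columns of a data frame in scope (illustrative data, not from the post)
expr <- parse(text = dots[["position"]])[[1]]
df <- data.frame(Cost = c(200, 300, 250))
result <- eval(expr, df)
result
# 200 500 750
```

Since the string is parsed as R code, this style is best kept to column names that you control.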