如何在函数中正确使用 do.call?
How to properly use do.call within a function?
我正在逐步从 SAS 过渡到 R,目前我正在尝试复制我以前使用宏所做的事情。
我有一个包含我所有数据的 table(我们称它为 IDF_pop),然后从这个 table 我创建了另外两个:YVE_pop 和 EPCI_pop,它们是主要 table 的两个子集。我更喜欢创建单独的 table,但我想这可能不是最佳选择。以下是我的处理方式:
## Let's say the main table contains 10 lines.
## codgeo is the city's postal code, epci is the area, and I have three
## variables that describe different parts of the population
codgeo <- c("75014","75020","78300","78520","78650","91200","91600","92500","93100","95230")
epci <- c("001","001","002","002","003","004","004","005","006","007")
pop0_15 <- c(10000*runif(10))
pop15_64 <- c(10000*runif(10))
pop65p <- c(10000*runif(10))
IDF_pop <- data.frame(codgeo,epci,pop0_15,pop15_64,pop65p)
## I'd like my population to be in one single column, for this I'll use melt
IDF_pop_line <- melt(IDF_pop,c("codgeo","epci"))
## Now I want to create separate tables for the Yvelines department (codgeo starts with 78) and for EPCI 002
## I could do it in two lines but I wanted to train using functions so here goes
localisation <- function(code_dep, lib_dep, code_epci, lib_epci){
do.call("<<-",
list(paste0(eval(lib_dep),"_pop_ligne"),
IDF_pop_line %>% filter(stri_sub(codgeo,from=1,length=2)==code_dep)
)
)
do.call("<<-",
list(paste0(eval(lib_epci),"_pop_ligne"),
IDF_pop_line %>% filter(epci==code_epci)
)
)
}
do.call("localisation",list("78","YVE","002","GPSO"))
有了这个,我有了 3 tables(IDF_、YVE_、GPSO_),现在可以解决主要问题了。
我接下来要做的是总结我的table。我正在尝试编写一个适用于所有 3 table 的函数。
我希望它完全依赖于参数,但似乎 do.call 不会在其第二个参数中接受 paste0。
## Aggregating the tables. I'll call the function 3 times, one for each level.
agregation <- function(lib){
# This doesn't :
do.call("<<-",
list(paste0(eval(lib),"_pop_agr"),
paste0(eval(lib),"_pop_line") %>%
group_by(variable) %>%
summarise(pop = sum(value))
)
)
}
do.call("agregation",list("IDF")) # This one doesn't work
agregation2 <- function(lib){
do.call("<<-",
list(paste0(eval(lib),"_pop_agr"),
IDF_pop_line %>%
group_by(variable) %>%
summarise(pop = sum(value))
)
)
}
do.call("agregation2",list("IDF")) # This one does
如您所见,目前我发现的唯一可行方法是写下我用于聚合的 table 的全名。但这违背了拥有可以自由参数化的东西的最初想法。
我如何修改函数的第一个版本,使其适用于所有三个可能的参数?
最后,我知道一个简单的解决方法是保留我的 IDF_pop_line table 并在最后时刻过滤以创建 3 个聚合的 table,但是我更喜欢从一开始就使用单独的 table。
在此先感谢您的帮助!
在您的 agregation
函数字符串中 paste0(eval(lib),"_pop_line")
returns 数据框的名称而不是数据框本身。
尝试 get
agregation <- function(lib){
do.call("<<-",
list(paste0(eval(lib),"_pop_agr"),
get(paste0(eval(lib),"_pop_line")) %>%
group_by(variable) %>%
summarise(pop = sum(value))
)
)
}
这是使用 data.table
的建议。
您可以使用您在进入所有功能之前创建的IDF_pop
。
library(data.table)
#make adata.table out of YVE_pop_ligne
setDT( IDF_pop )
#create groups to summarise by
IDF_pop[ epci == "002", GSPO := TRUE][]
IDF_pop[ grepl("^78", codgeo) , YVE := TRUE][]
#melt and filter only values where a filter is TRUE
dt <- data.table::melt( IDF_pop,
id.vars = c("codgeo", "epci", "pop0_15", "pop15_64", "pop65p"),
measure.vars = c("GSPO", "YVE"))[ value == TRUE,][]
中间结果(dt)
# codgeo epci pop0_15 pop15_64 pop65p variable value
# 1: 78300 002 6692.394 5441.225 4008.875 GSPO TRUE
# 2: 78520 002 2128.604 6808.004 1889.822 GSPO TRUE
# 3: 78300 002 6692.394 5441.225 4008.875 YVE TRUE
# 4: 78520 002 2128.604 6808.004 1889.822 YVE TRUE
# 5: 78650 003 8482.971 6556.482 5098.929 YVE TRUE
代码
#now summarising is easy, sum by varianle-group on all pop-columns
dt[, lapply( .SD, sum), by = variable, .SDcols = names(dt)[grepl("^pop", names(dt) )] ]
最终输出
# variable pop0_15 pop15_64 pop65p
# 1: GSPO 7171.683 5855.894 11866.55
# 2: YVE 12602.153 8028.948 14364.21
我正在逐步从 SAS 过渡到 R,目前我正在尝试复制我以前使用宏所做的事情。
我有一个包含我所有数据的 table(我们称它为 IDF_pop),然后从这个 table 我创建了另外两个:YVE_pop 和 EPCI_pop,它们是主要 table 的两个子集。我更喜欢创建单独的 table,但我想这可能不是最佳选择。以下是我的处理方式:
## Let's say the main table contains 10 lines.
## codgeo is the city's postal code, epci is the area, and I have three
## variables that describe different parts of the population
codgeo <- c("75014","75020","78300","78520","78650","91200","91600","92500","93100","95230")
epci <- c("001","001","002","002","003","004","004","005","006","007")
pop0_15 <- c(10000*runif(10))
pop15_64 <- c(10000*runif(10))
pop65p <- c(10000*runif(10))
IDF_pop <- data.frame(codgeo,epci,pop0_15,pop15_64,pop65p)
## I'd like my population to be in one single column, for this I'll use melt
IDF_pop_line <- melt(IDF_pop,c("codgeo","epci"))
## Now I want to create separate tables for the Yvelines department (codgeo starts with 78) and for EPCI 002
## I could do it in two lines but I wanted to train using functions so here goes
localisation <- function(code_dep, lib_dep, code_epci, lib_epci){
do.call("<<-",
list(paste0(eval(lib_dep),"_pop_ligne"),
IDF_pop_line %>% filter(stri_sub(codgeo,from=1,length=2)==code_dep)
)
)
do.call("<<-",
list(paste0(eval(lib_epci),"_pop_ligne"),
IDF_pop_line %>% filter(epci==code_epci)
)
)
}
do.call("localisation",list("78","YVE","002","GPSO"))
有了这个,我有了 3 tables(IDF_、YVE_、GPSO_),现在可以解决主要问题了。
我接下来要做的是总结我的table。我正在尝试编写一个适用于所有 3 table 的函数。
我希望它完全依赖于参数,但似乎 do.call 不会在其第二个参数中接受 paste0。
## Aggregating the tables. I'll call the function 3 times, one for each level.
agregation <- function(lib){
# This doesn't :
do.call("<<-",
list(paste0(eval(lib),"_pop_agr"),
paste0(eval(lib),"_pop_line") %>%
group_by(variable) %>%
summarise(pop = sum(value))
)
)
}
do.call("agregation",list("IDF")) # This one doesn't work
agregation2 <- function(lib){
do.call("<<-",
list(paste0(eval(lib),"_pop_agr"),
IDF_pop_line %>%
group_by(variable) %>%
summarise(pop = sum(value))
)
)
}
do.call("agregation2",list("IDF")) # This one does
如您所见,目前我发现的唯一可行方法是写下我用于聚合的 table 的全名。但这违背了拥有可以自由参数化的东西的最初想法。 我如何修改函数的第一个版本,使其适用于所有三个可能的参数?
最后,我知道一个简单的解决方法是保留我的 IDF_pop_line table 并在最后时刻过滤以创建 3 个聚合的 table,但是我更喜欢从一开始就使用单独的 table。
在此先感谢您的帮助!
在您的 agregation
函数字符串中 paste0(eval(lib),"_pop_line")
returns 数据框的名称而不是数据框本身。
尝试 get
agregation <- function(lib){
do.call("<<-",
list(paste0(eval(lib),"_pop_agr"),
get(paste0(eval(lib),"_pop_line")) %>%
group_by(variable) %>%
summarise(pop = sum(value))
)
)
}
这是使用 data.table
的建议。
您可以使用您在进入所有功能之前创建的IDF_pop
。
library(data.table)
#make adata.table out of YVE_pop_ligne
setDT( IDF_pop )
#create groups to summarise by
IDF_pop[ epci == "002", GSPO := TRUE][]
IDF_pop[ grepl("^78", codgeo) , YVE := TRUE][]
#melt and filter only values where a filter is TRUE
dt <- data.table::melt( IDF_pop,
id.vars = c("codgeo", "epci", "pop0_15", "pop15_64", "pop65p"),
measure.vars = c("GSPO", "YVE"))[ value == TRUE,][]
中间结果(dt)
# codgeo epci pop0_15 pop15_64 pop65p variable value
# 1: 78300 002 6692.394 5441.225 4008.875 GSPO TRUE
# 2: 78520 002 2128.604 6808.004 1889.822 GSPO TRUE
# 3: 78300 002 6692.394 5441.225 4008.875 YVE TRUE
# 4: 78520 002 2128.604 6808.004 1889.822 YVE TRUE
# 5: 78650 003 8482.971 6556.482 5098.929 YVE TRUE
代码
#now summarising is easy, sum by varianle-group on all pop-columns
dt[, lapply( .SD, sum), by = variable, .SDcols = names(dt)[grepl("^pop", names(dt) )] ]
最终输出
# variable pop0_15 pop15_64 pop65p
# 1: GSPO 7171.683 5855.894 11866.55
# 2: YVE 12602.153 8028.948 14364.21