Dynamically generate subset column names for a data frame using a for loop
Given the following data frame df:
df <- structure(list(id = c("M0000607", "M0000609", "M0000612"), `2021-08(actual)` = c(12.6,
19.2, 8.3), `2021-09(actual)` = c(10.3, 17.3, 6.4), `2021-10(actual)` = c(8.9,
15.7, 5.3), `2021-11(actual)` = c(7.3, 14.8, 3.1), `2021-12(actual)` = c(6.1,
14.2, 3.5), `2021-08(pred)` = c(11.65443222, 14.31674997, 7.084180415
), `2021-09(pred)` = c(12.29810914, 17.7143733, 6.057927385),
`2021-10(pred)` = c(9.619846116, 15.54553601, 6.525992602
), `2021-11(pred)` = c(8.352097939, 13.97318204, 3.164682627
), `2021-12(pred)` = c(6.113631596, 14.16243166, 3.288372517
), `2021-08(error)` = c(2.082307066, 1.146759554, 0.687406723
), `2021-09(error)` = c(1.631350383, 2.753457736, 2.952737781
), `2021-10(error)` = c(0.945567783, 4.883250027, 1.215819585
), `2021-11(error)` = c(1.998109138, 0.414373304, 0.342072615
), `2021-12(error)` = c(0.719846116, 0.154463985, 1.225992602
)), class = "data.frame", row.names = c(NA, -3L))
Suppose my ultimate objective is to repeatedly apply a plotting function to subsets of df:
plot_fun <- function(...)
ggplot()
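More specifically, I will loop over each element of year_months <- c('2021-12', '2021-11', '2021-10') and apply plot_fun, so I first need to subset the selected columns for each element of year_months.
The logic is: for a given year-month, I select id, the actual values for the most recent three months, and the error and predicted values for the most recent month. For example, for the year_months '2021-12', '2021-11', '2021-10', the subset columns are: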
2021-12: selected_cols <- c('id', "2021-10(actual)", "2021-11(actual)", "2021-12(actual)",
"2021-12(pred)", "2021-12(error)")
2021-11: selected_cols <- c('id', "2021-09(actual)", "2021-10(actual)", "2021-11(actual)",
"2021-11(pred)", "2021-11(error)")
2021-10: selected_cols <- c('id', "2021-08(actual)", "2021-09(actual)", "2021-10(actual)",
"2021-10(pred)", "2021-10(error)")
Finally, I will run (pseudo)code like this:
for (year_month in year_months) {
  selected_cols <- ...
  plot_fun(selected_cols)
}
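Now my question is: how can I write a for loop or function that generates the vector of column names for each selected year_month?
My trial code (not yet concise) to generate the columns for 2021-12: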
library(lubridate)  # provides floor_date() and months() for numeric input
date_pre_n_months <- function(end_date, x) {
  floor_date(as.Date(end_date) - months(x), 'month')
}
end_date <- '2021-12-31'
begin_date <- date_pre_n_months(end_date, 3)
year_month <- seq(as.Date("2021-10-31"), as.Date("2021-12-31"), by = "month")
act <- paste0(strftime(year_month, format="%Y-%m"), '(actual)')
pred <- paste0(strftime(end_date, format="%Y-%m"), '(pred)')
error <- paste0(strftime(end_date, format="%Y-%m"), '(error)')
selected_cols <- c('id', act, pred, error)
selected_cols
Output:
"id" "2021-10(actual)" "2021-12(actual)" "2021-12(actual)" "2021-12(pred)" "2021-12(error)"
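The repeated "2021-12(actual)" in that output appears to come from stepping by month from a month-end date: there is no 31 November, so that step rolls over into December. A minimal sketch of one way around it, assuming it is acceptable to anchor the sequence on the first day of each month (the variable names month_ends and month_starts are only illustrative):

```r
# Stepping by month from the 31st overflows in months with fewer days,
# so November is skipped and December shows up twice.
month_ends <- seq(as.Date("2021-10-31"), as.Date("2021-12-31"), by = "month")
format(month_ends, "%Y-%m")

# Anchoring on the first of the month avoids the overflow.
month_starts <- seq(as.Date("2021-10-01"), as.Date("2021-12-01"), by = "month")
act <- paste0(format(month_starts, "%Y-%m"), "(actual)")
act
```

Only the day of the month changes here; the formatted labels still use just the year and month.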
With the help of the lubridate package, you can do the rest in base R like this.
year_months <- c('2021-12', '2021-11', '2021-10')
curr <- lubridate::ym(year_months)
prev <- curr - months(2L)
mapply(function(x, y) {
df[c(
"id",
format(seq.Date(y, x, by = "month"), "%Y-%m(actual)"),
format(x, "%Y-%m(pred)"),
format(x, "%Y-%m(error)")
)]
}, curr, prev, SIMPLIFY = FALSE)
Output:
[[1]]
id 2021-10(actual) 2021-11(actual) 2021-12(actual) 2021-12(pred) 2021-12(error)
1 M0000607 8.9 7.3 6.1 6.113632 0.7198461
2 M0000609 15.7 14.8 14.2 14.162432 0.1544640
3 M0000612 5.3 3.1 3.5 3.288373 1.2259926
[[2]]
id 2021-09(actual) 2021-10(actual) 2021-11(actual) 2021-11(pred) 2021-11(error)
1 M0000607 10.3 8.9 7.3 8.352098 1.9981091
2 M0000609 17.3 15.7 14.8 13.973182 0.4143733
3 M0000612 6.4 5.3 3.1 3.164683 0.3420726
[[3]]
id 2021-08(actual) 2021-09(actual) 2021-10(actual) 2021-10(pred) 2021-10(error)
1 M0000607 12.6 10.3 8.9 9.619846 0.9455678
2 M0000609 19.2 17.3 15.7 15.545536 4.8832500
3 M0000612 8.3 6.4 5.3 6.525993 1.2158196
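If only the column-name vectors are needed, as the question literally asks, the same logic can be sketched without touching df and without lubridate. This is a rough sketch, not the answer's method: cols_for and its n_actual argument are hypothetical names introduced here, and it assumes every year-month is a "YYYY-MM" string.

```r
# Hypothetical helper: build the column-name vector for one "YYYY-MM" string.
cols_for <- function(year_month, n_actual = 3L) {
  end <- as.Date(paste0(year_month, "-01"))  # first day of the target month
  # Walk backwards one month at a time, then reverse into chronological order.
  actual_months <- rev(seq(end, by = "-1 month", length.out = n_actual))
  c(
    "id",
    paste0(format(actual_months, "%Y-%m"), "(actual)"),
    paste0(year_month, "(pred)"),
    paste0(year_month, "(error)")
  )
}

# Plain for loop, mirroring the pseudocode in the question.
year_months <- c("2021-12", "2021-11", "2021-10")
selected <- list()
for (year_month in year_months) {
  selected[[year_month]] <- cols_for(year_month)
}
selected[["2021-10"]]
```

Each element of selected can then be used as df[selected[[year_month]]] before handing the subset to a plotting function.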
If you want to apply the plotting function to the selected data frames, then
year_months <- c('2021-12', '2021-11', '2021-10')
curr <- lubridate::ym(year_months)
prev <- curr - months(2L)
plots <- mapply(function(x, y) {
plot_fun(df[c(
"id",
format(seq.Date(y, x, by = "month"), "%Y-%m(actual)"),
format(x, "%Y-%m(pred)"),
format(x, "%Y-%m(error)")
)])
}, curr, prev, SIMPLIFY = FALSE)
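gives you a list of (gg)plots.
Update (also select the same month of the previous year). However, you need to make sure that the columns you want to select exist in the data frame; otherwise, you will get an error.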
year_months <- c('2021-12', '2021-11', '2021-10')
curr <- lubridate::ym(year_months)
prev <- curr - months(2L)
mapply(function(x, y) {
df[c(
"id",
format(c(x - lubridate::years(1L), seq.Date(y, x, by = "month")), "%Y-%m(actual)"),
format(x, "%Y-%m(pred)"),
format(x, "%Y-%m(error)")
)]
}, curr, prev, SIMPLIFY = FALSE)
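With the example df there are no 2020 columns, so the update above would hit exactly that error. One possible guard, sketched under the assumption that silently dropping absent columns (with a warning) is acceptable; select_existing is a hypothetical helper and toy is a stand-in for df:

```r
# Hypothetical helper: keep only the requested columns that exist,
# and warn about the ones that were skipped.
select_existing <- function(data, cols) {
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    warning("Skipping missing columns: ", paste(missing_cols, collapse = ", "))
  }
  data[intersect(cols, names(data))]
}

# Toy data frame standing in for df; it has no 2020 columns.
toy <- data.frame(id = c("a", "b"))
toy[["2021-12(actual)"]] <- c(1, 2)

wanted <- c("id", "2020-12(actual)", "2021-12(actual)")
res <- suppressWarnings(select_existing(toy, wanted))
names(res)
```

Inside the mapply() call, the df[c(...)] subset could be swapped for select_existing(df, c(...)) if this behaviour suits your data.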