How can I train and test multiple data sets stored in two lists?

I want to create a function that trains and tests 10 separate data sets held in two lists. Here are the lists:

blend_30_d<-list(desktop_30_1, desktop_30_2, desktop_30_3, desktop_30_4, desktop_30_5, desktop_30_6, desktop_30_7, desktop_30_8, desktop_30_9, desktop_30_10)

blend_30_td<-list(desktop_30_t1, desktop_30_t2, desktop_30_t3, desktop_30_t4, desktop_30_t5, desktop_30_t6, desktop_30_t7, desktop_30_t8, desktop_30_t9, desktop_30_t10)

The column names of each data set are:

[1] "date" "Wkday" "Imps" "Clicks" "Total_Cost" "Units"
[7] "January" "February" "March" "April" "May" "June"
[13] "July" "August" "September" "October" "November" "December"
[19] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday"
[25] "Sunday" "Vday" "Tgiving" "Xmas" "XmasE" "NYE"
[31] "NYD" "July4" "Labor" "Memorial" "Mob_App_Launch" "Auto_Approve_Launch"

I built the following function. I want blend_30_d[1] to be tested against blend_30_td[1].

d_cost <- function(train, test){
    #### Run regression on the training set
    q <- lm(Total_Cost ~ . - date - Wkday - Imps - Clicks + poly(date, 2), data = train)
    #### Predict values for the test set (predict.lm takes the new data as `newdata`;
    #### an argument named `x` is ignored and the training fitted values come back)
    test_cost_d <- predict.lm(q, newdata = test)
    #### Return R^2 between predicted and actual values
    cor(test_cost_d, test$Total_Cost)^2
}

d_cost(blend_30_d, blend_30_td)

I get the following error: Error in terms.formula(formula, data = data) : duplicated name 'date' in data frame using '.'

I'm not sure this is the right way to use two lists... any suggestions? Thanks!

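Your d_cost function is written to take two data frames, one for training and one for testing. You are calling it with two lists of data frames instead. Because the function handles one pair of data frames at a time, you need to give it one pair per call, not two whole lists. Try something like this: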

# One R^2 value per train/test pair
z = rep(NA, length(blend_30_d))
for (i in seq_along(blend_30_d)) {
    z[i] = d_cost(blend_30_d[[i]], blend_30_td[[i]])
}
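
The same pairing can also be written without an explicit loop. Below is a minimal sketch using mapply(), which walks two lists in parallel. The toy data, the helper names make_df and score_pair, and the one-predictor formula are all assumptions made up for illustration; they stand in for the real desktop_30_* data frames and for d_cost.

```r
# Toy stand-ins for the real data (hypothetical): two lists of three data frames.
set.seed(1)
make_df <- function(n) {
  x <- rnorm(n)
  data.frame(x = x, Total_Cost = 2 * x + rnorm(n, sd = 0.1))
}
train_list <- replicate(3, make_df(50), simplify = FALSE)
test_list  <- replicate(3, make_df(20), simplify = FALSE)

# Simplified scorer (hypothetical): fit on one training frame,
# then compute R^2 between predictions and actuals on its test frame.
score_pair <- function(train, test) {
  fit  <- lm(Total_Cost ~ x, data = train)
  pred <- predict(fit, newdata = test)
  cor(pred, test$Total_Cost)^2
}

# mapply() calls score_pair(train_list[[i]], test_list[[i]]) for each i
# and simplifies the results to a numeric vector.
r2 <- mapply(score_pair, train_list, test_list)
```

With the real data the call would presumably be mapply(d_cost, blend_30_d, blend_30_td), provided d_cost returns a single number for each pair.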

I think you may need to add a loop:

for(i in 1:10){
    # index the two lists pairwise; print() is needed to see results inside a loop
    print(d_cost(blend_30_d[[i]], blend_30_td[[i]]))
}