Error : 'data' must be a data.frame, environment, or list

#define training and testing sets
set.seed(555)
train <- df2[1:800, c("charges")]
y_test <- df2[801:nrow(df2), c("charges")]
test <- df2[801:nrow(df2), c("age","bmi","children","smoker")]
   
#use model to make predictions on a test set
model <- pcr(charges~age+bmi+children+smoker, data = train, scale=TRUE, validation="CV")
pcr_pred <- predict(model, test, ncomp = 4)

#calculate RMSE
sqrt(mean((pcr_pred - y_test)^2))

I don't know why this error occurs. I have tried many things but I am still stuck here.

When you run:

train <- df2[1:800, c("charges")]

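you create an R atomic vector, not a data frame. Selecting a single column with [ drops the data frame structure, so the result is neither a data.frame nor a list, which is exactly what the error message complains about. To keep the data frame class you have to add the drop=FALSE argument: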

train <- df2[1:800, c("charges"), drop=FALSE]

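That should fix this particular error, although without any sample data we cannot tell whether further errors will follow. In fact, I am fairly sure you do not want the train object to be a single column at all, because the model formula clearly needs the other columns (age, bmi, children, smoker). Try this: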

library(pls)

set.seed(555)
#keep every column so the formula can find the predictors as well as charges
train <- df2[1:800, ]
test <- df2[801:nrow(df2), ]
y_test <- test$charges

#use model to make predictions on a test set
model <- pcr(charges~age+bmi+children+smoker, data = train, scale=TRUE, validation="CV")
pcr_pred <- predict(model, test, ncomp = 4)

#calculate RMSE
sqrt(mean((pcr_pred - y_test)^2))
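
To see why the original subset triggered the error, here is a minimal sketch of the drop behaviour. The data frame toy and its values are made up for illustration and only stand in for df2. It also assumes df2 is a plain base R data.frame; a tibble would not drop a single column this way.

```r
# Hypothetical toy data frame standing in for df2 (values are made up)
toy <- data.frame(age = c(19, 33, 45),
                  charges = c(1200.5, 4300.0, 8100.25))

# Selecting one column with [ drops the result to a plain vector
as_vector <- toy[1:2, c("charges")]

# drop = FALSE keeps the one-column data frame
as_frame <- toy[1:2, c("charges"), drop = FALSE]

class(as_vector)          # "numeric"
class(as_frame)           # "data.frame"
is.data.frame(as_vector)  # FALSE
```

A bare numeric vector is not a data.frame, environment, or list, so it cannot be used as the data argument of a formula-based model call.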