Why can't I use `predict` inside a data.table?

Question

I'm trying to use predict.lm inside a data.table, but I get a strange error. The first part, the data preparation, runs without problems.

# (1) Load data
library(data.table)
homeprice = fread('https://vincentarelbundock.github.io/Rdatasets/csv/mosaicData/SaratogaHouses.csv')

# (2) Data Prep: Convert character variables into factors.
myvars = c('heating','fuel','sewer','waterfront','newConstruction','centralAir')
for (var in myvars) {
   homeprice[, (var) := as.factor(get(var))]
}

# (3) Split data into training and test sets
# install.packages('caTools') # Run once if caTools is not installed
library(caTools)

homeprice[, split := sample.split(V1, SplitRatio = 0.5)]
train = homeprice[split == TRUE,] # Create training data
test = homeprice[split == FALSE,] # Create test data


# Train OLS model with training data.
reg1 = lm(price ~ . - V1, train)
summary(reg1) # Displays the results from reg1

OK, here is the part that is giving me trouble:

# In-sample prediction: Predict prices for training set
z = predict(reg1, newdata = train)
train[, price_pred := z] # Works perfectly
train[, price_pred := predict(reg1, newdata = train)] # Gives error

Any advice would be appreciated.

Answer 1

I don't know what causes the error, but using dplyr

library(dplyr)
train <- train %>% 
  mutate(price_pred = predict(reg1, newdata = train))

seems to give the same result as your example.

Answer 2

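The presence of the "split" variable, which was used to split the original dataset, seems to be causing the problem. Removing it from the regression seems to fix it.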

reg1 = lm(price ~ . - V1 - split, train)
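
To make the idea concrete, here is a minimal, self-contained sketch on the built-in mtcars data rather than the SaratogaHouses file. The column subset, the alternating `split` column, and the names `dt`, `fit`, and `mpg_pred` are illustrative assumptions, not part of the original example. It only shows that, once the bookkeeping column is excluded from the formula, calling `predict` inside `:=` gives the same values as calling it outside. Passing `.SD` as `newdata` is one common way to refer to the table from inside `j`. I have not verified that this is the root cause of the error in the question.

```r
library(data.table)

# Small stand-in data set (assumption: mtcars instead of SaratogaHouses)
dt <- as.data.table(mtcars)[, .(mpg, wt, hp, qsec)]

# Hypothetical bookkeeping column, analogous to `split` in the question
dt[, split := rep(c(TRUE, FALSE), length.out = .N)]
train <- dt[split == TRUE]

# Fit on all remaining columns, explicitly excluding the bookkeeping column
fit <- lm(mpg ~ . - split, data = train)

# Prediction computed outside the data.table call
z <- predict(fit, newdata = train)

# Prediction computed inside `:=`, using .SD to refer to the table's columns
train[, mpg_pred := predict(fit, newdata = .SD)]

# Compare the two approaches
all.equal(unname(z), train$mpg_pred)
```

If the comparison holds on your own data after dropping `split` from the formula, the one-line `:=` form from the question should be usable as well.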
