Why can't I use `predict` inside a data.table?
I am trying to use predict.lm inside a data.table, but I get a strange error. The first part, the data preparation, runs without problems.
# (1) Load data
library(data.table)
homeprice = fread('https://vincentarelbundock.github.io/Rdatasets/csv/mosaicData/SaratogaHouses.csv')
# (2) Data Prep: Convert character variables into factors.
myvars = c('heating','fuel','sewer','waterfront','newConstruction','centralAir')
homeprice[, (myvars) := lapply(.SD, as.factor), .SDcols = myvars]
# (3) Split data into training and test sets
# install.packages('caTools') # Run once if caTools is not installed yet
library(caTools)
homeprice[, split := sample.split(V1, SplitRatio = 0.5)]
train = homeprice[split == TRUE] # Create training data
test = homeprice[split == FALSE] # Create test data
# Train OLS model with training data.
reg1 = lm(price ~ . - V1, train)
summary(reg1) # Displays the results from "reg1"
OK, here is the part that gives me trouble:
# In-sample prediction: predict prices for the training set
z = predict(reg1, newdata = train)
train[, price_pred := z] # Works perfectly
train[, price_pred := predict(reg1, newdata = train)] # Gives error
Any advice would be appreciated.
I don't know what causes the error, but using dplyr
library(dplyr)
train <- train %>%
  mutate(price_pred = predict(reg1, newdata = train))
seems to give the same result as your example.
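It seems that the presence of the "split" variable, which was used to split the original data set, is what causes the problem: because the formula uses `.`, that column is included as a predictor. Removing it from the regression seems to fix the issue.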
reg1 = lm(price ~ . - V1 - split, train)
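If the helper column really is the culprit, another option might be to never store it in the table at all. The following is only a minimal sketch under that assumption: it uses the built-in mtcars data as a stand-in for the house prices, and the names dt, in_train, fit, fit_flag and mpg_pred are made up for illustration.

```r
library(data.table)

# Toy stand-in for the house-price data (mtcars ships with base R).
dt <- as.data.table(mtcars)[, .(mpg, wt, hp)]

# Assumption: keep the train/test indicator in a plain vector instead of a
# column, so that `.` in the formula cannot pick it up as a predictor.
in_train <- rep(c(TRUE, FALSE), length.out = nrow(dt))
train <- dt[in_train]
test  <- dt[!in_train]

fit <- lm(mpg ~ ., data = train)
print(names(coef(fit)))  # intercept, wt and hp only

# For contrast: once the indicator is a column, `.` turns it into a
# predictor. It is constant within the training rows, so lm() cannot
# estimate it and reports NA for that coefficient.
with_flag <- copy(train)[, split := TRUE]
fit_flag <- lm(mpg ~ ., data = with_flag)
print(coef(fit_flag)["splitTRUE"])

# In-sample prediction written straight into the table.
train[, mpg_pred := predict(fit, newdata = train)]
```

Checking coef() for NA entries after fitting with `.` is a quick way to spot helper columns that slipped into the model.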