I am trying to use predict.lm inside a data.table and get a strange error. The first part, the data preparation, runs fine.
# (1) Load data
library(data.table)
homeprice = fread('https://vincentarelbundock.github.io/Rdatasets/csv/mosaicData/SaratogaHouses.csv')
# (2) Data Prep: Convert character variables into factors.
myvars = c('heating','fuel','sewer','waterfront','newConstruction','centralAir')
for (var in myvars) {
homeprice[, (var) := as.factor(get(var))]
}
# (3) Split data into training and test sets
# install.packages('caTools') # Run once if caTools is not installed yet
library(caTools)
homeprice[, split := sample.split(V1, SplitRatio = 0.5)]
train = homeprice[split == TRUE,] # Create training data
test = homeprice[split == FALSE,] # Create test data
# Train OLS model with training data.
reg1 = lm(price ~ . - V1, train)
summary(reg1) # Displays the results from "reg1"
OK, here is the part that gives me trouble:
# In-sample prediction: predict prices for the training set
z = predict(reg1, newdata = train)
train[, price_pred := z] # Works perfectly
train[, price_pred := predict(reg1, newdata = train)] # Gives error
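Any advice would be appreciated.
It appears that the "split" variable used to partition the original dataset is what causes the problem. Within train that column is TRUE for every row, so it adds no information to the model anyway. Removing it from the regression seems to fix the error.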
reg1 = lm(price ~ . - V1 - split, train)
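For anyone who wants to try the fix without downloading the housing data, here is a minimal sketch on a made-up toy table. The table dt, its columns id, size and rooms, and the use of base R sample() in place of caTools::sample.split are all assumptions for illustration, not part of the original question. It only shows the pattern of excluding both the row id and the split flag from the formula and then assigning predictions with := directly.

```r
library(data.table)

set.seed(1)

# Hypothetical toy data standing in for the SaratogaHouses table
dt = data.table(
  id    = 1:200,
  size  = runif(200, 50, 300),
  rooms = sample(2:8, 200, replace = TRUE)
)
dt[, price := 1000 * size + 5000 * rooms + rnorm(200, sd = 10000)]

# Logical split flag; base R sample() is used here instead of sample.split
dt[, split := sample(rep(c(TRUE, FALSE), length.out = .N))]
train = dt[split == TRUE]
test  = dt[split == FALSE]

# Exclude both the row id and the split flag from the regression
fit = lm(price ~ . - id - split, data = train)

# Assign predictions by reference, calling predict() inside j
train[, price_pred := predict(fit, newdata = train)]
test[, price_pred := predict(fit, newdata = test)]
```

Another option, under the same assumptions, would be to drop the flag from the training table before fitting (for example train[, split := NULL]), so that the formula never sees it.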