Add random forest predictions as a column into the test file

Question

I'm working in Python with pandas (in a Jupyter notebook), where I built a random forest model for the Titanic data set. https://www.kaggle.com/c/titanic/data

I read in the test and train data, then clean it and add new columns (the same columns for both).

After fitting and refitting the model, trying boosting, and so on, I settled on one model:

 X2 = train_data[['Pclass','Sex','Age','richness']] 
 rfc_model_3 = RandomForestClassifier(n_estimators=200)
 %time cross_val_score(rfc_model_3, X2, Y_target).mean()
 rfc_model_3.fit(X2, Y_target)

Then I predict whether each passenger survived:

 X_test = test_data[['Pclass','Sex','Age','richness']]
 predictions = rfc_model_3.predict(X_test)
 preds = pd.DataFrame(predictions, columns=['Survived'])

Is there a way to add the predictions as a column into the test file?

Answer 1

Since

rfc_model_3 = RandomForestClassifier(n_estimators=200)
rfc_model_3.predict(X_test)

returns y : array of shape = [n_samples] (see docs), you should be able to add the model output directly to X_test without creating an intermediate DataFrame:

X_test['survived'] = rfc_model_3.predict(X_test)
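
Since the question asks about the test file rather than X_test, here is a minimal sketch of the same idea applied to the full test frame. It assigns to test_data instead of X_test because X_test is a column selection of test_data, so a new column on X_test would not show up in test_data and may trigger a SettingWithCopyWarning. The two small frames, their values, the PassengerId column and the random_state argument are assumptions made only so the example runs on its own; they are not the asker's data.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic stand-ins for the cleaned Titanic frames (hypothetical values).
train_data = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2],
    "Sex": [0, 1, 0, 1, 1, 0],
    "Age": [29.0, 22.0, 35.0, 4.0, 54.0, 27.0],
    "richness": [3, 1, 2, 1, 3, 2],
    "Survived": [1, 0, 1, 0, 0, 1],
})
test_data = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Pclass": [3, 1, 2],
    "Sex": [1, 0, 1],
    "Age": [34.5, 47.0, 62.0],
    "richness": [1, 3, 2],
})

features = ["Pclass", "Sex", "Age", "richness"]

# random_state is fixed here only to make the sketch repeatable.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(train_data[features], train_data["Survived"])

# predict() returns a 1-D array with one entry per row, in row order,
# so it can be assigned straight onto the full test frame.
test_data["Survived"] = model.predict(test_data[features])

# With no path, to_csv returns the CSV text; pass a file name to write to disk.
csv_text = test_data.to_csv(index=False)
```

Selecting the feature columns at predict time keeps the new Survived column out of the model input, so the cell can be rerun without the column count changing.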

If you still want the intermediate result, @EdChum's suggestion in the comments will work.
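
The comment mentioned above is not reproduced here, so the following is not that suggestion, only one possible way to keep an intermediate DataFrame. The point it illustrates is index alignment: a DataFrame built from a bare array gets a fresh 0..n-1 index, and pandas joins by index label, so it helps to reuse the test frame's index. The frame, its index values and the predictions array are hypothetical stand-ins for test_data and rfc_model_3.predict(X_test).

```python
import numpy as np
import pandas as pd

# Hypothetical test frame whose index is no longer 0..n-1
# (for example after rows were dropped during cleaning).
test_data = pd.DataFrame(
    {"Pclass": [3, 1, 2], "Age": [22.0, 38.0, 26.0]},
    index=[10, 11, 13],
)

# Stand-in for the array returned by rfc_model_3.predict(X_test).
predictions = np.array([0, 1, 1])

# Reuse the test frame's index so each prediction keeps its row label.
preds = pd.DataFrame(predictions, columns=["Survived"], index=test_data.index)

# join aligns on the index, so every row picks up its own prediction.
result = test_data.join(preds)
```

Without index=test_data.index, preds would be labelled 0, 1, 2 and the join would find no matching labels in this example, leaving Survived empty.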

python

machine-learning

pandas

random-forest