Match id/index to prediction with xgboost / predict individual data points

Question

I have been trying, without success, to build a data frame with a column that contains my model's predictions.

As a simple example, I will use the iris dataset:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(np.concatenate((iris.data, np.array([iris.target]).T), axis=1), columns=iris.feature_names + ['target'])
df.head()

This outputs the first five rows of the data frame.

For the next steps of building the model, I do the following:

# Get the X and y for the experiment
# (axis must be passed by keyword; the positional form fails on pandas 2.x)
X = df.drop('target', axis=1).values
y = df["target"].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Create an XGBClassifier instance, fit it, and predict on the test set
from xgboost import XGBClassifier
clf = XGBClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

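At this point I am stuck. I have looked at several posts about how to retrieve the index/id of individual data points (each row is one data point), but without success.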

Is there a way to match the predictions to each row? Or, as an alternative, to test individual rows so that I can see their predictions?

Answer 1

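A simple approach is to keep X and y as pandas objects (i.e. drop the .values). train_test_split then preserves the original row index in X_test and y_test: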

X = df.drop('target', axis=1)
y = df["target"]
# rest of your code as is

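So, after running the rest of the code (i.e. fitting the model and getting the predictions y_pred), you can add the target and prediction columns back to X_test (which is now a data frame):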

X_test = X_test.assign(target = y_test.values)
X_test = X_test.assign(prediction = y_pred)

print(X_test.head())
# result:
     sepal length (cm)  sepal width (cm)  ...  target  prediction
14                 5.8               4.0  ...     0.0         0.0
98                 5.1               2.5  ...     1.0         1.0
75                 6.6               3.0  ...     1.0         1.0
16                 5.4               3.9  ...     0.0         0.0
131                7.9               3.8  ...     2.0         2.0

[5 rows x 6 columns]
