ValueError: Mix of label input types (string and number)

Question

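I am trying to use the XGBoost algorithm. I have a dataset with 4 attributes (quat_1, quat_2, quat_3, quat_4) and a target that can take nine different values (0, 1, 2, 3, 4, 5, 6, 7, 8). I am trying to implement XGBoost with the following code: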

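import numpy as np
import xgboost as xgb
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.33, random_state = 0, stratify = y)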
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
pca = PCA(n_components=2, svd_solver='auto', whiten=True)
fit = pca.fit(x_test)
pca.fit(x_train)
print("Explained Variance: %s" % fit.explained_variance_ratio_)
print(fit.components_)
D_train = xgb.DMatrix(x_train, label=y_train)
D_test = xgb.DMatrix(x_test, label=y_test)
parameters = {'eta': 0.3, 'max_depth': 9, 'objective': 'multi:softprob', 'num_class': 9}
steps=20
classifier = xgb.train(parameters, D_train, steps)
preds = classifier.predict(D_test)
best_preds= np.asarray([np.argmax(line) for line in preds])
print("Precision = {}".format(precision_score(y_test, best_preds, average='macro')))
print("Recall = {}".format(recall_score(y_test, best_preds, average='macro')))
print("Accuracy = {}".format(accuracy_score(y_test, best_preds)))

But it results in this error: ValueError: Mix of label input types (string and number). Can anyone help me?

Answer 1

This means your labels contain mixed types. Ideally, the labels should all be numeric.

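Check whether any column of your data contains strings and, if so, convert them to numbers. For example, if you have 3 groups a, b, c, you can encode them as classes 1, 2, 3. The model works on numeric data, so strings should not be passed in.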
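As a rough illustration of that encoding step, here is a minimal sketch using scikit-learn's LabelEncoder. The group names are hypothetical. Note that LabelEncoder numbers the classes from 0 rather than 1, which happens to match what XGBoost's multi:softprob objective expects (labels in the range 0 to num_class - 1).

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels standing in for the groups a, b, c.
groups = np.array(["a", "b", "c", "a", "c"])

# LabelEncoder sorts the distinct values and maps them to 0, 1, 2, ...
encoder = LabelEncoder()
encoded = encoder.fit_transform(groups)

# The mapping is reversible, so predictions can be turned back into names.
decoded = encoder.inverse_transform(encoded)

print(encoded)
print(decoded)
```

Keeping the fitted encoder around is useful because the same mapping has to be applied to the test labels before computing metrics.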

Answer 2

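This seems to indicate that the labels have mixed data types. Both the training samples and the targets should be numeric. If the data was parsed or read in incorrectly, use something like pd.to_numeric to coerce it to a numeric dtype.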
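A small sketch of that conversion, assuming the labels were read in as digit strings (the values below are made up for illustration). Passing errors="coerce" is optional; it turns anything unparseable into NaN so that bad rows can be found and handled explicitly.

```python
import pandas as pd

# Hypothetical labels as they might look after a faulty parse:
# the class ids are digits, but stored as strings.
raw_labels = pd.Series(["0", "3", "8", "1"], name="label")

# Convert to a numeric dtype; this raises if a value cannot be parsed.
labels = pd.to_numeric(raw_labels)
print(labels.dtype)

# With errors="coerce", unparseable entries become NaN instead of raising.
messy = pd.Series(["0", "2", "n/a"], name="label")
coerced = pd.to_numeric(messy, errors="coerce")
print(coerced.isna().sum())
```

If the question's y_test holds strings while the predictions from np.argmax are integers, that mismatch would be one plausible source of the error, since the scikit-learn metric functions check the labels of both arrays together.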

There are also a few things that don't seem to make sense:

 D_test = xgb.DMatrix(x_test, label=y_test)

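Why include y_test in the DMatrix used for prediction? The booster's predict method only returns predictions. You need y_test to check the metrics, but it should not be an input to the model.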

You also don't seem to do anything with the fitted PCA model; I'm not sure whether that is intentional.
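If the PCA step was meant to feed the model, one possible arrangement is sketched below, under the assumption that the projection should be learned from the training data only and then applied to both splits. The random arrays stand in for the asker's scaled features, and the names x_train_pca and x_test_pca are introduced here for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 4 features, as in the question.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(60, 4))
x_test = rng.normal(size=(30, 4))

# Scale using statistics from the training split only.
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)

# Fit PCA on the training split, then project both splits.
pca = PCA(n_components=2, svd_solver="auto", whiten=True)
x_train_pca = pca.fit_transform(x_train)
x_test_pca = pca.transform(x_test)

print("Explained Variance: %s" % pca.explained_variance_ratio_)
print(pca.components_)
```

The projected arrays, rather than the original x_train and x_test, would then be passed to xgb.DMatrix. In the question's code the PCA is fitted (once on x_test, once on x_train) but its output never reaches the model.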

python

xgboost