Error when using sklearn model loaded by joblib. TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

Question

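I created a VotingClassifier() object with sklearn and later saved it to the file voting_predictor.pkl using joblib. The object loads successfully, but when I try to predict on some data with voting_predictor.predict(X_test), I get the following error: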

TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

I also tried dumping and loading the object with pickle, but I got exactly the same error. The code looks like this:

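import joblib
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score, classification_report

# estimators, X_train, y_train, X_test and y_test are defined earlier (not shown)
eclf1 = VotingClassifier(estimators=estimators, voting='hard')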

eclf1 = eclf1.fit(X_train, y_train)
y_pred = eclf1.predict(X_test)

report = classification_report(y_test, y_pred)
poll_accuracy = accuracy_score(y_test, y_pred)

print(report)
print(poll_accuracy)

# dump the fitted object (this succeeds)
filename = 'voting_predictor.pkl'
joblib.dump(eclf1, filename)

# load the object back (this succeeds)
voting_predictor = joblib.load(filename)
# this prints the object correctly, showing all its parameters
print(voting_predictor)

# the TypeError is raised here
y_pred = voting_predictor.predict(X_test)

report = classification_report(y_test, y_pred)
poll_accuracy = accuracy_score(y_test, y_pred)

print(voting_predictor) successfully prints the object with all its parameters. Any ideas why this happens?

Answer 1

I ran into the same error when ensembling CatBoost with other predictors. I found a solution, but I am looking for a more elegant one.

Answer 2

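The problem was that the target column held the class names as strings. Leaving those string values as they are, rather than label-encoding them to integers, seems to cause this error. In every other situation, though, sklearn handled the string class names correctly and produced all the metrics, such as classification_report and accuracy_score, without errors. The error only occurred when I loaded the object from a file.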
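A minimal sketch of the label-encoding fix described above, in case it helps. The toy data, the two estimators, and the idea of saving the encoder in the same file as the model are my own assumptions for illustration, not the original poster's setup:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Toy data standing in for X_train / y_train (hypothetical values).
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 4))
y_str = np.where(X[:, 0] > 0, "spam", "ham")  # class names as strings

# Encode the string class names as integers before fitting.
encoder = LabelEncoder()
y_int = encoder.fit_transform(y_str)

# Hypothetical choice of estimators; any classifiers would do here.
estimators = [
    ("lr", LogisticRegression()),
    ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
]
clf = VotingClassifier(estimators=estimators, voting="hard").fit(X, y_int)

# Save the model and the encoder together so that predictions can be
# mapped back to the original class names after loading.
path = os.path.join(tempfile.mkdtemp(), "voting_predictor.pkl")
joblib.dump({"model": clf, "encoder": encoder}, path)

loaded = joblib.load(path)
pred_int = loaded["model"].predict(X)                      # integer labels
pred_str = loaded["encoder"].inverse_transform(pred_int)   # original names
```

Keeping the encoder next to the model means the loaded predictor can still report the original class names, which is what classification_report would otherwise have shown.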

python

scikit-learn

joblib