RandomForestClassifier instance not fitted yet. Call 'fit' with appropriate arguments before using this method
I am trying to train a decision tree model, save it, and then reload it when I need it. However, I keep getting the following error:
This DecisionTreeClassifier instance is not fitted yet. Call 'fit'
with appropriate arguments before using this method.
Here is my code:
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.20, random_state=4)
names = ["Decision Tree", "Random Forest", "Neural Net"]
classifiers = [
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    MLPClassifier()
]
score = 0
for name, clf in zip(names, classifiers):
    if name == "Decision Tree":
        clf = DecisionTreeClassifier(random_state=0)
        grid_search = GridSearchCV(clf, param_grid=param_grid_DT)
        grid_search.fit(X_train, y_train_TF)
        if grid_search.best_score_ > score:
            score = grid_search.best_score_
            best_clf = clf
    elif name == "Random Forest":
        clf = RandomForestClassifier(random_state=0)
        grid_search = GridSearchCV(clf, param_grid_RF)
        grid_search.fit(X_train, y_train_TF)
        if grid_search.best_score_ > score:
            score = grid_search.best_score_
            best_clf = clf
    elif name == "Neural Net":
        clf = MLPClassifier()
        clf.fit(X_train, y_train_TF)
        y_pred = clf.predict(X_test)
        current_score = accuracy_score(y_test_TF, y_pred)
        if current_score > score:
            score = current_score
            best_clf = clf
pkl_filename = "pickle_model.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(best_clf, file)
from sklearn.externals import joblib
# Save to file in the current working directory
joblib_file = "joblib_model.pkl"
joblib.dump(best_clf, joblib_file)
print("best classifier: ", best_clf, " Accuracy= ", score)
Here is how I load the model and test it:
# First method
with open(pkl_filename, 'rb') as h:
    loaded_model = pickle.load(h)
# Second method
joblib_model = joblib.load(joblib_file)
As you can see, I tried two ways of saving the model, but neither works.
Here is how I test it:
print(loaded_model.predict(test))
print(joblib_model.predict(test))
You can clearly see that the models are actually fitted, and if I try any other model, such as SVM or logistic regression, the method works just fine.
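The problem is in this line:
best_clf = clf
You passed clf to grid_search, which clones the estimator and fits those clones on the data. As a result, your actual clf is left untouched and unfitted.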
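The cloning behavior can be checked directly. The following is a minimal sketch, not the asker's code: the iris dataset, the small max_depth grid, and the is_fitted helper are assumptions chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted

# Stand-in data (assumption): any labelled dataset would do.
X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=0)
# Hypothetical parameter grid, kept tiny so the search runs quickly.
grid_search = GridSearchCV(clf, param_grid={"max_depth": [2, 3]}, cv=3)
grid_search.fit(X, y)


def is_fitted(estimator):
    """Illustrative helper: True if scikit-learn considers the estimator fitted."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


original_fitted = is_fitted(clf)                       # the object passed in
best_fitted = is_fitted(grid_search.best_estimator_)   # the refitted clone
same_object = grid_search.best_estimator_ is clf

print("original clf fitted:", original_fitted)
print("best_estimator_ fitted:", best_fitted)
print("best_estimator_ is clf:", same_object)
```

The estimator handed to GridSearchCV and the one it ends up fitting are different objects, which is why pickling clf stores an unfitted model.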
What you need is
best_clf = grid_search
to save the fitted grid_search model.
If you do not want to save the entire grid_search object, you can use its best_estimator_ attribute to get the actual fitted clone:
best_clf = grid_search.best_estimator_
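Put together, the selection loop and the save/load round trip might look like the sketch below. The iris data, the parameter grids, the candidates dictionary, and the temporary directory are assumptions for illustration (the original param_grid_DT and param_grid_RF are not shown in the question). Note that best_estimator_ is only available when GridSearchCV is left at its default refit=True.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption); the question uses its own data and label arrays.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=4
)

# Hypothetical grids standing in for param_grid_DT and param_grid_RF.
candidates = {
    "Decision Tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [2, 4]}),
    "Random Forest": (RandomForestClassifier(random_state=0), {"n_estimators": [10, 20]}),
}

score = 0
best_clf = None
for name, (clf, param_grid) in candidates.items():
    grid_search = GridSearchCV(clf, param_grid=param_grid, cv=3)
    grid_search.fit(X_train, y_train)
    if grid_search.best_score_ > score:
        score = grid_search.best_score_
        # Keep the fitted clone, not the unfitted estimator that was passed in.
        best_clf = grid_search.best_estimator_

# Save to a temporary directory (assumption) and load the model back.
pkl_filename = os.path.join(tempfile.mkdtemp(), "pickle_model.pkl")
with open(pkl_filename, "wb") as f:
    pickle.dump(best_clf, f)
with open(pkl_filename, "rb") as f:
    loaded_model = pickle.load(f)

# The reloaded model predicts without raising NotFittedError.
predictions = loaded_model.predict(X_test)
print("best classifier:", best_clf, "CV accuracy =", score)
```

The same best_clf object can be passed to joblib.dump in place of pickle.dump; in recent scikit-learn releases joblib is imported on its own (import joblib) rather than from sklearn.externals.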
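Just wanted to add one thing to the answer above. Even if you manually copy and paste the pickle file into a different directory where you want to load the model, you can still end up with that error. If you want to move the file, use cut and paste instead.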