How to check model predictions with original values in machine learning

Question

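I have trained a machine learning model and want to check it against original (raw) values. Am I doing this correctly? Whenever I change the numbers in 'value', I get the same result.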

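# Imports assumed from the names used below (not shown in the original post)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize

# Features are normalized before the split, so the model only ever sees normalized rows
X_train, X_test, y_train, y_test = train_test_split(normalize(df4), y, test_size=0.2, random_state=0)

# 'sqrt' is what max_features='auto' meant for classifiers; 'auto' was removed in scikit-learn 1.3
rfc1 = RandomForestClassifier(random_state=0, max_features='sqrt', n_estimators=90, max_depth=8, criterion='gini')
rfc1.fit(X_train, y_train)

# Raw (un-normalized) sample passed straight to the model
value = [2.60,1.0,3.0,19.0,1.0,1.0,1.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0,1.0]
print(rfc1.predict([value]))

y_pred = rfc1.predict(X_test)

print(metrics.classification_report(y_test, y_pred))
# confusion_matrix orders classes alphabetically: Average, Bad, Good
labels = ['Avg Marks', 'Bad Marks', 'Good Marks']
df_cm = pd.DataFrame(confusion_matrix(y_test, y_pred), index=labels, columns=labels)
plt.figure(figsize=(4, 2))
sn.heatmap(df_cm, annot=True, fmt='g')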

The model's accuracy is good, but the prediction for 'value' is always 'High Chances of Good Marks':

['High Chances of Good Marks']
                            precision    recall  f1-score   support

             Average Marks       0.80      0.83      0.82        59
 High Chances of Bad Marks       0.81      0.72      0.76        18
High Chances of Good Marks       0.86      0.86      0.86        50

                  accuracy                           0.83       127
                 macro avg       0.83      0.80      0.81       127
              weighted avg       0.83      0.83      0.83       127

The original data looks like this:

Answer 1

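Solved my problem. I was normalizing the training data before fitting the model, but the raw input ('value') passed to predict was not normalized.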
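To make the mismatch concrete, here is a minimal sketch. It assumes `normalize` is `sklearn.preprocessing.normalize` (the post does not show the import), which by default rescales each row to unit L2 norm. The variable names `raw` and `scaled` are only for illustration.

```python
import numpy as np
from sklearn.preprocessing import normalize

# The raw sample from the question
value = [2.60, 1.0, 3.0, 19.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0,
         1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0]

raw = np.array([value])
scaled = normalize(raw)  # defaults: norm='l2', axis=1 (each row scaled on its own)

raw_norm = np.linalg.norm(raw)
scaled_norm = np.linalg.norm(scaled)

# Largest entry before and after normalization
print(raw.max(), scaled.max())
# Row length before and after normalization
print(raw_norm, scaled_norm)
```

A forest fitted on rows like `scaled` learns split thresholds between 0 and 1. A raw row whose entries are mostly 1 or larger tends to land on the same side of those splits no matter how the numbers are changed, which is consistent with the constant prediction described in the question.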

The code below works, but now the data I use for fitting is not normalized. Is there a way to keep the normalization?

# Same model, now trained on the raw (un-normalized) features
X_train, X_test, y_train, y_test = train_test_split(df4, y, test_size=0.2, random_state=0)

# 'sqrt' is what max_features='auto' meant for classifiers; 'auto' was removed in scikit-learn 1.3
rfc1 = RandomForestClassifier(random_state=0, max_features='sqrt', n_estimators=90, max_depth=8, criterion='gini')
rfc1.fit(X_train, y_train)

# The raw sample is now on the same scale as the training data
value = [2.60,1.0,3.0,19.0,1.0,1.0,1.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0,1.0]
print(rfc1.predict([value]))

y_pred = rfc1.predict(X_test)

print(metrics.classification_report(y_test, y_pred))
# confusion_matrix orders classes alphabetically: Average, Bad, Good
labels = ['Avg Marks', 'Bad Marks', 'Good Marks']
df_cm = pd.DataFrame(confusion_matrix(y_test, y_pred), index=labels, columns=labels)
plt.figure(figsize=(4, 2))
sn.heatmap(df_cm, annot=True, fmt='g')

Answer 2

First, you need to decide what you want: whether or not to normalize your data.

I suggest that you normalize it.

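If you normalize your input data, you need to train the model on the normalized values, and you also need to make and evaluate predictions on normalized data.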
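A minimal sketch of that rule, assuming the `normalize` in the question is `sklearn.preprocessing.normalize`. The arrays `X` and `y` are synthetic stand-ins for the asker's `df4` and labels, and the names `model` and `value_scaled` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize

# Synthetic stand-in for df4 / y: 300 rows, 20 features, 3 classes (not the real data)
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 20.0, size=(300, 20))
y = rng.integers(0, 3, size=300)

# Train on normalized rows, as in the question
X_train, X_test, y_train, y_test = train_test_split(
    normalize(X), y, test_size=0.2, random_state=0
)
model = RandomForestClassifier(n_estimators=90, max_depth=8, random_state=0)
model.fit(X_train, y_train)

# Raw sample from the question
value = [2.60, 1.0, 3.0, 19.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0,
         1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0]

# Apply the same row-wise transform before predicting
value_scaled = normalize(np.array([value]))
prediction = model.predict(value_scaled)
print(prediction)
```

Because `normalize` works row by row and keeps no fitted state, calling it on a single sample applies the same operation each training row went through. A scaler that learns statistics from the data, such as `StandardScaler`, would instead need to be fitted on the training set and reused via `transform`.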

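At prediction time you always have to reproduce the same input structure, including the same preprocessing, that you used in training.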
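One way to keep training and prediction inputs consistent is to bundle the preprocessing with the model in a scikit-learn `Pipeline`. The sketch below is an illustration under assumptions, not the asker's actual setup: `Normalizer` is the transformer counterpart of `normalize`, the data is synthetic, and the names `clf` and `manual` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Synthetic stand-in for df4 / y: 300 rows, 20 features, 3 classes (not the real data)
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 20.0, size=(300, 20))
y = rng.integers(0, 3, size=300)

# Split the raw data; the pipeline handles normalization internally
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = make_pipeline(
    Normalizer(),  # row-wise L2 normalization, applied in both fit and predict
    RandomForestClassifier(n_estimators=90, max_depth=8, random_state=0),
)
clf.fit(X_train, y_train)

# Raw sample from the question, passed without any manual preprocessing
value = [2.60, 1.0, 3.0, 19.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0,
         1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0]
prediction = clf.predict([value])

# Same result as normalizing by hand and calling the fitted forest directly
forest = clf.named_steps["randomforestclassifier"]
manual = forest.predict(Normalizer().transform([value]))
print(prediction, manual)
```

With this layout, raw values can be passed to `clf.predict` directly, and `clf.score(X_test, y_test)` evaluates on data that went through the same transform.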

Take a look at this link for reference on normalization.

Tags: python, machine-learning, pandas, random-forest, data-science