The number of features of the model must match the input. Model n_features is 7985 and input n_features is 1

I built a spam classifier with a random forest and want to write a separate function that classifies a text message as spam or not spam. I tried:

def predict_message(pred_text):
    pred_text=[pred_text]
    pred_text2 = tfidf_vect.fit_transform(pred_text)
    pred_features = pd.DataFrame(pred_text2.toarray())
    prediction = rf_model.predict(pred_features)
    return (prediction)

pred_text = "how are you doing today?"

prediction = predict_message(pred_text)
print(prediction)

But it gives me this error:

The number of features of the model must match the input.
Model n_features is 7985 and input n_features is 1 

I can't see what the problem is. How can I fix it?

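By calling tfidf_vect.fit_transform(pred_text), you refit the vectorizer on that single message, so it discards everything it learned from the original training corpus. The resulting matrix has one column per term in the new message's vocabulary instead of the 7985 columns the model was trained on.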

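Call transform instead. It reuses the vocabulary and IDF weights learned during training, so the output always has the same number of columns the model expects.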
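To see the mismatch concretely, here is a minimal sketch using a made-up four-message training corpus (the texts and every variable name except tfidf_vect are hypothetical, not the asker's data). It compares the column count produced by transform with what fit_transform produces on one new message:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the real SMS training data
train_texts = [
    "win a free prize now",
    "call now to claim your free reward",
    "are we still meeting for lunch today",
    "see you at the station tonight",
]

tfidf_vect = TfidfVectorizer()
X_train = tfidf_vect.fit_transform(train_texts)  # learns vocabulary + IDF weights
n_train_features = X_train.shape[1]

new_message = ["how are you doing today?"]

# Correct: transform maps the message onto the existing training vocabulary
X_right = tfidf_vect.transform(new_message)

# Wrong: fit_transform throws away the training vocabulary and learns a new
# one from this single message, so the number of columns changes
X_wrong = tfidf_vect.fit_transform(new_message)

print(n_train_features, X_right.shape[1], X_wrong.shape[1])
print(sorted(tfidf_vect.vocabulary_))  # only the new message's words remain
```

With this toy corpus, the training matrix and the transform output both have 22 columns, while the refit produces only 5, and afterwards the vectorizer's vocabulary contains nothing but the five words of the new message.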

These changes should fix it:

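def predict_message(pred_text):
    pred_text = [pred_text]
    # Changed: transform() reuses the vocabulary fitted on the training data
    pred_text2 = tfidf_vect.transform(pred_text)
    # The sparse TF-IDF matrix can be passed to the model directly
    prediction = rf_model.predict(pred_text2)
    return prediction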

pred_text = "how are you doing today?"

prediction = predict_message(pred_text)
print(prediction)
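For a version that runs on its own, here is a minimal end-to-end sketch of the fixed function. The training texts, labels, and the n_estimators/random_state values are hypothetical placeholders. Passing the vectorizer and model in as default arguments, rather than reading them from globals, is a choice made for this sketch, not something the original code does:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data standing in for the real labelled SMS corpus
train_texts = [
    "win a free prize now",
    "call now to claim your free reward",
    "are we still meeting for lunch today",
    "see you at the station tonight",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Fit the vectorizer exactly once, on the training corpus only
tfidf_vect = TfidfVectorizer()
X_train = tfidf_vect.fit_transform(train_texts)

rf_model = RandomForestClassifier(n_estimators=50, random_state=0)
rf_model.fit(X_train, train_labels)


def predict_message(pred_text, vectorizer=tfidf_vect, model=rf_model):
    """Classify one message with the already-fitted vectorizer and model."""
    # transform (not fit_transform) maps the text onto the training vocabulary,
    # so the column count matches what the model was trained on
    features = vectorizer.transform([pred_text])
    return model.predict(features)


prediction = predict_message("how are you doing today?")
print(prediction)
```

The key point is that fit_transform is called once, on the training data, and every later message goes through transform. Wrapping both steps with sklearn.pipeline.make_pipeline(TfidfVectorizer(), RandomForestClassifier()) is another way to keep them in sync, because the pipeline's predict only calls transform on the vectorizer.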