How to predict values in sklearn with encoded features?

Question

My current data frame looks like this:

   salary  job title  Raiting  Company_Name  Location  Seniority  Excel_needed
0     100         SE        5         apple        sf         vp             0
1     120         DS        4       Samsung        la         Jr             1
2     230         QA        5        google        sd         Sr             1

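After applying sklearn's OneHotEncoder to several of the categorical columns, I get a satisfactory model score. I would now like to predict from the original string values, for example model.predict('SE','5','apple','ca','vp','1'), instead of entering the thousands of 0s and 1s that make up a row of the one-hot encoded data frame. How should I go about this?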

Answer 1

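You need to keep every fitted preprocessing step (the encoders, not just the model) and write a function that applies them to new input before calling the model.

Here is a basic example: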

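from sklearn.preprocessing import LabelEncoder

# `train`, `model`, and `data` are assumed to exist already.
# Fit the encoder once on the training data and keep it for prediction time.
title_encoder = LabelEncoder()
title_encoder.fit(train['job title'])


def predict(model, data, job_title_column, encoder):
    # Work on a copy so the caller's data frame keeps its string values.
    data = data.copy()
    # Reuse the fitted encoder; do not refit it on new data.
    data[job_title_column] = encoder.transform(data[job_title_column])
    return model.predict(data)


predictions = predict(model, data, 'job title', title_encoder)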

You can also try using a pipeline, which bundles the encoding and the model into a single object: https://scikit-learn.org/stable/modules/compose.html
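As a rough sketch of the pipeline approach, the example below wraps a OneHotEncoder and a model in one Pipeline so that raw string values can be passed straight to predict. Several things here are assumptions rather than details from the question: the choice of LinearRegression, the decision to one-hot encode only the four text columns while passing Raiting and Excel_needed through as numbers, and the helper name predict_raw, which is hypothetical. handle_unknown="ignore" is used so that a category not seen during training (such as the location 'ca') is encoded as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data mirroring the frame in the question.
train = pd.DataFrame({
    "salary": [100, 120, 230],
    "job title": ["SE", "DS", "QA"],
    "Raiting": [5, 4, 5],
    "Company_Name": ["apple", "Samsung", "google"],
    "Location": ["sf", "la", "sd"],
    "Seniority": ["vp", "Jr", "Sr"],
    "Excel_needed": [0, 1, 1],
})

# Assumption: only the text columns are one-hot encoded.
categorical = ["job title", "Company_Name", "Location", "Seniority"]
feature_columns = ["job title", "Raiting", "Company_Name",
                   "Location", "Seniority", "Excel_needed"]

# One-hot encode the categorical columns; pass the numeric ones through.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    remainder="passthrough",
)

# Assumption: LinearRegression stands in for whatever model was trained.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])
model.fit(train[feature_columns], train["salary"])


def predict_raw(model, *values):
    """Hypothetical helper: build a one-row frame from raw values and predict."""
    row = pd.DataFrame([values], columns=feature_columns)
    return model.predict(row)


# 'ca' was never seen in training; it is encoded as all zeros.
prediction = predict_raw(model, "SE", 5, "apple", "ca", "vp", 1)
print(prediction)
```

Because the encoder lives inside the pipeline, fitting and predicting both go through the same transformation, and the whole object can be saved and reloaded as one unit.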

Tags: encoding, machine-learning, pandas, scikit-learn, sklearn-pandas