Retrieve category names from the prediction column in the result table of an ML model

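I built an ML model (a logistic regression model) with Spark 2.4.3 and Java. It predicts the WorkType (the label) of an email from the keywords in its Subject (the input). I trained the model on the training data and then applied it to the test data as follows: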

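        // Fit the logistic regression model on the training set.
        LogisticRegressionModel lrModel = lr.fit(training);

        // Score the test set; this adds columns such as "probability" and "prediction".
        Dataset<Row> result = lrModel.transform(testing);

        result.select("WorkType", "Subject", "probability", "label", "prediction")
                .orderBy(org.apache.spark.sql.functions.col("probability").desc())
                .show(100, 30);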

The result I get looks like this:

+------------------------+------------------------------+------------------------------+-----+----------+
|                WorkType|                       Subject|                   probability|label|prediction|
+------------------------+------------------------------+------------------------------+-----+----------+
|            Cancellation|Automatic reply: Ticket #72...|[0.8562867173211978,0.02423...|  0.0|       0.0|
|            Cancellation|Ticket #72827 Cancelling Po...|[0.8244896056944511,0.03953...|  0.0|       0.0|
|            Cancellation|Ticket #72827 Cancelling Po...|[0.8127553003889683,0.04411...|  0.0|       0.0|
|            Cancellation|Ticket #72616 Daily Cancell...|[0.8115900852592474,0.03392...|  0.0|       0.0|

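To train the model, WorkType was converted into a numeric label. Can the prediction column in the result now be converted back so that it shows the WorkType string instead? Please help. Thanks!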

If you used a LabelEncoder (scikit-learn) to convert the labels, calling le.inverse_transform([0.0]) on the fitted encoder gives you back the string.
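Since the question is about Java, here is a minimal plain-Java sketch of the same inverse lookup. It assumes you still have the list of WorkType strings in the order in which they were indexed, so that position 0 is the string encoded as 0.0. The class name PredictionLabels, the helper toLabel, and every label other than "Cancellation" are hypothetical and only there for illustration. If the label column was produced by Spark's StringIndexer (the question does not say), Spark ML's IndexToString transformer does this lookup for a whole column, using the array returned by StringIndexerModel.labels().

```java
import java.util.Arrays;
import java.util.List;

public class PredictionLabels {

    // Maps a numeric class index (as found in the "prediction" column)
    // back to the original string, given the labels in index order.
    static String toLabel(List<String> labels, double prediction) {
        int index = (int) prediction;
        // Reject values that are not whole numbers or fall outside the label list.
        if (index != prediction || index < 0 || index >= labels.size()) {
            throw new IllegalArgumentException("No label for prediction " + prediction);
        }
        return labels.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical label order: only "Cancellation" -> 0.0 appears in the
        // output shown in the question; the other entries are made up.
        List<String> labels = Arrays.asList("Cancellation", "Refund", "Booking");

        System.out.println(toLabel(labels, 0.0)); // prints "Cancellation"
    }
}
```

The lookup is only correct if the list order matches the encoding used at training time, so it is safer to take the labels from the fitted indexer than to retype them by hand.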