What are features and how to interpret them in RFormula

Question

I want to understand what RFormula in MLflow or Spark is.

I found these:

But I still can't fully understand how to interpret RFormula. I'm not sure how to interpret the following table:

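According to the formula "y ~ x + s", y depends on x and s. But in the table, when y = 0, x = 0 and s = a (the third row), features is [0,1] and label is 0. How should I interpret that?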

I found some material, but I still can't figure out how to approach this problem.

Answer 1

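So your label is y, the left-hand side of the formula. The right-hand-side terms, x and s, are parsed by RFormula and turned into the features vector.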

x is numeric, so it stays unchanged:

+-----------+-----------+
| x (input) | x feature |
+-----------+-----------+
|    1.0    |    1.0    |
|    2.0    |    2.0    |
|    0.0    |    0.0    |
+-----------+-----------+

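s is a string column, so it is indexed and one-hot encoded. By default categories are ordered by frequency, so a (which appears twice) comes first and b second, and the last category (b) is dropped as the reference level. That leaves s with a single slot, which is 1.0 for a and 0.0 for b: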

+-----------+-----------+
| s (input) | s feature |
+-----------+-----------+
|     a     |    1.0    |
|     b     |    0.0    |
|     a     |    1.0    |
+-----------+-----------+
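To make the encoding of s concrete, here is a minimal hand-rolled sketch in plain Python (not Spark code) that mimics what I understand RFormula to do with a string feature: order categories by descending frequency (the default `stringOrderType="frequencyDesc"`), break ties alphabetically (my assumption of Spark's tie-breaking), and drop the last category. The helper names `string_index` and `one_hot_drop_last` are hypothetical.

```python
from collections import Counter


def string_index(values):
    """Hypothetical helper: map each distinct string to an index,
    most frequent first; ties are broken alphabetically (assumed)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: i for i, v in enumerate(ordered)}


def one_hot_drop_last(values):
    """Hypothetical helper: one-hot encode a string column,
    giving no slot to the last category (the reference level)."""
    index = string_index(values)
    size = len(index) - 1  # one slot per category except the last
    vectors = []
    for v in values:
        vec = [0.0] * size
        i = index[v]
        if i < size:  # the dropped category stays all zeros
            vec[i] = 1.0
        vectors.append(vec)
    return vectors


s = ["a", "b", "a"]
print(string_index(s))       # {'a': 0, 'b': 1}
print(one_hot_drop_last(s))  # [[1.0], [0.0], [1.0]]
```

With only two categories, dropping the last one leaves a single 0/1 slot, which is why s adds just one number to features instead of two.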

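I hope this answers your question. RFormula does not standardize anything: it keeps numeric columns as they are, one-hot encodes string columns (dropping the last category), and assembles all right-hand-side terms into a single features vector, while the left-hand side becomes the label. So for the third row, features [0,1] means x = 0 followed by the s slot for a (1.0), and label 0 is simply y.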
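The whole transformation for "y ~ x + s" can be sketched in plain Python as below. This is a hand-rolled illustration, not Spark's implementation; `apply_formula` is a hypothetical name. The x and s values come from the tables above, while the y values of the first two rows are assumed, since the question only states y for the third row (y does not affect features anyway).

```python
from collections import Counter

# Rows as (y, x, s). x and s match the tables above; y for rows 1-2 is assumed.
rows = [(1.0, 1.0, "a"), (0.0, 2.0, "b"), (0.0, 0.0, "a")]


def apply_formula(rows):
    """Hypothetical sketch of what "y ~ x + s" produces for each row."""
    counts = Counter(s for _, _, s in rows)
    # Most frequent category first (ties alphabetical, assumed);
    # the last category is dropped as the reference level.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    kept = ordered[:-1]
    out = []
    for y, x, s in rows:
        s_part = [1.0 if s == cat else 0.0 for cat in kept]
        features = [x] + s_part  # numeric x passes through unchanged
        label = y                # the left-hand side of "~" becomes the label
        out.append((features, label))
    return out


for features, label in apply_formula(rows):
    print(features, label)
# [1.0, 1.0] 1.0
# [2.0, 0.0] 0.0
# [0.0, 1.0] 0.0
```

In actual PySpark the equivalent would be `RFormula(formula="y ~ x + s").fit(df).transform(df)` from `pyspark.ml.feature`, which adds `features` and `label` columns (the default `featuresCol` and `labelCol`) to the DataFrame.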
