MLlib: How does RFormula.fit() work?
One way to build a model with Spark MLlib is to use RFormula from the pyspark.ml.feature module, as described in the docs. However, I can't find any explanation of how fit is actually implemented in this case. Does it use a squared loss function or something else? Where can I find this kind of information? The source is really hard to follow...
As Anoop Toffy mentioned in the comments, you can find a nice little tutorial here. Quoting the tutorial:
The fit() step determines the mapping of categorical feature values to vector indices in the output, so that the fitted RFormula can be used across different datasets.
>>> formula = RFormula(formula="ArrDelay ~ DepDelay + Distance + aircraft_type")
>>> formula.fit(training).transform(training).show()
+--------------+---------+---------+---------+--------------------+------+
| aircraft_type| Distance| DepDelay| ArrDelay| features| label|
+--------------+---------+---------+---------+--------------------+------+
| Balloon| 23| 18| 20| [0.0,0.0,23.0,18.0]| 20.0|
| Multi-Engine| 815| 2| -2| [0.0,1.0,815.0,2.0]| -2.0|
| Single-Engine| 174| 0| 1| [1.0,0.0,174.0,0.0]| 1.0|
+--------------+---------+---------+---------+--------------------+------+
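To make the quoted description concrete: RFormula lives in pyspark.ml.feature and acts as a feature transformer, so as far as I can tell its fit() does not minimize any loss; it only learns how to encode the columns. Below is a rough plain-Python sketch of that idea, not Spark's actual implementation. The helper names fit_category_index and encode are hypothetical, the training values are made up for illustration, and the "most frequent category first, last category dropped" ordering is an assumption chosen to reproduce the vectors in the output above.

```python
from collections import Counter


def fit_category_index(values):
    """Learn a category -> index mapping, most frequent category first.

    Assumption: ties are broken alphabetically. This imitates the idea
    behind the "fit" step; it is not Spark's code.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    return {category: i for i, category in enumerate(ordered)}


def encode(category, index, drop_last=True):
    """One-hot encode a category using a previously learned mapping."""
    # With drop_last, the last category is represented by an all-zero vector.
    size = len(index) - 1 if drop_last else len(index)
    vec = [0.0] * size
    pos = index[category]
    if pos < size:
        vec[pos] = 1.0
    return vec


# Hypothetical training column (invented counts, for illustration only).
training = [
    "Single-Engine", "Single-Engine", "Single-Engine",
    "Multi-Engine", "Multi-Engine",
    "Balloon",
]

# "fit": learn the mapping once, from the training data.
mapping = fit_category_index(training)

# "transform": reuse the same mapping on a different dataset.
new_data = ["Balloon", "Multi-Engine"]
encoded = [encode(c, mapping) for c in new_data]
print(mapping)
print(encoded)
```

The point of the split is that the mapping is fixed at fit time, so the same category always lands on the same vector index in every dataset you transform later. If you want an actual regression with a loss function, you would feed the resulting features and label columns into an estimator such as LinearRegression from pyspark.ml.regression.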