Equivalent of pyspark.mllib.tree.DecisionTreeModel.toDebugString() in pyspark.ml.classification.DecisionTreeClassificationModel - IN PYTHON
This is basically the same question as:
but for PySpark.
I used to be able to do something like this:
from pyspark.mllib.tree import DecisionTree
model = DecisionTree.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo=categoricalFeatures, impurity='gini', maxDepth=5, maxBins=16)
print(model.toDebugString())
And I would get a nice printout of the decision tree:
DecisionTreeModel classifier of depth 5 with 49 nodes
  If (feature 1 in {0.0})
   If (feature 0 in {0.0})
    If (feature 2 <= 52.0)
     If (feature 3 <= 26.0)
      Predict: 0.0
...
I am trying to port my code to pyspark.ml, but I cannot find any way to print the resulting tree:
from pyspark.ml.classification import DecisionTreeClassifier
dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures", maxDepth=5, maxBins=16, impurity='gini')
model = dt.fit(transformedTrainingData)
When I do this:
print(model)
I only get the first line:
DecisionTreeClassificationModel (uid=DecisionTreeClassifier_4cbda3dcd0bddd9d4a0b) of depth 5 with 43 nodes
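Any ideas on how to get the nicely formatted tree output?
I found a workaround. It is not elegant, and it violates encapsulation and everything you were taught about object-oriented programming, but it works:
print(model._call_java("toDebugString"))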
DecisionTreeClassificationModel (uid=DecisionTreeClassifier_4c3bb548827f07c590e6) of depth 5 with 49 nodes
  If (feature 1 in {0.0})
   If (feature 0 in {1.0,2.0})
    If (feature 2 <= 5.0)
     If (feature 3 <= 26.0)
      Predict: 1.0
     Else (feature 3 > 26.0)
      If (feature 0 in {2.0})
...
Now (in Spark 2.2) you can also simply call:
print(model.toDebugString)
And you will get something like:
DecisionTreeClassificationModel (uid=DecisionTreeClassifier_48b398caca43f9fd5bc1) of depth 15 with 5237 nodes
  If (feature 39 <= 0.09)
   If (feature 11 <= 369.79999999999995)
    If (feature 33 <= 217.75400000000002)
     If (feature 4 <= 3864.0)
      If (feature 33 <= -0.01)
       If (feature 12 <= 2950.0)
        If (feature 33 <= -64.83)
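Note that toDebugString is a method on the old pyspark.mllib model but a property on the pyspark.ml model, which is easy to trip over while porting. The helper below is only a rough sketch of how one might hide that difference; tree_debug_string is a made-up name, not part of Spark, and the _call_java fallback is the private workaround from the answer above, so it may not be available in every version.

```python
def tree_debug_string(model):
    """Return the full tree description from an mllib- or ml-style model.

    Hypothetical helper (not a Spark API). It only assumes the model
    exposes `toDebugString` as a method or a property, or failing that
    the private `_call_java` bridge used in the workaround above.
    """
    attr = getattr(model, "toDebugString", None)
    if attr is None:
        # pyspark.ml model without the Python-side property:
        # ask the underlying JVM object directly (private API).
        return model._call_java("toDebugString")
    # pyspark.mllib exposes a method, pyspark.ml exposes a property.
    return attr() if callable(attr) else attr
```

With either kind of model, print(tree_debug_string(model)) should then print the whole tree rather than just the header line.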