k-fold cross validation to tune regression tree model using pyspark
I am trying to use k-fold cross validation to tune a regression tree built in pyspark. However, from everything I have seen so far, pyspark's CrossValidator cannot be combined with pyspark's DecisionTree.trainRegressor. Here is the relevant code:
(trainingData, testData) = data.randomSplit([0.7, 0.3])
model = DecisionTree.trainRegressor(trainingData, categoricalFeaturesInfo={}, impurity='variance', maxDepth=5, maxBins=32)
How can I then apply k-fold cross validation to the regressor?
You can try this. Note that CrossValidator belongs to the DataFrame-based pyspark.ml API and will not accept a model from the RDD-based DecisionTree.trainRegressor in pyspark.mllib, so the estimator has to be pyspark.ml.regression.DecisionTreeRegressor. For a regression problem the evaluator should also be a RegressionEvaluator rather than a BinaryClassificationEvaluator (and a regressor takes no numClasses argument):

from pyspark.ml.regression import DecisionTreeRegressor
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

(trainingData, testData) = data.randomSplit([0.7, 0.3])

# DataFrame-based estimator; CrossValidator tunes its params directly.
dt = DecisionTreeRegressor(labelCol="label", featuresCol="features")

paramGrid = ParamGridBuilder() \
    .addGrid(dt.maxDepth, [4, 5, 6, 7]) \
    .addGrid(dt.maxBins, [24, 28, 32, 36]) \
    .build()

crossval = CrossValidator(estimator=dt,
                          estimatorParamMaps=paramGrid,
                          evaluator=RegressionEvaluator(metricName="rmse"),
                          numFolds=3)

# Run cross-validation, and choose the best set of parameters.
cvModel = crossval.fit(trainingData)
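If you do want to stay with the RDD-based DecisionTree.trainRegressor, CrossValidator cannot drive it, so the folds have to be built by hand. The core splitting logic is simple and is sketched below in plain Python (kfold_splits is a hypothetical helper name, not a pyspark API); you would use each pair of index lists to carve training/validation subsets out of your RDD and call trainRegressor once per fold:

```python
def kfold_splits(n, k):
    """Yield (train_indices, validation_indices) pairs for k-fold CV
    over n samples; each fold is held out exactly once for validation."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size
```

Averaging the validation error over the k folds for each (maxDepth, maxBins) combination then gives you the same model-selection signal that CrossValidator computes internally.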