How to plot_tree for pipelined MultiOutput Classifier?

Question

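I want to interpret my model and understand why it assigns a label of 1 or 0, so I want to use the plot_tree function from xgboost. My problem is a multi-label classification problem. I wrote the following code: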

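import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# X, y, and col_transformers are defined elsewhere (not shown in the question)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, shuffle=True, random_state=42)

# One binary XGBoost classifier is fitted per label
model = MultiOutputClassifier(
    xgb.XGBClassifier(objective="binary:logistic", colsample_bytree=0.5, gamma=0.1)
)

# Define a pipeline
pipeline = Pipeline([("preprocessing", col_transformers), ("XGB", model)])

pipeline.fit(X_train, y_train)

predicted = pipeline.predict(X_test)

xgb.plot_tree(pipeline, num_trees=4)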

This code gives me the error:

'Pipeline' object has no attribute 'get_dump'

If I change the code to pass the pipeline step instead, as below, I get a similar error:

xgb.plot_tree(pipeline.named_steps["XGB"], num_trees=4)

'MultiOutputClassifier' object has no attribute 'get_dump'

How can I solve this problem?

Answer 1

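You can only use the plot_tree function on Booster or XGBModel instances. Your first attempt fails because you are passing a Pipeline object, and the second fails because you are passing a MultiOutputClassifier object.

Instead, you have to pass the fitted XGBClassifier object. However, be aware of how MultiOutputClassifier actually works: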

This strategy consists of fitting one classifier per target.

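This means you will have one fitted model per label.

You can access them through the estimators_ attribute of MultiOutputClassifier. For example, you can retrieve the model for the first label like this: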

xgb.plot_tree(pipeline.named_steps["XGB"].estimators_[0], num_trees=4)
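To make the one-model-per-label point concrete, here is a minimal sketch on synthetic data. It uses DecisionTreeClassifier as a stand-in for XGBClassifier so that it runs with scikit-learn alone. The dataset and all parameter values are made up for illustration:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic multi-label data with 3 binary labels (illustrative only).
X, y = make_multilabel_classification(
    n_samples=100, n_features=8, n_classes=3, random_state=42
)

# DecisionTreeClassifier is a stand-in for xgb.XGBClassifier here.
clf = MultiOutputClassifier(DecisionTreeClassifier(max_depth=3, random_state=42))
clf.fit(X, y)

n_labels = y.shape[1]            # number of target columns
n_models = len(clf.estimators_)  # one fitted estimator per target column
print(n_labels, n_models)
```

The two numbers match, and the same relationship should hold for pipeline.named_steps["XGB"].estimators_ in the question.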

If you need all of them, you have to iterate over the list returned by the estimators_ attribute.
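A rough sketch of that loop might look like the following. The helper names label_estimators and plot_tree_per_label are hypothetical, not part of xgboost or scikit-learn. The sketch assumes the pipeline step is named "XGB" as in the question, and that matplotlib and graphviz are installed for plotting:

```python
def label_estimators(pipeline, step_name="XGB"):
    """Return (label_index, fitted_estimator) pairs from a fitted pipeline."""
    multi_output = pipeline.named_steps[step_name]
    return list(enumerate(multi_output.estimators_))


def plot_tree_per_label(pipeline, step_name="XGB", tree_index=0):
    """Draw one tree from each per-label model and return the figures."""
    # Imported here so label_estimators works without the plotting dependencies.
    import matplotlib.pyplot as plt
    import xgboost as xgb

    figures = []
    for label_idx, estimator in label_estimators(pipeline, step_name):
        fig, ax = plt.subplots(figsize=(12, 6))
        # num_trees follows the call used in the question; the keyword name
        # may differ in other xgboost releases.
        xgb.plot_tree(estimator, num_trees=tree_index, ax=ax)
        fig.suptitle(f"Label {label_idx}, tree {tree_index}")
        figures.append(fig)
    return figures
```

With the fitted pipeline from the question, calling plot_tree_per_label(pipeline, tree_index=4) should produce one figure per label, each showing the same tree index as the original call.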

python

scikit-learn

xgboost