Why does AdaBoost not work with DecisionTree?

Question

I'm using sklearn 0.19.1 with DecisionTree and AdaBoost.

I have a DecisionTree classifier that works fine:

from sklearn import tree

clf = tree.DecisionTreeClassifier()

train_split_perc = 10000
test_split_perc = pdf.shape[0] - train_split_perc

train_pdf_x = pdf[:train_split_perc]
train_pdf_y = YY[:train_split_perc]

test_pdf_x = pdf[-test_split_perc:]
test_pdf_y = YY[-test_split_perc:]

clf.fit(train_pdf_x, train_pdf_y)

pred2 = clf.predict(test_pdf_x)

But when I try to add AdaBoost, it raises an error in the predict function:

from sklearn import tree
from sklearn.ensemble import AdaBoostClassifier

treeclf = tree.DecisionTreeClassifier(max_depth=3)
adaclf = AdaBoostClassifier(base_estimator=treeclf, n_estimators=500, learning_rate=0.5)

train_split_perc = 10000
test_split_perc = pdf.shape[0] - train_split_perc

train_pdf_x = pdf[:train_split_perc]
train_pdf_y = YY[:train_split_perc]

test_pdf_x = pdf[-test_split_perc:]
test_pdf_y = YY[-test_split_perc:]

adaclf.fit(train_pdf_x, train_pdf_y)

pred2 = adaclf.predict(test_pdf_x)

The specific error says:

ValueError: bad input shape (236821, 6)

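The dataset it seems to point to is train_pdf_y, because its shape is (236821, 6), and I don't understand why.

Even from the description of AdaBoostClassifier in the docs, I understand that the classifier that actually uses the data is the DecisionTree: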

An AdaBoost [1] classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

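But I still get this error.

Even after reading the code examples I've found on how to use AdaBoost, including those on the sklearn website, I can't figure out what I'm doing wrong.

Any help is appreciated.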

Answer 1

Given the shape of y, it looks like you are trying to solve a multi-output classification problem; otherwise it would make no sense to pass an n-dimensional y to adaclf.fit(train_pdf_x, train_pdf_y).
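A quick way to confirm this is to inspect the dimensionality of the label array before fitting. The sketch below uses a small placeholder array in place of train_pdf_y (the real data is not shown in the question), and the name is_multi_output is only illustrative:

```python
import numpy as np

# Hypothetical stand-in for train_pdf_y: 5 samples, 6 label columns.
y = np.zeros((5, 6), dtype=int)

# Works for NumPy arrays and pandas DataFrames alike.
y_arr = np.asarray(y)

# A 2-D y with more than one column is a multi-output target.
is_multi_output = y_arr.ndim == 2 and y_arr.shape[1] > 1
print(y_arr.shape, is_multi_output)
```

If the check is true for your labels, the estimator you pass them to has to support multi-output targets.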

Assuming that is the case, the problem is that Scikit-Learn's DecisionTreeClassifier does support multi-output problems, that is, y inputs with shape [n_samples, n_outputs]. However, that is not the case for AdaBoostClassifier, since according to the documentation the labels must be:

y : array-like of shape = [n_samples]
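The difference can be reproduced on synthetic data. The sketch below is not the asker's dataset: X and Y are random placeholders with 6 binary output columns, and wrapping the booster in MultiOutputClassifier (which fits one independent model per output column) is only one possible workaround, assuming the 6 columns really are separate targets. The base tree is passed positionally because the keyword name for it differs between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data: 200 samples, 4 features, 6 binary outputs.
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
Y = (rng.rand(200, 6) > 0.5).astype(int)

# A decision tree accepts a 2-D y of shape (n_samples, n_outputs).
tree_clf = DecisionTreeClassifier(max_depth=3).fit(X, Y)
tree_pred_shape = tree_clf.predict(X).shape

# AdaBoostClassifier expects a 1-D y, so the same call raises ValueError.
ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=10)
try:
    ada_clf.fit(X, Y)
    ada_error = None
except ValueError as exc:
    ada_error = exc

# Possible workaround: train one boosted model per output column.
multi_clf = MultiOutputClassifier(ada_clf).fit(X, Y)
multi_pred_shape = multi_clf.predict(X).shape

print(tree_pred_shape, type(ada_error).__name__, multi_pred_shape)
```

If the 6 columns are instead a one-hot encoding of a single label (an assumption the question does not confirm), converting them to a 1-D vector with np.argmax(Y, axis=1) and fitting AdaBoostClassifier directly would be the simpler route.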

Tags: python, machine-learning, decision-tree, adaboost, scikit-learn