How to properly implement bagging on a decision tree with a for loop?
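I am trying to implement bagging and voting using a decision tree and a for loop. I am using sklearn's resample. However, I get the error Number of labels=97 does not match number of samples=77. I can see why it happens, but I am not sure how to fix it.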
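The dataset has 150 samples and 150 labels: X is the feature matrix with 150 rows and y is the label vector of length 150.
With test_size=0.35 the training split keeps 150 * 0.65 ≈ 97 samples, and the bootstrap takes int(0.8 * 97) = 77 of those.
Here is my code: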
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.35, random_state=3)
predictions = []
for i in range(1, 20):
    bootstrap_size = int(0.8 * len(X_train))
    # Only the features are resampled here (77 rows) ...
    bag = resample(X_train, n_samples=bootstrap_size, random_state=i, replace=True)
    Base_DecisionTree = DecisionTreeClassifier(random_state=3)
    # ... but y_train still has 97 labels, so this line raises the ValueError.
    Base_DecisionTree.fit(bag, y_train)
    y_predict = Base_DecisionTree.predict(X_test)
    accuracy = accuracy_score(y_test, y_predict)
    predictions.append(accuracy)
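The two numbers in the error message can be reproduced without fitting a tree at all. The following is a minimal sketch, assuming placeholder arrays in place of the real X and y (the values are made up, only the length of 150 matches the question):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Placeholder data: 150 rows, as in the question. The values are arbitrary.
X = np.arange(150 * 4).reshape(150, 4)
y = np.arange(150) % 3

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=3
)
bootstrap_size = int(0.8 * len(X_train))

# Resampling only the features shrinks X but leaves y_train at full length.
bag = resample(X_train, n_samples=bootstrap_size, random_state=1, replace=True)

n_train = len(y_train)  # number of labels that fit() would receive
n_bag = len(bag)        # number of samples that fit() would receive
print(n_train, n_bag)
```

The mismatch comes purely from passing the resampled features together with the labels that were never resampled.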
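You should also resample the labels, in the same resample() call as the features so that rows and labels stay aligned, and pass the resampled labels to fit().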
x_bag, y_bag = resample(X_train, y_train, n_samples=bootstrap_size, random_state=i, replace=True)
tree = DecisionTreeClassifier(random_state=3)
tree.fit(x_bag, y_bag)
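Since the question also mentions voting, here is one possible way to put the fix into the full loop and combine the trees by majority vote. It is a sketch under assumptions rather than the asker's actual setup: the iris dataset stands in for the unnamed 150-sample dataset, and the names all_predictions, votes and y_vote are introduced only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Assumption: iris stands in for the 150-sample dataset from the question.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=3
)

bootstrap_size = int(0.8 * len(X_train))
all_predictions = []  # one array of test-set predictions per tree

for i in range(1, 20):
    # Resample features and labels together so the rows stay aligned.
    x_bag, y_bag = resample(
        X_train, y_train, n_samples=bootstrap_size, random_state=i, replace=True
    )
    tree = DecisionTreeClassifier(random_state=3)
    tree.fit(x_bag, y_bag)
    all_predictions.append(tree.predict(X_test))

# Majority vote: for each test sample, take the most frequent predicted class.
votes = np.array(all_predictions)  # shape (n_trees, n_test)
n_classes = int(y.max()) + 1
y_vote = np.array(
    [np.bincount(col, minlength=n_classes).argmax() for col in votes.T]
)

ensemble_accuracy = accuracy_score(y_test, y_vote)
print(f"ensemble accuracy: {ensemble_accuracy:.3f}")
```

Collecting each tree's predictions, rather than each tree's accuracy as in the original loop, is what makes the vote possible. Ties are resolved in favour of the lowest class index, because argmax returns the first maximum.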