Gradient Boosting Classifier Loss Function with sklearn - operands could not be broadcast together
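I'm having trouble with the estimator.loss_ method of sklearn's GradientBoostingClassifier. I'm trying to compare the test error with the training error over the course of training. Here is some of my data preparation: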
import numpy as np
from sklearn import preprocessing, cross_validation  # in sklearn >= 0.18 this is sklearn.model_selection
from sklearn.ensemble import GradientBoostingClassifier as GBC
# convert data to numpy array (shuffled_ds is my shuffled dataset)
train = np.array(shuffled_ds)
# label encode neighborhoods & crimes (columns 1 and 2)
for i in range(train.shape[1]):
    if i in [1, 2]:
        print(i, list(train[1:5, i]))
        lbl = preprocessing.LabelEncoder()
        lbl.fit(list(train[:, i]))
        train[:, i] = lbl.transform(train[:, i])
print('neighborhoods & crimes encoded')
# create target vector (column 1 = crime category)
y_crimes = train[:, 1]
train = np.delete(train, 1, 1)
print(y_crimes)
# arrays to float
train = train.astype(float)
y_crimes = y_crimes.astype(float)
# data holdout for testing
X_train, X_test, y_train, y_test = cross_validation.train_test_split(
    train, y_crimes, test_size=0.4, random_state=0)
print('test data created')
# train model and check train vs test error
print('begin training...')
est = GBC(n_estimators=3000, learning_rate=.1, max_depth=4, max_features=1, min_samples_leaf=3)
est.fit(X_train, y_train)
print('done training')
At this point, when I print out the shapes of my arrays with
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
I get, respectively:
(18000, 9)
(12000, 9)
(18000,)
(12000,)
So according to the sklearn documentation, my shapes are compatible. But then I try to fill a vector of test scores so that I can plot it against my training error:
test_score = np.empty(len(est.estimators_))
for i, pred in enumerate(est.staged_predict(X_test)):
    test_score[i] = est.loss_(y_test, pred)
I get the following error:
    return np.sum(-1 * (Y * pred).sum(axis=1) +
ValueError: operands could not be broadcast together with shapes (12000,47) (12000,)
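I'm not sure where the 47 comes from. I have used the same procedure on another dataset before without any problems. Any help would be greatly appreciated.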
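You get this error because you have to pass the output of staged_decision_function (not staged_predict) to loss_. The 47 is the number of classes in your target. For a multiclass problem, loss_ is the multinomial deviance. It one-hot encodes y_test into an array Y of shape (n_samples, n_classes) = (12000, 47) and multiplies it element-wise with pred, so pred must be the raw per-class scores of that same shape. That is what staged_decision_function yields at each stage. staged_predict instead yields predicted class labels of shape (12000,), which cannot be broadcast against (12000, 47).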
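The mismatch can be reproduced with NumPy alone. The sketch below uses a simplified stand-in for the multinomial deviance. Its formula follows the line in the traceback, completed with a logsumexp term, and it is not scikit-learn's actual code. The function name `multinomial_deviance` and the small sizes are made up for illustration; in the question the sizes are 12000 samples and 47 classes.

```python
import numpy as np
from scipy.special import logsumexp

# Toy sizes (assumption): the question has 12000 test rows and 47 classes.
n_samples, n_classes = 6, 4
rng = np.random.default_rng(0)
y = rng.integers(0, n_classes, size=n_samples)

# One-hot encode the labels, as the multinomial deviance does internally.
Y = np.zeros((n_samples, n_classes))
Y[np.arange(n_samples), y] = 1.0


def multinomial_deviance(y_onehot, pred):
    """Hypothetical simplified deviance: sum of -score_true + logsumexp(scores)."""
    return np.sum(-(y_onehot * pred).sum(axis=1) + logsumexp(pred, axis=1))


# Like staged_predict: one class label per sample, shape (n_samples,).
labels = rng.integers(0, n_classes, size=n_samples).astype(float)
# Like staged_decision_function: one score per class, shape (n_samples, n_classes).
scores = rng.normal(size=(n_samples, n_classes))

try:
    multinomial_deviance(Y, labels)
except ValueError as err:
    print("labels:", err)  # same kind of broadcast error as in the question

print("scores:", multinomial_deviance(Y, scores))
```

The error only appears because n_samples differs from n_classes. If the two happened to be equal, `Y * labels` would broadcast silently and return a meaningless number, so passing 1-D labels is wrong in either case.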
See the Gradient Boosting regularization example in the scikit-learn documentation:
import numpy as np
from sklearn import ensemble
# params and the train/test split come from the rest of the example
clf = ensemble.GradientBoostingClassifier(**params)
clf.fit(X_train, y_train)
# compute test set deviance
test_deviance = np.zeros((params['n_estimators'],), dtype=np.float64)
for i, y_pred in enumerate(clf.staged_decision_function(X_test)):
    # clf.loss_ assumes that y_test[i] in {0, 1}
    test_deviance[i] = clf.loss_(y_test, y_pred)
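Newer scikit-learn releases deprecated the loss_ attribute (in 1.1) and later removed it, so the loop above does not run there. The sketch below is one possible substitute, assuming such a newer version. It uses synthetic data from make_classification as a hypothetical stand-in for the crime dataset. It computes the per-stage test loss with staged_predict_proba and sklearn.metrics.log_loss, alongside the per-stage training loss that fit stores in train_score_. For a multiclass model, the predicted probabilities are the softmax of the decision scores, so log loss on them is the same cross-entropy the multinomial deviance computes. Whether the two curves are normalized identically (sum vs. mean) may depend on the version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the crime data: 9 features, 5 classes.
X, y = make_classification(n_samples=600, n_features=9, n_informative=6,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

est = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1,
                                 max_depth=4, random_state=0)
est.fit(X_train, y_train)

# Per-stage test loss from the class probabilities of each boosting stage.
test_score = np.empty(est.n_estimators_)
for i, proba in enumerate(est.staged_predict_proba(X_test)):
    test_score[i] = log_loss(y_test, proba, labels=est.classes_)

# Per-stage training loss recorded by fit() (in-bag loss if subsample < 1).
train_score = est.train_score_
print(est.n_classes_, train_score[-1], test_score[-1])
```

Plotting test_score and train_score against `np.arange(est.n_estimators_) + 1` gives the train-vs-test curve the question is after.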