Mismatch of shapes when using advanced indexing

Question

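I'm developing a custom classifier that works as an ensemble: it combines several minor classifiers, and the ensemble's output is effectively a majority vote. It's worth mentioning that each classifier has a "weight" associated with each sample.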

Here is the predict method:

def predict(self, X):        
    
    G = self._compute_g(X) # G comes from a softmax distribution
    pred = np.zeros( (len(self._estimators), X.shape[0]), dtype=int ) 
    
    for i, estimator in enumerate(self._estimators): #loop each minor classifier
        y_est = estimator.predict(X)    
        pred[i] = y_est

    pred = pred.T # Prediction matrix (samples x classifiers)
    C = len(self._classes) # number of classes of the dataset
    M, N = pred.shape

    row, col = np.indices((M,N))
    P3d = np.zeros(shape=(M,N,C))
    P3d[row, col, pred-1] = G
    P = P3d.sum(axis=1)
    return np.argmax(P, axis=1)

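For the majority vote, I build a matrix P (n_samples x n_classes) that sums the weights of the classifiers that voted for each class. For example, suppose we have 3 classifiers trying to predict sample k of a 3-class problem. The classifier weights are [0.3, 0.4, 0.6] and the predictions are [1, 1, 2]. Row k of matrix P will then be [0.7, 0.6, 0], and the ensemble's output class will be 1.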
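
The worked example above can be checked with a small standalone sketch. The toy arrays below are hypothetical and only mirror the numbers in the example; the `+ 1` at the end assumes class labels run from 1 to C, which the `pred-1` in the indexing suggests but the question does not state outright.

```python
import numpy as np

# Hypothetical toy data: one sample, three classifiers, three classes.
G = np.array([[0.3, 0.4, 0.6]])   # weights, shape (n_samples, n_classifiers)
pred = np.array([[1, 1, 2]])      # 1-based class labels, same shape as G
C = 3                             # number of classes

M, N = pred.shape
row, col = np.indices((M, N))     # row/col index grids, each of shape (M, N)

P3d = np.zeros((M, N, C))
P3d[row, col, pred - 1] = G       # put each weight in the slot of the class it voted for
P = P3d.sum(axis=1)               # sum over classifiers -> shape (n_samples, n_classes)

# Assumption: labels are 1..C, so shift the argmax index back to a label.
winner = np.argmax(P, axis=1) + 1

print(P)       # weight collected by each class for the single sample
print(winner)  # winning class label per sample
```

Note that the assignment only works because `G` has exactly the same shape as the three index arrays, here (1, 3).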

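The problem is that I'm trying to use advanced indexing to build the matrix P3d (which is then used to build P), and I get the following error when trying to predict on the Iris dataset: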

ValueError: shape mismatch: value array of shape (150,6) could not be broadcast to indexing result of shape (150,3)

The error comes from this line: P3d[row, col, pred-1] = G, but I don't know what is causing it.

Shapes of the matrices involved

G: n_samples x n_classifiers
pred (M,N): n_samples x n_classifiers
P: n_samples x n_classes
Return value (last line): n_samples x 1

Answer 1

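Without seeing fully reproducible code, it's hard to tell what G = self._compute_g(X) is doing. However, the returned G appears to have shape (150, 6) rather than the expected (150, 3), which matches the shape of pred. That is why you get the shape mismatch error.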

I'd suggest double-checking G to confirm that self._compute_g(X) does what you expect.
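
To illustrate, here is a minimal sketch that reproduces the same kind of failure without the original class. The random arrays stand in for the real `pred` and `G` (assumptions, since `_compute_g` is not shown); only the shapes are taken from the error message.

```python
import numpy as np

M, N, C = 150, 3, 3                          # samples, classifiers, classes
rng = np.random.default_rng(0)

pred = rng.integers(1, C + 1, size=(M, N))   # stand-in predictions, labels 1..C
G_bad = rng.random((M, 6))                   # stand-in for a G with too many columns

row, col = np.indices((M, N))
P3d = np.zeros((M, N, C))

raised = False
message = ""
try:
    # The indexing result has shape (M, N) = (150, 3); G_bad is (150, 6).
    P3d[row, col, pred - 1] = G_bad
except ValueError as exc:
    raised = True
    message = str(exc)

print("index shape:", row.shape, "value shape:", G_bad.shape)
print("raised ValueError:", raised)
print(message)
```

Printing `G.shape` next to `pred.shape` just before the assignment in `predict` would show the same discrepancy in the real code.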

As a side note, judicious use of assert to confirm the shapes of the various arrays can help catch many errors of this kind, e.g. assert G.shape == (M, N)
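
One possible way to package that check is a small helper called right before the indexing step. The name `check_vote_inputs` is hypothetical and not part of the original code; this is a sketch of the idea rather than a required pattern.

```python
import numpy as np

def check_vote_inputs(G, pred):
    """Hypothetical helper: verify that G matches pred before advanced indexing.

    Returns (M, N), the number of samples and classifiers.
    """
    G = np.asarray(G)
    pred = np.asarray(pred)
    assert pred.ndim == 2, f"pred must be 2-D, got shape {pred.shape}"
    M, N = pred.shape
    assert G.shape == (M, N), f"G has shape {G.shape}, expected {(M, N)}"
    return M, N

# Matching shapes pass and return the dimensions.
ok_dims = check_vote_inputs(np.zeros((150, 3)), np.ones((150, 3), dtype=int))

# A G with 6 columns fails early, with a message naming both shapes.
try:
    check_vote_inputs(np.zeros((150, 6)), np.ones((150, 3), dtype=int))
    failed = False
except AssertionError as exc:
    failed = True
    failure_message = str(exc)
```

In `predict`, the line `M, N = pred.shape` could be replaced by `M, N = check_vote_inputs(G, pred)`, so the failure points at G instead of surfacing inside the indexing expression.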

python

numpy

scikit-learn

numpy-ndarray

array-broadcasting