RandomForestClassifier gives transposed output for multi-label classes

Question

For some reason, whenever I run an ensemble.RandomForestClassifier() and call its .predict_proba() method, it returns a 2D array of shape [n_classes, n_samples] rather than the [n_samples, n_classes] shape that it should have per the docs.

Here is my sample code:

import numpy as np
from sklearn import ensemble, preprocessing

# generate some sample data

X = np.array([[4, 5, 6, 7, 8], 
              [0, 5, 6, 2, 3], 
              [1, 2, 6, 5, 8], 
              [6, 1, 1, 1, 3], 
              [2, 5, 3, 2, 0]])
»» X.shape
   (5, 5)

y = [['blue', 'red'], 
     ['red'], 
     ['red', 'green'], 
     ['blue', 'green'], 
     ['orange']]

X_test = np.array([[4, 6, 1, 2, 8], 
                   [0, 0, 1, 5, 1]])
»» X_test.shape
   (2, 5)

# binarize text labels

mlb = preprocessing.MultiLabelBinarizer()
lb_y = mlb.fit_transform(y)

»» lb_y 
   [[1 0 0 1]
    [0 0 0 1]
    [0 1 0 1]
    [1 1 0 0]
    [0 0 1 0]]

»» lb_y.shape
   (5, 4)

Everything looks fine so far. But when I do this:

rfc = ensemble.RandomForestClassifier(random_state=42)
rfc.fit(X, lb_y)
yhat_p = rfc.predict_proba(X_test)

»» yhat_p
[array([[ 0.5,  0.5],
        [ 0.7,  0.3]]), 
 array([[ 0.4,  0.6],
        [ 0.5,  0.5]]), 
 array([[ 0.7,  0.3],
        [ 0.7,  0.3]]), 
 array([[ 0.7,  0.3],
        [ 0.6,  0.4]])]

My yhat_p has shape [n_classes, n_samples] instead of [n_samples, n_classes]. Can anyone tell me why my output is transposed? Note: the .predict() method works just fine.

Answer 1

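By binarizing your data, you have transformed the problem so that you are now performing four separate classification tasks. Each of these tasks has two classes, 0 and 1, where 1 means "has this label" and 0 means "doesn't have this label".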
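A minimal sketch of how to check this on the fitted model, reusing the question's data. It assumes a reasonably recent scikit-learn; n_outputs_ and classes_ are attributes of the fitted RandomForestClassifier, and mlb.classes_ holds the label names in column order.

```python
import numpy as np
from sklearn import ensemble, preprocessing

# Same toy data as in the question.
X = np.array([[4, 5, 6, 7, 8],
              [0, 5, 6, 2, 3],
              [1, 2, 6, 5, 8],
              [6, 1, 1, 1, 3],
              [2, 5, 3, 2, 0]])
y = [['blue', 'red'], ['red'], ['red', 'green'], ['blue', 'green'], ['orange']]

mlb = preprocessing.MultiLabelBinarizer()
lb_y = mlb.fit_transform(y)  # one 0/1 column per label

rfc = ensemble.RandomForestClassifier(random_state=42)
rfc.fit(X, lb_y)

print(mlb.classes_)    # label names, one per column of lb_y
print(rfc.n_outputs_)  # number of separate classification tasks
print(rfc.classes_)    # one array of class values (0 and 1) per task
```

Each column of lb_y becomes its own output, so the forest reports four outputs, each with the two classes 0 and 1.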

The formatting in the docs is a bit odd, but it says:

array of shape = [n_samples, n_classes], or a list of n_outputs such arrays if n_outputs > 1

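Since you have four outputs, you get a list of four arrays. Each array has shape (2, 2) because each output has two samples (the two rows of X_test) and two classes (0 and 1). The n_classes mentioned in the docs is the number of classes for a single output, not the total number of classes across all the output classifications being performed. (The reason it returns a list rather than a single array is that the separate classifications are not required to have the same number of classes. You could do a multi-output classification task in which one output has two classes and another has 100 classes.)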
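If what you want is a single [n_samples, n_labels] matrix of "has this label" probabilities, one possible approach is to take column 1 of each array in the list and stack them side by side. This is only a sketch: it assumes every label column contains both 0 and 1 in the training data (true here), so that each array has exactly two columns. The name positive is made up for illustration.

```python
import numpy as np
from sklearn import ensemble, preprocessing

X = np.array([[4, 5, 6, 7, 8],
              [0, 5, 6, 2, 3],
              [1, 2, 6, 5, 8],
              [6, 1, 1, 1, 3],
              [2, 5, 3, 2, 0]])
y = [['blue', 'red'], ['red'], ['red', 'green'], ['blue', 'green'], ['orange']]
X_test = np.array([[4, 6, 1, 2, 8],
                   [0, 0, 1, 5, 1]])

mlb = preprocessing.MultiLabelBinarizer()
lb_y = mlb.fit_transform(y)

rfc = ensemble.RandomForestClassifier(random_state=42)
rfc.fit(X, lb_y)

# A list with one (n_samples, 2) array per output.
proba_list = rfc.predict_proba(X_test)
print(len(proba_list), [p.shape for p in proba_list])

# Column 1 of each array is P(label present); stack them into
# a single (n_samples, n_labels) matrix.
positive = np.column_stack([p[:, 1] for p in proba_list])
print(positive.shape)
```

The exact probabilities depend on the scikit-learn version (the default number of trees has changed over time), so only the shapes are worth relying on here.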

For example, the first element of the list is

array([[ 0.5,  0.5],
        [ 0.7,  0.3]]),

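Each row gives you the probabilities that the corresponding row of X_test belongs to each class in the first classification task, which is essentially "Is this item blue or not?" So the first row tells you that the first row of X_test has a 50% chance of not being blue and a 50% chance of being blue; the second row tells you that the second row of X_test has a 70% chance of not being blue and a 30% chance of being blue.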
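To tie an array in the list back to its label name, you can look up the label's position in mlb.classes_. The sketch below does that for 'blue' and also shows that .predict() already returns the [n_samples, n_outputs] layout, which is presumably why it looked fine in the question. The names blue_index, p_blue, and from_proba are illustrative only.

```python
import numpy as np
from sklearn import ensemble, preprocessing

X = np.array([[4, 5, 6, 7, 8],
              [0, 5, 6, 2, 3],
              [1, 2, 6, 5, 8],
              [6, 1, 1, 1, 3],
              [2, 5, 3, 2, 0]])
y = [['blue', 'red'], ['red'], ['red', 'green'], ['blue', 'green'], ['orange']]
X_test = np.array([[4, 6, 1, 2, 8],
                   [0, 0, 1, 5, 1]])

mlb = preprocessing.MultiLabelBinarizer()
lb_y = mlb.fit_transform(y)

rfc = ensemble.RandomForestClassifier(random_state=42)
rfc.fit(X, lb_y)
proba_list = rfc.predict_proba(X_test)

# Position of 'blue' among the binarized columns.
blue_index = list(mlb.classes_).index('blue')

# Column 1 holds P(class == 1), i.e. P(item is blue), per test row.
p_blue = proba_list[blue_index][:, 1]
for i, p in enumerate(p_blue):
    print(f"X_test row {i}: P(blue) = {p:.2f}, P(not blue) = {1 - p:.2f}")

# predict() returns one row per sample and one column per label.
pred = rfc.predict(X_test)
print(pred.shape)

# Per output, the predicted class is the one with the highest probability.
from_proba = np.column_stack([np.argmax(p, axis=1) for p in proba_list])
print(np.array_equal(pred, from_proba))
```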
