Multi-Class Logistic Regression in SciKit Learn

Question

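I am having trouble calling scikit-learn's logistic regression correctly for the multi-class case. I am using the lbfgs solver, and I do have the multi_class parameter set to multinomial.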

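It is unclear to me how to pass the true class labels when fitting the model. I had assumed it would be similar, or identical, to the multi-class random forest classifier, where you pass a [n_samples, m_classes] dataframe. However, when I do this, I get an error saying the data has a bad shape. ValueError: bad input shape (20, 5) -- in this small example, there are 5 classes and 20 samples.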

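On inspection, the documentation for the fit method says the truth values are passed as [n_samples, ] -- which matches the error I am getting -- but then I do not know how to train the model with multiple classes. So, this is my question: how do I pass the full set of class labels to the fit function?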

I have been unable to find sample code for this on the Internet, nor this question on Stack Overflow. But I feel sure someone must know how to do it!

In the code below, train_features = [n_samples, n_features], truth_train = [n_samples, m_classes]

from sklearn.linear_model import LogisticRegressionCV

clf = LogisticRegressionCV(class_weight='balanced', multi_class='multinomial', solver='lbfgs')
clf.fit(train_features, truth_train)  # raises ValueError: bad input shape (20, 5)
pred = clf.predict(test_features)

Answer 1

You seem to be confusing the terms multiclass and multilabel (http://scikit-learn.org/stable/modules/multiclass.html). In short:

Multiclass classification means a classification task with more than two classes; e.g., classify a set of images of fruits which may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is assigned to one and only one label: a fruit can be either an apple or a pear but not both at the same time.

So the data is [n_samples, n_features] and the labels are [n_samples].
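If the [n_samples, m_classes] truth matrix in the question is a one-hot encoding of mutually exclusive classes, one way to get the 1-dim labels that fit expects is to take the argmax of each row. The sketch below assumes exactly that; the toy data, the variable labels_1d, and the use of plain LogisticRegression (rather than LogisticRegressionCV, to keep the example small) are illustrative choices, not part of the original post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Hypothetical toy data: 20 samples, 4 features, 5 mutually exclusive classes.
n_samples, n_features, m_classes = 20, 4, 5
train_features = rng.normal(size=(n_samples, n_features))

# One-hot truth matrix of shape (n_samples, m_classes): exactly one 1 per row.
class_ids = np.arange(n_samples) % m_classes
truth_train = np.eye(m_classes, dtype=int)[class_ids]

# Collapse each one-hot row to its class index, giving shape (n_samples,).
labels_1d = truth_train.argmax(axis=1)

# With 1-dim labels, fit accepts the targets for any number of classes.
clf = LogisticRegression(solver="lbfgs", max_iter=1000)
clf.fit(train_features, labels_1d)
pred = clf.predict(train_features)  # shape (n_samples,), one class index per sample
```

This only makes sense when every row of the truth matrix has a single 1. If a row can have several 1s, the problem is multilabel and argmax would silently discard labels.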

Multilabel classification assigns to each sample a set of target labels. This can be thought as predicting properties of a data-point that are not mutually exclusive, such as topics that are relevant for a document. A text might be about any of religion, politics, finance or education at the same time or none of these.

So the data is [n_samples, n_features] and the labels are [n_samples, n_labels].
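To make the [n_samples, n_labels] shape concrete, here is a small sketch that builds a multilabel indicator matrix with MultiLabelBinarizer. The documents and topic names are made up for illustration, borrowing the topics from the quoted definition.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical documents, each tagged with zero or more topics.
topics_per_doc = [
    {"religion", "politics"},
    {"finance"},
    set(),  # a document about none of the topics
]

mlb = MultiLabelBinarizer()
label_matrix = mlb.fit_transform(topics_per_doc)

print(mlb.classes_)        # column order of the indicator matrix
print(label_matrix.shape)  # (n_samples, n_labels)
print(label_matrix)        # one row per document, one column per topic
```

Unlike a one-hot multiclass encoding, a row here may contain several 1s or none at all.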

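And you seem to be looking for multilabel (since for multiclass the labels should be 1-dim). Currently, the only methods in sklearn that support multilabel are: decision trees, random forests, nearest neighbors, and ridge regression.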

If you want to learn a multilabel problem with a different model, simply use OneVsRestClassifier as a multilabel wrapper around your LogisticRegression:

http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html#sklearn.multiclass.OneVsRestClassifier
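A minimal sketch of that wrapper, assuming a multilabel indicator matrix Y of shape [n_samples, n_labels]. The synthetic data and the rule used to generate the labels are hypothetical and only serve to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.RandomState(0)

# Hypothetical toy data: 40 samples, 4 features, 3 non-exclusive labels.
n_samples, n_features, n_labels = 40, 4, 3
X = rng.normal(size=(n_samples, n_features))

# Each label is switched on by the sign of one feature,
# so a single row can carry several labels or none.
Y = (X[:, :n_labels] > 0).astype(int)

# OneVsRestClassifier fits one binary LogisticRegression per label column.
clf = OneVsRestClassifier(LogisticRegression(solver="lbfgs", max_iter=1000))
clf.fit(X, Y)

pred = clf.predict(X)         # indicator matrix, shape (n_samples, n_labels)
proba = clf.predict_proba(X)  # per-label probabilities, same shape
```

Because each label gets its own independent binary classifier, the per-label probabilities in a row are not constrained to sum to 1.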

python

machine-learning

scikit-learn

logistic-regression