Multi-Class Logistic Regression in SciKit Learn
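I'm having trouble calling scikit-learn's logistic regression correctly for the multi-class case. I'm using the lbfgs solver, and I did set the multi_class parameter to multinomial.
It's unclear to me how to pass the true class labels when fitting the model. I assumed it would be the same as for the random forest classifier in the multi-class case, where you pass an [n_samples, m_classes] dataframe. However, when I do that, I get an error saying the data has a bad shape: ValueError: bad input shape (20, 5). In this small example there are 5 classes and 20 samples.
On inspection, the documentation for the fit method says the truth values are passed as [n_samples, ], which matches the error I'm getting. However, I then have no idea how to train the model with multiple classes. So here is my question: how do I pass the full set of class labels to the fit function?
I haven't been able to find example code for this on the Internet, nor this question on Stack Overflow. But surely someone knows how to do it!
In the code below, train_features has shape [n_samples, n_features] and truth_train has shape [n_samples, m_classes]: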
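from sklearn.linear_model import LogisticRegressionCV
# truth_train is [n_samples, m_classes]; fit() rejects it with "ValueError: bad input shape"
clf = LogisticRegressionCV(class_weight='balanced', multi_class='multinomial', solver='lbfgs')
clf.fit(train_features, truth_train)
pred = clf.predict(test_features)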
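To reproduce the mismatch described above, here is a minimal sketch. It assumes synthetic NumPy data standing in for train_features and truth_train (60 samples, 5 classes, each label row one-hot), since the original data is not shown. RandomForestClassifier accepts the 2-D matrix because it treats each column as a separate output, while LogisticRegressionCV only accepts a 1-D label vector and raises a ValueError (the exact message differs between scikit-learn versions). The multi_class argument is left out here because newer releases have deprecated it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 60, 4, 5

# Hypothetical stand-ins for the question's train_features / truth_train:
# each sample belongs to exactly one class, stored as a one-hot row.
train_features = rng.normal(size=(n_samples, n_features))
class_ids = np.arange(n_samples) % n_classes
truth_train = np.eye(n_classes)[class_ids]           # shape (60, 5)

# RandomForestClassifier fits the 2-D matrix as 5 separate binary outputs.
rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(train_features, truth_train)
print(rf.predict(train_features).shape)              # (60, 5)

# LogisticRegressionCV only accepts 1-D labels, so this fit is rejected.
clf = LogisticRegressionCV(class_weight='balanced', solver='lbfgs', max_iter=1000)
try:
    clf.fit(train_features, truth_train)
    lr_rejected_2d = False
except ValueError as err:
    lr_rejected_2d = True
    print("ValueError:", err)   # wording depends on the scikit-learn version
```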
You seem to be confusing the terms multiclass and multilabel (see http://scikit-learn.org/stable/modules/multiclass.html). In short:
- Multiclass classification means a classification task with more than
two classes; e.g., classify a set of images of fruits which may be
oranges, apples, or pears. Multiclass classification makes the
assumption that each sample is assigned to one and only one label: a
fruit can be either an apple or a pear but not both at the same time.
Hence the data has shape [n_samples, n_features] and the labels have shape [n_samples].
- Multilabel classification assigns to each sample a set of target
labels. This can be thought as predicting properties of a data-point
that are not mutually exclusive, such as topics that are relevant for
a document. A text might be about any of religion, politics, finance
or education at the same time or none of these.
Hence the data has shape [n_samples, n_features] and the labels have shape [n_samples, n_labels].
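The two label layouts can be checked with scikit-learn's own target-type detection. A small sketch, assuming made-up fruit and topic labels that mirror the quoted examples: a 1-D array with more than two distinct values is detected as "multiclass", while a 2-D 0/1 indicator matrix (built here with MultiLabelBinarizer from per-document topic sets) is detected as "multilabel-indicator".

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.utils.multiclass import type_of_target

# Multiclass: exactly one fruit per sample -> 1-D labels, shape (n_samples,)
y_multiclass = np.array(["apple", "orange", "pear", "apple"])
print(y_multiclass.shape, type_of_target(y_multiclass))   # (4,) multiclass

# Multilabel: any subset of topics per document, possibly none at all
doc_topics = [{"politics"}, {"religion", "finance"}, set(), {"education", "politics"}]
mlb = MultiLabelBinarizer()
Y_multilabel = mlb.fit_transform(doc_topics)              # shape (n_samples, n_labels)
print(mlb.classes_)                                       # column order of Y_multilabel
print(Y_multilabel.shape, type_of_target(Y_multilabel))   # (4, 4) multilabel-indicator
```

Note that a one-hot matrix (exactly one 1 per row) is also reported as "multilabel-indicator": the 2-D layout alone signals multilabel to scikit-learn.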
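It also looks like you are after multilabel (for multiclass, the labels should be one-dimensional). At the time of writing, the only sklearn methods that supported multilabel natively were decision trees, random forests, nearest neighbors, and ridge regression.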
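If truth_train is really a one-hot encoding of a multiclass target (exactly one 1 per row), one way to get the 1-D labels that fit expects is to collapse each row with argmax. A minimal sketch, assuming synthetic data with hypothetical names; the multi_class argument is omitted because, since scikit-learn 0.22, the default already selects the multinomial formulation for lbfgs when there are more than two classes, and newer releases have deprecated the parameter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 60, 4, 5

# Hypothetical one-hot label matrix: exactly one 1 per row.
train_features = rng.normal(size=(n_samples, n_features))
truth_train = np.eye(n_classes)[np.arange(n_samples) % n_classes]

# argmax is only safe for one-hot rows; a multilabel row would lose labels.
if not (truth_train.sum(axis=1) == 1).all():
    raise ValueError("truth_train is not one-hot; treat it as multilabel instead")
y_train = truth_train.argmax(axis=1)                   # shape (60,), values 0..4

clf = LogisticRegressionCV(class_weight='balanced', solver='lbfgs', max_iter=1000)
clf.fit(train_features, y_train)

pred = clf.predict(train_features)                     # 1-D class indices
proba = clf.predict_proba(train_features)              # one column per class
print(pred.shape, proba.shape)                         # (60,) (60, 5)
```

predict_proba still gives you an [n_samples, m_classes] array on the output side, so the 2-D view is recovered there rather than passed in.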
If you want to learn multilabel problems with other models, simply use OneVsRestClassifier as a multilabel wrapper around LogisticRegression.
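A sketch of that wrapper on a genuinely multilabel target, assuming a randomly generated 0/1 indicator matrix (rows may contain several 1s or none) as a hypothetical stand-in for real labels. OneVsRestClassifier fits one binary LogisticRegression per label column, and predict returns an indicator matrix with the same [n_samples, n_labels] layout as the training labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
n_samples, n_features, n_labels = 60, 4, 5

# Hypothetical multilabel indicator matrix: any number of 1s per row.
train_features = rng.normal(size=(n_samples, n_features))
truth_train = (rng.random(size=(n_samples, n_labels)) < 0.4).astype(int)

# One binary logistic regression is fitted per label column.
ovr = OneVsRestClassifier(
    LogisticRegression(class_weight='balanced', solver='lbfgs', max_iter=1000)
)
ovr.fit(train_features, truth_train)

pred = ovr.predict(train_features)             # 0/1 indicator matrix
scores = ovr.predict_proba(train_features)     # per-label probabilities; rows need not sum to 1
print(pred.shape, len(ovr.estimators_))        # (60, 5) 5
```

Because each label gets its own independent binary model, this treats labels as non-exclusive, which is what distinguishes it from the multinomial fit used for multiclass targets.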