How to maximize recall in a multilabel setting?
I have a text classification problem where I want to assign one of three labels (-1, 0, 1) to text documents. The most important metric is recall: I care that all texts that should be labeled "-1" are indeed labeled "-1". Precision, i.e. that everything labeled "-1" really is "-1", is less important.
So far I have been using a logistic regression pipeline in scikit-learn. The hyperparameters are tuned with GridSearchCV, but up to now it is accuracy that gets maximized.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# Bag-of-words -> tf-idf -> logistic regression
steps = [('vect', CountVectorizer()),
         ('tfidf', TfidfTransformer()),
         ('clf', LogisticRegression())]
parameters = {'vect__ngram_range': [(1, 1), (1, 2), (1, 3), (1, 4)],
              'tfidf__use_idf': (True, False),
              'clf__C': [0.001, 0.01, 0.1, 1, 10]}
pipeline = Pipeline(steps)

# By default GridSearchCV ranks candidates with the estimator's score method,
# which for classifiers is accuracy
text_clf = GridSearchCV(pipeline, parameters, cv=5)
text_clf.fit(X_train, y_train)
y_pred = text_clf.predict(X_test)
scores = cross_val_score(text_clf, X_test, y_test, cv=5)
Changing this to

text_clf = GridSearchCV(pipeline, parameters, scoring='recall', cv=5)

does not work, because it is a multiclass setting. Does anyone know how to reformulate this so that recall is maximized?
Grid search works as long as the metric provides a single number that GridSearchCV can use as a score to rank the results.
In a multi-label (or multiclass) setting, you need to decide which type of averaging over the different labels you want. You can use one of the following alternatives (a minimal example of plugging one of them into GridSearchCV follows the list):
scoring = 'recall_micro'
scoring = 'recall_macro'
scoring = 'recall_weighted'
scoring = 'recall_samples'
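For example, keeping the pipeline, parameter grid and data names from the question (pipeline, parameters, X_train, y_train), the grid-search call would become something like this (a minimal sketch):

# Rank grid-search candidates by macro-averaged recall instead of accuracy
text_clf = GridSearchCV(pipeline, parameters, scoring='recall_macro', cv=5)
text_clf.fit(X_train, y_train)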
For an explanation of these, see the documentation of recall_score:
average : string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]
This parameter is required for multiclass/multilabel targets.
If None, the scores for each class are returned. Otherwise, this
determines the type of averaging performed on the data:
'binary':
Only report results for the class specified by pos_label.
This is applicable only if targets (y_{true,pred}) are binary.
'micro':
Calculate metrics globally by counting the total true positives,
false negatives and false positives.
'macro':
Calculate metrics for each label, and find their unweighted mean.
This does not take label imbalance into account.
'weighted':
Calculate metrics for each label, and find their average, weighted
by support (the number of true instances for each label).
This alters ‘macro’ to account for label imbalance; it can result in
an F-score that is not between precision and recall.
'samples':
Calculate metrics for each instance, and find their average
(only meaningful for multilabel classification where this
differs from accuracy_score).
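As a quick toy illustration of how the averages differ (the labels and predictions below are made up for this example, not from the original post):

from sklearn.metrics import recall_score

y_true = [-1, -1, -1, 0, 0, 1, 1, 1]
y_pred = [-1, -1,  0, 0, 1, 1, 1, 1]

# 'micro': pools all classes, 6 of 8 correct -> 0.75
print(recall_score(y_true, y_pred, average='micro'))
# 'macro': unweighted mean of per-class recalls (2/3 + 1/2 + 3/3) / 3 -> ~0.72
print(recall_score(y_true, y_pred, average='macro'))
# None: per-class recalls in sorted label order [-1, 0, 1] -> [0.667, 0.5, 1.0]
print(recall_score(y_true, y_pred, average=None))

The average=None output also shows how to read off the recall of a single class such as "-1" once the search is done, even though GridSearchCV itself needs one of the averaged variants to rank candidates.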