
How to maximize recall in multilabel setting?

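I have a text classification problem where I want to assign one of three labels (-1, 0, 1) to each text document. The most important metric is recall: I care that every text that should be labeled "-1" is actually labeled "-1". Precision, i.e. that everything labeled "-1" really is "-1", matters less.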

So far I have been using a logistic regression pipeline in scikit-learn. The hyperparameters are tuned with GridSearchCV, but that search currently maximizes accuracy.

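from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

steps = [('vect', CountVectorizer()),
         ('tfidf', TfidfTransformer()),
         ('clf', LogisticRegression())]

parameters = {'vect__ngram_range': [(1, 1), (1, 2), (1, 3), (1, 4)],
              'tfidf__use_idf': (True, False),
              'clf__C': [0.001, 0.01, 0.1, 1, 10]}

pipeline = Pipeline(steps)
# No scoring argument: GridSearchCV falls back to the pipeline's score(), i.e. accuracy
text_clf = GridSearchCV(pipeline, parameters, cv=5)

text_clf.fit(X_train, y_train)
y_pred = text_clf.predict(X_test)

scores = cross_val_score(text_clf, X_test, y_test, cv=5)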


text_clf = GridSearchCV(pipeline, parameters, scoring = 'recall', cv = 5)


Grid search works when the metric returns a single number, which GridSearchCV then uses as the score for ranking the results.
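For the three labels here, the plain 'recall' scorer cannot produce that single number: it calls recall_score with its default average='binary', which only accepts two-class targets. A minimal sketch reproducing the problem on made-up labels (the arrays are illustrative, not from the question). Inside GridSearchCV the same failure may surface either as an exception or as NaN scores with warnings, depending on the scikit-learn version and the error_score setting.

```python
from sklearn.metrics import recall_score

# Made-up three-class labels, for illustration only
y_true = [-1, 0, 1, -1, 0, 1]
y_pred = [-1, 0, 0, 1, 0, 1]

# recall_score defaults to average='binary', which rejects multiclass targets
try:
    recall_score(y_true, y_pred)
    binary_error = None
except ValueError as exc:
    binary_error = str(exc)

print(binary_error)
```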

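In a multiclass (or multi-label) setting, you need to decide what kind of averaging to apply across the different labels. You can use one of the following alternatives: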

scoring = 'recall_micro'
scoring = 'recall_macro'
scoring = 'recall_weighted'
scoring = 'recall_samples'
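To see how these averages behave on single-label, three-class data like the question's, here is a small sketch that computes each one directly with recall_score on hypothetical predictions (the labels below are invented for illustration). The per-class recalls are 0.5 for -1, 1.0 for 0 and 0.5 for 1; micro and weighted recall are both 5/8 = 0.625, macro recall is 2/3, and 'samples' raises a ValueError because it is only defined for multi-label targets.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical single-label, three-class example (invented data)
y_true = [-1, -1, -1, -1, 0, 0, 1, 1]
y_pred = [-1, -1, 0, 0, 0, 0, 1, -1]

# Per-class recall, in sorted label order [-1, 0, 1]
per_class = recall_score(y_true, y_pred, average=None)
micro = recall_score(y_true, y_pred, average="micro")
macro = recall_score(y_true, y_pred, average="macro")
weighted = recall_score(y_true, y_pred, average="weighted")
acc = accuracy_score(y_true, y_pred)

print("per class:", per_class)
print("micro:", micro, "macro:", macro, "weighted:", weighted, "accuracy:", acc)

# 'samples' averaging is only available for multi-label targets
try:
    recall_score(y_true, y_pred, average="samples")
    samples_error = None
except ValueError as exc:
    samples_error = str(exc)
print("samples:", samples_error)
```

For single-label multiclass data, micro and weighted recall both reduce to total true positives over total samples, which is exactly accuracy, so 'recall_micro' and 'recall_weighted' would rank the grid exactly as the current accuracy-based search does. Of the four, only macro recall differs from accuracy here.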

For an explanation of each option, see the documentation of recall_score:

average : string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]

    This parameter is required for multiclass/multilabel targets. 
    If None, the scores for each class are returned. Otherwise, this
    determines the type of averaging performed on the data:

    ‘binary’:
        Only report results for the class specified by pos_label. 
        This is applicable only if targets (y_{true,pred}) are binary.

    ‘micro’:
        Calculate metrics globally by counting the total true positives, 
        false negatives and false positives.

    ‘macro’:
        Calculate metrics for each label, and find their unweighted mean. 
        This does not take label imbalance into account.

    ‘weighted’:
        Calculate metrics for each label, and find their average, weighted 
        by support (the number of true instances for each label).
        This alters ‘macro’ to account for label imbalance; it can result in
        an F-score that is not between precision and recall.

    ‘samples’:
        Calculate metrics for each instance, and find their average 
        (only meaningful for multilabel classification where this
        differs from accuracy_score).
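Since the question cares specifically about recall of the "-1" class, none of the averaged scorers above targets it directly. One option is a custom scorer built with make_scorer: passing labels=[-1] to recall_score restricts the average to that single class, so the grid search ranks candidates by recall of "-1" alone. A minimal sketch, assuming the question's pipeline; the toy corpus, the reduced grid and cv=3 are hypothetical placeholders for the real X_train / y_train and settings.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus; replace with the real X_train / y_train
X_train = [
    "terrible service, never again", "awful product, broke fast", "very bad experience",
    "worst purchase ever", "bad and disappointing", "horrible quality",
    "it is okay", "average product", "nothing special",
    "fine I guess", "neutral opinion", "it works as described",
    "great service", "excellent product", "really happy with it",
    "love it", "very good experience", "fantastic quality",
]
y_train = [-1] * 6 + [0] * 6 + [1] * 6

# Recall of class -1 only: labels=[-1] restricts the macro average to that one class
recall_neg = make_scorer(recall_score, labels=[-1], average="macro")

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", LogisticRegression()),
])
# Reduced grid to keep the example fast
parameters = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": (True, False),
    "clf__C": [0.1, 1, 10],
}

text_clf = GridSearchCV(pipeline, parameters, scoring=recall_neg, cv=3)
text_clf.fit(X_train, y_train)
print(text_clf.best_params_, text_clf.best_score_)
```

Keep in mind that a model predicting "-1" for every document reaches a recall of 1.0 on that class, so even if precision matters less, it is worth checking precision (for example with sklearn.metrics.classification_report) on held-out data before trusting the selected parameters.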