Use F1 score as a metric for multiclass prediction

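I found "Training Model in Caret Using F1 Metric", which describes how to use a custom summaryFunction so that F1 can serve as the metric.

However, that only works for binary classification. I would like to use it on a multiclass dataset.

Here is what I have done so far: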

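library(caret)

# Custom summary function: F1 for the positive class "pass" (binary only)
f1 <- function(data, lev = NULL, model = NULL) {
    print(data)  # debugging: show the data frame caret passes in
    precision <- posPredValue(data$pred, data$obs, positive = "pass")
    recall <- sensitivity(data$pred, data$obs, positive = "pass")
    f1_val <- (2 * precision * recall) / (precision + recall)
    names(f1_val) <- "F1"
    f1_val
}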


train.control <- trainControl(method = "repeatedcv",
                              number = 2,
                              summaryFunction = f1,
                              classProbs = TRUE, 
                              search = "grid")
                              
tune.grid <- expand.grid(.mtry = seq(from = 5, to = 10, by = 1))
                              
random.forest.orig <- train(target~.,
                            data = data.small,
                            method = "rf",
                            tuneGrid = tune.grid,
                            metric = "F1",
                            trControl = train.control)

random.forest.orig

As expected, this produces the following error:

Error in posPredValue.default(data$pred, data$obs, positive = "pass") : input data must have the same two levels

I hope someone has already done this and can help me solve it. I am also wondering why the data frame passed to the f1 function contains only 10 rows...

Solution:

# Summary function for caret: returns a named vector whose name matches `metric`
f1 <- function(data, lev = NULL, model = NULL) {
    f1_val <- f1_score(data$pred, data$obs)
    names(f1_val) <- "F1"
    f1_val
}

# positive.class is only used in the binary case and must name one of the levels
f1_score <- function(predicted, expected, positive.class = "1") {
    # Give both factors the same levels, in the same order, so that the
    # diagonal of the confusion matrix really holds the correct predictions
    expected  <- as.factor(expected)
    predicted <- factor(as.character(predicted), levels = levels(expected))
    cm <- as.matrix(table(expected, predicted))

    precision <- diag(cm) / colSums(cm)
    recall <- diag(cm) / rowSums(cm)
    f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))

    # Assume F1 is zero when it cannot be computed
    f1[is.na(f1)] <- 0

    # Binary F1 or multi-class macro-averaged F1
    if (nlevels(expected) == 2) f1[positive.class] else mean(f1)
}


train.control <- trainControl(method = "cv",
                              number = 2,
                              summaryFunction = f1,
                              classProbs = TRUE, 
                              search = "grid")
                              
tune.grid <- expand.grid(.mtry = seq(from = 10, to = 15, by = 1))
                              
random.forest.orig <- train(target~.,
                            data = data.small,
                            method = "rf",
                            tuneGrid = tune.grid,
                            metric = "F1",
                            trControl = train.control)

random.forest.orig
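
To sanity-check the metric outside of train, it can help to run the same computation on a tiny hand-made example. The sketch below uses base R only; macro_f1, f1_summary and the toy labels are hypothetical names and data made up for illustration, not part of caret. It assumes that an undefined per-class F1 counts as 0, as in the function above.

```r
# Hypothetical helper (not from caret): macro-averaged F1 in base R.
macro_f1 <- function(predicted, expected) {
  # One shared, sorted set of levels so table rows and columns line up.
  lvls <- sort(unique(c(as.character(expected), as.character(predicted))))
  expected  <- factor(as.character(expected),  levels = lvls)
  predicted <- factor(as.character(predicted), levels = lvls)
  cm <- table(expected, predicted)

  tp <- diag(cm)
  precision <- tp / colSums(cm)  # NaN if a class is never predicted
  recall    <- tp / rowSums(cm)  # NaN if a class never occurs in expected
  f1 <- 2 * precision * recall / (precision + recall)
  f1[is.na(f1)] <- 0             # assumption: undefined F1 counts as 0
  mean(f1)
}

# Hypothetical wrapper with the argument list caret uses for summary functions.
f1_summary <- function(data, lev = NULL, model = NULL) {
  c(F1 = macro_f1(data$pred, data$obs))
}

# Toy data: three classes, two observations each.
toy <- data.frame(
  obs  = factor(c("a", "a", "b", "b", "c", "c")),
  pred = factor(c("a", "b", "b", "b", "c", "a"))
)

# Per-class F1 is 1/2 (a), 4/5 (b) and 2/3 (c), so the macro average is 59/90.
toy_f1 <- f1_summary(toy, lev = levels(toy$obs))
print(toy_f1)

# A class that is never predicted ("c") contributes an F1 of 0:
# per-class F1 is 1 (a), 2/3 (b) and 0 (c), so the macro average is 5/9.
missing_f1 <- macro_f1(predicted = c("a", "a", "b", "b"),
                       expected  = c("a", "a", "b", "c"))
print(missing_f1)
```

The wrapper returns a named numeric vector, which is the shape caret expects from a summaryFunction, and the name has to match the metric argument passed to train. As far as I can tell, train also calls the summary function once on a small dummy sample before resampling starts, which may be why the print(data) call in the question showed only 10 rows.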

Hope this helps.