Use F1 Score as metric for multiclass prediction
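I found the question "Training Model in Caret Using F1 Metric", which describes how to use F1 as the metric through a custom summaryFunction.
However, that only works for binary classification. I would like to use it on a multiclass dataset.
This is what I have so far: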
f1 <- function(data, lev = NULL, model = NULL) {
  print(data)
  precision <- posPredValue(data$pred, data$obs, positive = "pass")
  recall <- sensitivity(data$pred, data$obs, positive = "pass")
  f1_val <- (2 * precision * recall) / (precision + recall)
  names(f1_val) <- c("F1")
  f1_val
}

train.control <- trainControl(method = "repeatedcv",
                              number = 2,
                              summaryFunction = defaultSummary,
                              classProbs = TRUE,
                              search = "grid")

tune.grid <- expand.grid(.mtry = seq(from = 5, to = 10, by = 1))

random.forest.orig <- train(target ~ .,
                            data = data.small,
                            method = "rf",
                            tuneGrid = tune.grid,
                            metric = "F1",
                            trControl = train.control)
random.forest.orig
As expected, this produces the following error:
Error in posPredValue.default(data$pred, data$obs, positive = "pass") : input data must have the same two levels
I hope someone has already done this and can help me solve it. I am also wondering why the data frame passed to the f1 function contains only 10 rows...

Solution:
# Summary function for caret: returns a named numeric vector so that
# train(metric = "F1") can find the value by name
f1 <- function(data, lev = NULL, model = NULL) {
  f1_val <- f1_score(data$pred, data$obs)
  names(f1_val) <- "F1"
  f1_val
}

f1_score <- function(predicted, expected, positive.class = "1") {
  # Give both factors the same levels in the same order, so the confusion
  # matrix is square and its diagonal holds the correct predictions
  expected <- as.factor(expected)
  predicted <- factor(as.character(predicted), levels = levels(expected))
  cm <- as.matrix(table(expected, predicted))

  precision <- diag(cm) / colSums(cm)
  recall <- diag(cm) / rowSums(cm)
  f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))

  # Assume F1 is zero when it cannot be computed (0/0 gives NaN)
  f1[is.na(f1)] <- 0

  # Binary F1 for the positive class, or multiclass macro-averaged F1
  if (nlevels(expected) == 2) unname(f1[positive.class]) else mean(f1)
}

train.control <- trainControl(method = "cv",
                              number = 2,
                              summaryFunction = f1,
                              classProbs = TRUE,
                              search = "grid")

tune.grid <- expand.grid(.mtry = seq(from = 10, to = 15, by = 1))

random.forest.orig <- train(target ~ .,
                            data = data.small,
                            method = "rf",
                            tuneGrid = tune.grid,
                            metric = "F1",
                            trControl = train.control)
random.forest.orig
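
To sanity-check the macro-averaging before handing the function to caret, it can help to work through a tiny case by hand. The sketch below uses only base R; the labels `obs` and `pred` are made-up toy data (not from the question's `data.small`), and the variable names are only illustrative.

```r
# Hypothetical toy labels with three classes (assumption, for illustration only)
obs  <- factor(c("a", "a", "a", "b", "b", "c"), levels = c("a", "b", "c"))
pred <- factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c"))

# Confusion matrix: rows are observed classes, columns are predicted classes.
# Both factors share the same levels, so the diagonal counts correct predictions.
cm <- table(obs, pred)

# One-vs-rest precision and recall for each class
precision <- diag(cm) / colSums(cm)   # a = 1.0,  b = 0.5, c = 0.5
recall    <- diag(cm) / rowSums(cm)   # a = 0.667, b = 0.5, c = 1.0

# Per-class F1; classes where it is undefined (0/0) are counted as 0
per_class_f1 <- 2 * precision * recall / (precision + recall)
per_class_f1[is.na(per_class_f1)] <- 0

# Macro-averaged F1 is the unweighted mean over classes
macro_f1 <- mean(per_class_f1)

print(round(per_class_f1, 3))  # a = 0.8, b = 0.5, c = 0.667
print(round(macro_f1, 3))      # 0.656
```

Because the mean is unweighted, a rare class counts as much as a frequent one, which is usually the reason for choosing macro F1 on imbalanced multiclass data.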

Hope this helps.