Confusion Matrix doesn't show correct count of actual values. Multinomial regression, factors

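I have two vectors: the actual values and the predicted values. Both are factors with 8 levels. Level 8 has 55 observations among the actual values and 0 among the predictions. However, when I build a confusion matrix, the level 8 observations disappear or somehow get moved. Shouldn't each column of actual values sum to that level's actual count?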

I built two different confusion matrices to double-check. I also tried explicitly making the factor levels the same in both vectors. No luck so far.

library(nnet); library(caret)

sc <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00272/SkillCraft1_Dataset.csv")

# Make the response a factor, then drop the first column (an ID)
sc$LeagueIndex <- as.factor(sc$LeagueIndex)
sc <- sc[, -1]

# Columns 2-4 (Age, HoursPerWeek, TotalHours) mark missing values with '?'; set them to NA
which_qm <- sc[, c(2,3,4)] == '?'
sc[, c(2,3,4)][which_qm] <- NA
sc[, c(2,3,4)] <- apply(sc[, c(2,3,4)], 2, as.numeric)

# Set impossible values to NA
sc$TotalHours[sc$Age < sc$TotalHours/8760] <- NA
sc$HoursPerWeek[sc$HoursPerWeek >= 168] <- NA

# Fit model and store predictions
sc_mod1 <- multinom(LeagueIndex ~ ., sc)
sc_fitted1 <- predict(sc_mod1, sc)

# sc_fitted1 is missing factor level 8
confusionMatrix(data = sc_fitted1, reference = sc$LeagueIndex)
table(predicted = sc_fitted1, actual = sc$LeagueIndex)

# sc_fitted1 has factor level 8
levels(sc_fitted1) <- levels(sc$LeagueIndex)
confusionMatrix(data = sc_fitted1, reference = sc$LeagueIndex)
table(predicted = sc_fitted1, actual = sc$LeagueIndex)

# What's the problem?
table(sc$LeagueIndex)
length(sc$LeagueIndex)

table(sc_fitted1)
length(sc_fitted1)

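This has to do with the NA values you generated: the missing values (the `?` entries you convert to NA) all belong to level 8 of the target variable. `multinom()` drops incomplete rows when fitting, so the model never sees level 8, and `predict()` returns NA for rows whose predictors are missing. Both `table()` and `confusionMatrix()` silently drop pairs that contain an NA, so those level 8 observations vanish from the matrix instead of being counted somewhere. If you want level 8 to be taken into account, you will probably have to find another way to encode these NAs.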
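A minimal sketch of the effect, using made-up toy vectors rather than the SkillCraft data: predictions that are NA are dropped by `table()` by default, and its `useNA` argument makes them visible again.

```r
# Toy vectors, made up for illustration: the two level "8" observations
# have NA predictions, as happens for rows whose predictors are missing.
actual    <- factor(c("1", "1", "2", "8", "8"), levels = as.character(1:8))
predicted <- factor(c("1", "2", "2", NA,  NA),  levels = as.character(1:8))

# By default table() drops every pair that contains an NA,
# so the "8" column sums to 0 even though there are two actual 8s.
tab <- table(predicted = predicted, actual = actual)
colSums(tab)

# useNA = "ifany" adds an <NA> row, so each actual value is counted once
# and the "8" column now sums to 2.
tab_na <- table(predicted = predicted, actual = actual, useNA = "ifany")
colSums(tab_na)
```

With the question's objects, `table(predicted = sc_fitted1, actual = sc$LeagueIndex, useNA = "ifany")` should show the dropped rows in an `<NA>` row, so the column sums match `table(sc$LeagueIndex)`.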

Try this for comparison, which replaces the `?` entries with a number instead of NA:

library(nnet); library(caret)

sc <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00272/SkillCraft1_Dataset.csv")

sc$LeagueIndex <- as.factor(sc$LeagueIndex)
sc <- sc[, -1]

which_qm <- sc[, c(2,3,4)] == '?'
sc[, c(2,3,4)][which_qm] <- 20   # this is just a random numeric value (not the best one to use!)
sc[, c(2,3,4)] <- apply(sc[, c(2,3,4)], 2, as.numeric)

sc_mod1 <- multinom(LeagueIndex ~ ., sc)
sc_fitted1 <- predict(sc_mod1, sc)

confusionMatrix(data = sc_fitted1, reference = sc$LeagueIndex)
table(predicted = sc_fitted1, actual = sc$LeagueIndex)

It will give you something like this, where the level 8 column now accounts for all 55 observations (2 + 53):

         actual
predicted   1   2   3   4   5   6   7   8
        1  52  30   9   2   0   0   0   0
        2  61 123  78  58   4   1   0   0
        3  30  77 142  79  23   4   0   0
        4  21 104 248 410 252  45   0   0
        5   2  11  60 217 343 230   1   0
        6   1   2  16  45 184 333  32   2
        7   0   0   0   0   0   5   2   0
        8   0   0   0   0   0   3   0  53
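One common alternative to an arbitrary constant such as 20 is to fill each missing value with the column median and add a 0/1 indicator column recording where values were missing. The sketch below assumes that approach; `encode_missing` is a hypothetical helper and the data frame is made-up toy data. Note that in this dataset the missing values appear to occur only in level 8, so any encoding that keeps missingness visible (the constant 20 above, or an indicator column) lets the model learn "missing means level 8", which is likely why 53 of the 55 level 8 rows are predicted correctly in the table above.

```r
# Hypothetical helper: for each named numeric column, record missingness in a
# new "<col>_missing" 0/1 column, then replace the NAs with the column median.
encode_missing <- function(df, cols) {
  for (col in cols) {
    miss <- is.na(df[[col]])
    df[[paste0(col, "_missing")]] <- as.integer(miss)
    df[[col]][miss] <- median(df[[col]], na.rm = TRUE)
  }
  df
}

# Toy data, made up for illustration only
toy  <- data.frame(Age = c(20, NA, 30, 25), TotalHours = c(100, 250, NA, NA))
toy2 <- encode_missing(toy, c("Age", "TotalHours"))
toy2
```

On the question's data this would be applied after the `?` entries are converted to NA, e.g. `sc2 <- encode_missing(sc, c("Age", "HoursPerWeek", "TotalHours"))` followed by `multinom(LeagueIndex ~ ., sc2)`.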