Error in training SVM model : Error: One or more factor levels in the outcome has no data: '2'

I have the following dataset (a sample of the first 10 rows is shown):

structure(list(variableA = c(11L, 7L, 17L, 7L, 7L, 2L, 
2L, 7L, 7L, 4L), variableB = c(10L, 20L, 4L, 0L, 0L, 1L, 
1L, 0L, 0L, 2L), variableC = c(284L, 
43L, 19L, 0L, 0L, 27L, 27L, 0L, 0L, 20L), variableD = c(299L, 
24L, 28L, 167L, 167L, 27L, 27L, 194L, 194L, 21L), variableE = c(2, 
1, 1, 1, 1, 1, 1, 1, 1, 1), variableF1 = c(0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), variableF2 = c(0L, 
0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L), variableF3 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF4 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF5 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF6 = c(1L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF7 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF8 = c(0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF9 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF10 = c(0L, 
0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L), variableG1 = c(1L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableG2 = c(0L, 
0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L), variableG3 = c(0L, 
1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L), clusters = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 1L, 6L, 6L), .Label = c("1", "2", "3", 
"4", "5", "6"), class = "factor"), out = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 6L, 6L), .Label = c("3", "1", "2", "4", 
"5", "6"), class = "factor")), row.names = c(1L, 3L, 4L, 5L, 
6L, 8L, 9L, 12L, 13L, 14L), class = "data.frame")

I have been trying to fit a support vector machine to this dataset. It worked fine before, but for some reason it now throws an error.

The model I am trying to fit is:

library(caret)

set.seed(111)
# 10-fold cross-validation, repeated 3 times
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
svm_Linear <- train(out ~ variableA + variableB + variableC + variableD +
                      variableE + variableF1 + variableF2 + variableF3 +
                      variableF4 + variableF5 + variableF6 + variableF7 +
                      variableF8 + variableF9 + variableF10 + variableG1 +
                      variableG2 + variableG3,
                    data = train, method = "svmLinear",
                    trControl = trctrl,
                    preProcess = c("center", "scale"),
                    tuneLength = 10)
svm_Linear

However, I get this error, which I cannot make sense of:

Error: One or more factor levels in the outcome has no data: '2'

I have seen similar posts on this site, but none of them has the answer I need.

Your out column is a factor with 6 levels, but only 3 of those levels actually appear in the dput you provided in your post. That is why you get this error.

levels(train$out)
# "3" "1" "2" "4" "5" "6"

unique(train$out)
# 3 1 6
# Levels: 3 1 2 4 5 6
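
To see exactly which levels are empty, a per-level count can help. The snippet below is a minimal sketch that rebuilds only the out column from the dput sample in the question (the names counts and empty_levels are just illustrative); on the full data frame the same idea would be table(train$out).

```r
# Outcome column rebuilt from the 10-row dput sample in the question
out <- factor(c("3", "3", "3", "3", "3", "3", "3", "1", "6", "6"),
              levels = c("3", "1", "2", "4", "5", "6"))

# Count observations per declared level, including levels with no rows
counts <- table(out)
print(counts)

# Levels that are declared on the factor but have zero observations
empty_levels <- names(counts)[counts == 0]
print(empty_levels)
```

Any level listed in empty_levels is a candidate for the "has no data" error.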

This is probably caused by the way you performed your train/test split.

You could redefine levels(out) to include only c(1, 3, 6), but that will be a problem if your test data contains other response levels.
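
As a rough illustration of that trade-off, here is a small sketch using base R's droplevels() on the sample outcome column. The new_obs vector is made up for the example and stands in for test data that contains a class the training data never saw.

```r
# Outcome column rebuilt from the 10-row dput sample in the question
out <- factor(c("3", "3", "3", "3", "3", "3", "3", "1", "6", "6"),
              levels = c("3", "1", "2", "4", "5", "6"))

# Drop the levels that have no observations
out_trimmed <- droplevels(out)
print(levels(out_trimmed))

# Hypothetical test labels: "2" is not among the trimmed levels,
# so it becomes NA when coerced to the training factor's levels
new_obs <- factor(c("3", "2"), levels = levels(out_trimmed))
print(is.na(new_obs))
```

On the real data this would be train$out <- droplevels(train$out) before calling train(). The model then simply cannot predict the dropped classes.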

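Consider using a stratified sampling approach instead, to ensure that your response variable is properly represented in the train/test split. Questions about stratified sampling are better suited to Cross Validated than to Stack Overflow, but there are some good starting points mentioned in this SO post and this one.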
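
For a concrete starting point, the following is a minimal base R sketch of a stratified split. It is an assumption about what such a split could look like, not the method from the linked posts. The helper stratified_split and the toy outcome vector are hypothetical.

```r
# Hypothetical helper: sample a fraction p of the rows within each class,
# so that every observed class appears in the training set.
stratified_split <- function(y, p = 0.8) {
  stopifnot(is.factor(y))
  # Row indices grouped by class; levels with no rows are dropped
  idx_by_level <- split(seq_along(y), y, drop = TRUE)
  train_idx <- unlist(lapply(idx_by_level, function(idx) {
    n_train <- max(1, floor(p * length(idx)))  # keep at least one row per class
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  sort(train_idx)
}

set.seed(111)
# Toy outcome with unbalanced classes, made up for illustration
y <- factor(rep(c("1", "2", "3"), times = c(10, 5, 20)))

train_idx <- stratified_split(y, p = 0.8)
train_y <- y[train_idx]
test_y <- y[-train_idx]

print(table(train_y))
print(table(test_y))
```

Since you are already using caret, its createDataPartition() function performs a similar class-wise split when given a factor outcome. Very rare classes can still end up missing from individual cross-validation folds.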