Error in training SVM model: Error: One or more factor levels in the outcome has no data: '2'
I have the following dataset (a sample of the first 10 rows is given):
structure(list(variableA = c(11L, 7L, 17L, 7L, 7L, 2L,
2L, 7L, 7L, 4L), variableB = c(10L, 20L, 4L, 0L, 0L, 1L,
1L, 0L, 0L, 2L), variableC = c(284L,
43L, 19L, 0L, 0L, 27L, 27L, 0L, 0L, 20L), variableD = c(299L,
24L, 28L, 167L, 167L, 27L, 27L, 194L, 194L, 21L), variableE = c(2,
1, 1, 1, 1, 1, 1, 1, 1, 1), variableF1 = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), variableF2 = c(0L,
0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L), variableF3 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF4 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF5 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF6 = c(1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF7 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF8 = c(0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF9 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableF10 = c(0L,
0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L), variableG1 = c(1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), variableG2 = c(0L,
0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L), variableG3 = c(0L,
1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L), clusters = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 1L, 6L, 6L), .Label = c("1", "2", "3",
"4", "5", "6"), class = "factor"), out = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 6L, 6L), .Label = c("3", "1", "2", "4",
"5", "6"), class = "factor")), row.names = c(1L, 3L, 4L, 5L,
6L, 8L, 9L, 12L, 13L, 14L), class = "data.frame")
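I have been trying to run a support vector machine on this dataset. It used to work fine, but for some reason it now throws an error.
The model I am trying is: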
library(caret)  # provides train() and trainControl()
set.seed(111)
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
svm_Linear <- train(out~`variableA` + `variableB` +
`variableC` +`variableD`+
`variableE` +`variableF1` +
`variableF2` + `variableF3` +
`variableF4` + `variableF5` +
`variableF6` + `variableF7` +
`variableF8` + `variableF9` +
`variableF10` + `variableG1` +
`variableG2` + `variableG3` , data= train, method = "svmLinear",
trControl=trctrl,
preProcess = c("center", "scale"),
tuneLength = 10)
svm_Linear
But I get this error, which I don't understand:
Error: One or more factor levels in the outcome has no data: '2'
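I have seen similar posts on this site, but none of them had the answer I need.
Your out column is a factor with 6 levels, but only 3 of those levels are represented in the dput you provided in your post, which is why you are getting this error.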
levels(train$out)
# "3" "1" "2" "4" "5" "6"
unique(train$out)
# 3 1 6
# Levels: 3 1 2 4 5 6
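A quick way to see exactly which levels are empty is to count the rows per level with table(). The sketch below rebuilds only the out column from the dput sample above (same values, same level order), so it runs on its own; on the full data you would simply call table(train$out). Levels with a count of 0 are the ones the error message complains about.

```r
# Rebuild only the outcome column from the dput sample (codes 1,1,1,1,1,1,1,2,6,6
# mapped onto the labels "3","1","2","4","5","6")
out <- factor(c("3", "3", "3", "3", "3", "3", "3", "1", "6", "6"),
              levels = c("3", "1", "2", "4", "5", "6"))

# Count rows per level; unused levels still appear, with a count of 0
counts <- table(out)
print(counts)

# Names of the levels that have no rows at all
empty_levels <- names(counts)[counts == 0]
print(empty_levels)  # "2" "4" "5" for this 10-row sample
```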
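This is probably caused by the way you did your train/test split.
You could redefine levels(out) to contain only c(1, 3, 6), but that becomes a problem if your test data contains any of the other response levels.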
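Dropping the unused levels can be done with base R's droplevels(), which removes empty levels and keeps the order of the remaining ones. The sketch below also illustrates the caveat: if test data is later coerced to the training levels, any value the training set never saw (here "2") silently becomes NA. The test_out values are hypothetical.

```r
# Outcome with six declared levels but only three actually observed
out <- factor(c("3", "3", "3", "1", "6", "6"),
              levels = c("3", "1", "2", "4", "5", "6"))

# Remove levels that have no observations
out_dropped <- droplevels(out)
print(levels(out_dropped))  # "3" "1" "6"

# Hypothetical test values: "2" is not among the training levels, so it becomes NA
test_out <- factor(c("2", "3"), levels = levels(out_dropped))
print(test_out)
```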
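Consider using a stratified sampling approach instead, so that every level of your response variable is represented in both the training and the test split. Questions about stratified sampling are better suited to Cross Validated than to Stack Overflow, but there are some good starting points mentioned in this SO post and this one.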
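caret's createDataPartition() already performs stratified splitting when the outcome is a factor (it samples within each level). To make the technique explicit, here is a base-R sketch of the same idea. The data frame df, the helper stratified_split() and the 80/20 ratio are hypothetical stand-ins for the asker's data and split, not their actual code.

```r
set.seed(111)

# Hypothetical full dataset standing in for the data before splitting:
# 60 rows, 10 per outcome level, same level order as the asker's out column
df <- data.frame(
  x   = rnorm(60),
  out = factor(rep(c("3", "1", "2", "4", "5", "6"), times = 10),
               levels = c("3", "1", "2", "4", "5", "6"))
)

# Hypothetical helper: draw a fraction p of the row indices separately within
# each level of y, so every non-empty level ends up in the training set
stratified_split <- function(y, p = 0.8) {
  idx_by_level <- split(seq_along(y), y)
  train_idx <- lapply(idx_by_level, function(idx) {
    if (length(idx) == 0) return(integer(0))   # empty level: nothing to sample
    n_take <- max(1, floor(length(idx) * p))   # keep at least one row per level
    idx[sample.int(length(idx), n_take)]
  })
  sort(unlist(train_idx, use.names = FALSE))
}

train_idx <- stratified_split(df$out, p = 0.8)
train <- df[train_idx, ]
test  <- df[-train_idx, ]

print(table(train$out))  # each level appears 8 times
print(table(test$out))   # each level appears 2 times
```

With caret installed, the equivalent call is roughly idx <- createDataPartition(df$out, p = 0.8, list = FALSE). Note that a level with only one row will land entirely in the training set, so very rare classes may still be missing from the test set.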