Caret objecting to outcomes labels: Error: At least one of the class levels is not a valid R variable name

caret gives me the error below. I am training an SVM to make predictions from a bag of words and want to use caret to tune the C parameter, but:

bow.model.svm.tune <- train(Training.match ~ ., data = data.frame(
    Training.match = factor(Training.Data.old$Training.match, labels = c('no match', 'match')),
    Text.features.dtm.df) %>%
        filter(Training.Data.old$Data.tipe == 'train'),
    method = 'svmRadial',
    tuneLength = 9,
    preProc = c("center","scale"),
    metric="ROC",
    trControl = trainControl(
        method="repeatedcv",
        repeats = 5,
        summaryFunction = twoClassSummary,
        classProbs = TRUE))

Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to no.match, match . Please use factor levels that can be used as valid R variable names (see ?make.names for help).

The original e1071::svm() function has no problem with the same data, so I suppose the error arises in the tuning stage:

bow.model.svm.tune <- svm(Training.match ~ ., data = data.frame(
             Training.match = factor(Training.Data.old$Training.match, labels = c('no match', 'match')),
             Text.features.dtm.df) %>%
                 filter(Training.Data.old$Data.tipe == 'train'))

The data is just an outcome factor variable plus a set of TfIdf-transformed word-vector columns:

'data.frame':   1796 obs. of  1697 variables:
 $ Training.match          : Factor w/ 2 levels "no match","match": 2 1 1 1 1 1 1 1 2 1 ...
 $ azienda                 : num  0.12 0 0 0 0 ...
 $ bus                     : num  0.487 0 0 0 0 ...
 $ locale                  : num  0.275 0 0 0 0 ...
 $ martini                 : num  0.852 0.741 0.947 0.947 0.501 ...
 $ osp                     : num  0.339 0 0 0 0 ...
 $ ospedale                : num  0.0389 0.0676 0.0864 0.0864 0.0915 ...

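When predicting (either internally inside train or yourself with predict.train), the function creates a new column for each class probability. If your code is expecting a column called "no match", it will not see "no.match" (which is what data.frame converts the name to) and will throw an error.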
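
A minimal sketch of the renaming described above, and of one way to avoid it, using base R only. The `outcome` vector is a hypothetical stand-in for Training.Data.old$Training.match, and the probability values are made up for illustration. The idea is to run the factor levels through make.names() before handing the data to train(), so the levels already match the column names that data.frame would generate.

```r
# Hypothetical outcome vector standing in for Training.Data.old$Training.match
outcome <- c(1, 0, 0, 1)

# Same labelling as in the question: the first level contains a space
y <- factor(outcome, levels = c(0, 1), labels = c("no match", "match"))

# data.frame() silently rewrites names that are not valid R variable names,
# which is why a "no match" probability column would come back as "no.match"
probs <- data.frame(`no match` = 0.3, match = 0.7)
prob_names <- names(probs)

# Convert the levels to valid names up front so both sides agree
levels(y) <- make.names(levels(y))
fixed_levels <- levels(y)

print(prob_names)
print(fixed_levels)
```

With levels such as these, the factor can be passed as Training.match in the train() call from the question; choosing labels without spaces in the first place (for example 'no_match') should work as well.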