R caret naïve Bayes accuracy is null
I have a dataset that I want to train with both SVM and naive Bayes.
SVM works, but naive Bayes does not. The source code is below:
library(tools)
library(caret)
library(doMC)
library(mlbench)
library(magrittr)
CORES <- 5 # Optional
registerDoMC(CORES) # Optional: register a parallel backend
load("chat/rdas/2gram-entidades-erro.Rda") # loads maFinal
set.seed(10)
split <- 0.60
maFinal$resposta <- as.factor(maFinal$resposta)
# trainIndex is not defined in the post as shown; a stratified split is assumed here
trainIndex <- createDataPartition(maFinal$resposta, p = split, list = FALSE)
data_train <- as.data.frame(unclass(maFinal[trainIndex, ]))
data_test <- maFinal[-trainIndex, ]
treegram25NotNull <- train(x = subset(data_train, select = -c(resposta)),
                           y = data_train$resposta,
                           method = "nb",
                           trControl = trainControl(method = "cv", number = 5,
                                                    savePredictions = TRUE, sampling = "up"))
treegram25NotNull
The resulting accuracy is null:
Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
2: In train.default(subset(data_train, select = -c(resposta)), data_train$resposta, :
missing values found in aggregated results
Any help is much appreciated, thanks.
The fix is very simple:
set.seed(10)
split <- 0.60
maFinal[] <- lapply(maFinal, as.factor)
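Currently all of your variables except resposta are numeric. However, each of them takes at most about 12 distinct values, which means they should really be factor variables. In addition, many of them are highly imbalanced. As a result, once the sample is split, some of these (effectively factor) variables end up with only a single unique value, and the problem comes from treating them as continuous.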
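To make the reasoning above concrete, here is a minimal sketch in base R. The data frame toy and its columns f1 and f2 are hypothetical stand-ins, since the real columns of maFinal are not shown in the post. It counts distinct values per column, shows that an imbalanced numeric predictor can have zero spread within one class, and then applies the same lapply conversion as the fix.

```r
# Hypothetical toy data standing in for maFinal (the real columns are not shown)
toy <- data.frame(
  f1 = rep(c(0, 1, 2, 1), times = 5),
  f2 = c(rep(0, 18), 1, 1),              # highly imbalanced predictor
  resposta = rep(c("a", "b"), each = 10)
)

# Few distinct values per column suggests categorical rather than continuous data
n_distinct <- sapply(toy, function(col) length(unique(col)))
print(n_distinct)

# Treated as numeric, f2 is constant within class "a", so its standard
# deviation there is 0 and a continuous density cannot be estimated from it
within_class_sd <- tapply(toy$f2, toy$resposta, sd)
print(within_class_sd)

# Same conversion as in the fix: turn every column into a factor
toy[] <- lapply(toy, as.factor)

# As a factor, f2 keeps both levels in every class, including zero counts
counts <- table(toy$f2, toy$resposta)
print(counts)
```

Running the first check on maFinal itself (before the conversion) should show which predictors have only a handful of distinct values.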