StepLDA without Cross Validation
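I want to select variables based on the training error.
For that reason I set method in trainControl to "none". However, if I run the function below twice, I get two different error values (correctness rates).
In this example the difference is negligible, but I would still expect no difference at all.
Does anyone know where this difference comes from?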
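library(caret)
# method = "none": caret itself does no resampling
c_1 <- trainControl(method = "none")
maxvar <- 4
direction <- "forward"
# single-row tuning grid, as required when method = "none"
tune_1 <- data.frame(maxvar, direction)
tr <- train(Species ~ ., data = iris, method = "stepLDA", trControl = c_1, tuneGrid = tune_1)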
First run:
 `stepwise classification', using 10-fold cross-validated correctness rate of method `lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96; in: "Petal.Width"; variables (1): Petal.Width
correctness rate: 0.96667; in: "Sepal.Width"; variables (2): Petal.Width, Sepal.Width
correctness rate: 0.97333; in: "Petal.Length"; variables (3): Petal.Width, Sepal.Width, Petal.Length
correctness rate: 0.98; in: "Sepal.Length"; variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length
hr.elapsed min.elapsed sec.elapsed
0.00 0.00 0.28
Second run:
> train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)->tr
 `stepwise classification', using 10-fold cross-validated correctness rate of method `lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96; in: "Petal.Width"; variables (1): Petal.Width
correctness rate: 0.96; in: "Sepal.Width"; variables (2): Petal.Width, Sepal.Width
correctness rate: 0.96667; in: "Petal.Length"; variables (3): Petal.Width, Sepal.Width, Petal.Length
correctness rate: 0.98; in: "Sepal.Length"; variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length
hr.elapsed min.elapsed sec.elapsed
0.0 0.0 0.3
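You are still doing 10-fold cross-validation. The folds are assigned at random, so unless you set a seed you will get slightly different results each time you train the model.
If you run this code, including the set.seed call, you will get the same correctness rates every time.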
set.seed(42)
tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)
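To illustrate why the seed matters, here is a minimal base-R sketch of random fold assignment. It is not klaR's actual code: assign_folds is a hypothetical helper, and it only assumes that the folds are drawn with R's random number generator.

```r
# Hypothetical helper (not part of caret or klaR): randomly assign
# n observations to k folds of (almost) equal size.
assign_folds <- function(n, k = 10) {
  sample(rep(seq_len(k), length.out = n))
}

# Without a seed, two calls generally give different partitions,
# so any cross-validated rate computed on them can differ slightly.
folds_a <- assign_folds(150)
folds_b <- assign_folds(150)
print(identical(folds_a, folds_b))

# Resetting the same seed before each call reproduces the partition,
# which is why set.seed() makes the reported rates repeatable.
set.seed(42)
folds_c <- assign_folds(150)
set.seed(42)
folds_d <- assign_folds(150)
print(identical(folds_c, folds_d))
```

Note that the seed has to be set again immediately before every call that should be reproduced; setting it once at the top of a script only fixes the sequence of results, not each individual run.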
Edit based on the comments:
The 10-fold cross-validated correctness rate does not come from caret, but from the stepclass function in the klaR package.
stepclass(x, grouping, method, improvement = 0.05, maxvar = Inf,
start.vars = NULL, direction = c("both", "forward", "backward"),
criterion = "CR", fold = 10, cv.groups = NULL, output = TRUE,
min1var = TRUE, ...)
fold parameter for cross-validation; omitted if ‘cv.groups’ is
specified.
If you want, you can change this by passing the fold argument to train:
tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1, fold = 1)
But a fold of 1 makes no sense. You will get a bunch of warnings and errors.
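If the goal really is selection on the training error, one option is to skip stepclass and write the forward search by hand. The following is only a sketch under that assumption: train_accuracy and forward_select are hypothetical helpers, not functions from caret or klaR, and the search uses MASS::lda with plain resubstitution accuracy. Because no resampling is involved, repeated runs give identical results without a seed.

```r
library(MASS)  # provides lda()

# Hypothetical helper: resubstitution (training) accuracy of an LDA
# model fitted on the predictors named in `vars`.
train_accuracy <- function(vars, data, response) {
  x <- as.matrix(data[, vars, drop = FALSE])
  fit <- lda(x, grouping = data[[response]])
  pred <- predict(fit, x)$class
  mean(pred == data[[response]])
}

# Hypothetical helper: greedy forward selection. At each step, add the
# candidate variable that gives the highest training accuracy, until
# `maxvar` variables have been selected.
forward_select <- function(data, response, maxvar) {
  candidates <- setdiff(names(data), response)
  selected <- character(0)
  history <- numeric(0)
  while (length(selected) < maxvar && length(candidates) > 0) {
    scores <- vapply(
      candidates,
      function(v) train_accuracy(c(selected, v), data, response),
      numeric(1)
    )
    best <- names(scores)[which.max(scores)]  # ties: first candidate wins
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
    history <- c(history, scores[[best]])
  }
  data.frame(step = seq_along(selected), variable = selected, accuracy = history)
}

res <- forward_select(iris, "Species", maxvar = 4)
print(res)
```

Keep in mind that the training error is an optimistic estimate, so the variable order found this way may differ from the cross-validated one shown above.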