PCA threshold tuning in Caret
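I'm trying to build a classifier from some data using caret.
One of the approaches I want to try is a simple LDA on PCA-preprocessed data.
I found out how to do this with caret: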
fitControl <- trainControl("repeatedcv", number=10, repeats = 10,
preProcOptions = list(thresh = 0.9))
ldaFit1 <- train(label ~ ., data = tab,
method = "lda2",
preProcess = c("center", "scale", "pca"),
trControl = fitControl)
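Setting preProcOptions = list(thresh = 0.9) fixes the PCA step so that it keeps as many components as are needed to explain 90% of the variance. That makes it a preprocessing constant rather than a tuning parameter. Below is a minimal base-R sketch of that cumulative-variance rule. It uses the built-in iris data as a stand-in for tab, which is an assumption. caret's preProcess() may differ from this in edge cases.

```r
# Standardise the predictors (the equivalent of preProcess = c("center", "scale"))
x <- scale(iris[, 1:4])

# PCA on the standardised predictors
pc <- prcomp(x)

# Cumulative proportion of variance explained by the first k components
cum_var <- cumsum(pc$sdev^2) / sum(pc$sdev^2)

# Smallest number of components whose cumulative variance reaches `thresh`
n_components <- function(thresh) which(cum_var >= thresh)[1]

thresholds <- c(0.80, 0.90, 0.95, 0.99)
kept <- sapply(thresholds, n_components)
print(data.frame(thresh = thresholds, n_components = kept))
```

A higher threshold can only keep the same number of components or more. So sweeping thresh is really a coarse way of sweeping the number of retained components.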
As expected, caret compares the accuracy of LDA across different values of dimen:
Linear Discriminant Analysis
158 samples
1955 predictors
3 classes: '1', '2', '3'
Pre-processing: centered (1955), scaled (1955), principal component
signal extraction (1955)
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 142, 142, 143, 142, 143, 142, ...
Resampling results across tuning parameters:
  dimen  Accuracy   Kappa
  1      0.5498987  0.1151681
  2      0.5451340  0.1298590
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was dimen = 1.
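The grid contains only dimen = 1 and dimen = 2 because LDA produces at most min(p, K - 1) discriminant functions, and there are K = 3 classes here. The sketch below uses MASS::lda() on iris, which is a stand-in that also has three classes. It shows that limit, and it shows how the dimen argument of predict() restricts classification to the first few discriminants.

```r
library(MASS)  # lda(); caret's "lda2" method is built on this function

fit <- lda(Species ~ ., data = iris)

# One column per linear discriminant: min(p, K - 1) = min(4, 3 - 1) = 2
n_discriminants <- ncol(fit$scaling)
print(n_discriminants)

# Classify using only the first discriminant (what dimen = 1 means in caret)
pred1 <- predict(fit, newdata = iris, dimen = 1)$class
# Classify using both discriminants
pred2 <- predict(fit, newdata = iris, dimen = 2)$class

# Resubstitution accuracy per dimen (optimistic; caret uses resampling instead)
acc <- c(dimen1 = mean(pred1 == iris$Species),
         dimen2 = mean(pred2 == iris$Species))
print(acc)
```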
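What I would like to do is add the PCA threshold to the tuning parameters, but I could not find a way to do this.
Is there a simple solution in caret? Or do I have to repeat the training step with different preprocessing options and select the best value at the end?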
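One way to repeat the training step with different preprocessing options, using plain caret, is to loop over the thresholds while holding the resampling folds fixed. Every threshold is then scored on the same splits, so the accuracies are directly comparable. This is a sketch only. It assumes iris and Species as stand-ins for tab and label, and it uses 3 repeats instead of 10 to keep it quick.

```r
library(caret)  # train(), trainControl(), createMultiFolds()

# iris stands in for the question's `tab`; Species plays the role of `label`
set.seed(42)
# Create the resampling folds once so every threshold sees identical splits
folds <- createMultiFolds(iris$Species, k = 10, times = 3)

thresholds <- c(0.80, 0.90, 0.95, 0.99)

# One caret run per PCA threshold; keep the best dimen for each
results <- do.call(rbind, lapply(thresholds, function(th) {
  ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                       index = folds,
                       preProcOptions = list(thresh = th))
  fit <- train(Species ~ ., data = iris, method = "lda2",
               preProcess = c("center", "scale", "pca"),
               trControl = ctrl)
  top <- fit$results[which.max(fit$results$Accuracy), ]
  data.frame(thresh = th, dimen = top$dimen,
             Accuracy = top$Accuracy, Kappa = top$Kappa)
}))

print(results)

# Threshold/dimen combination with the highest resampled accuracy
best_row <- results[which.max(results$Accuracy), ]
print(best_row)
```

Passing the same index to every trainControl() call matters here. If you don't, each call draws its own folds, and the differences between thresholds mix with resampling noise.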
Thanks to the link pointed out by missuse, I managed to integrate the PCA variance-explained threshold into the parameter tuning:
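library(caret)
library(recipes)    # recipe(), step_*(); also re-exports %>%
library(MASS)       # lda(), used by caret's "lda2" method
library(modelgrid)  # model_grid(), share_settings(), add_model()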
# Setting up a vector of thresholds to try out
pca_varex <- c(0.8, 0.9, 0.95, 0.97, 0.98, 0.99, 0.995, 0.999)
# Setting up recipe
initial_recipe <- recipe(train, formula = label ~ .) %>%
step_center(all_predictors()) %>%
step_scale(all_predictors())
# Define the modelgrid
models <- model_grid() %>%
share_settings(data = train,
trControl = caret::trainControl(method = "repeatedcv",
number = 10,
repeats = 10),
method = "lda2")
# Add models with different PCA thresholds
for (i in pca_varex) {
models <- models %>% add_model(model_name = sprintf("varex_%s", i),
x = initial_recipe %>%
step_pca(all_predictors(), threshold = i))
}
# Train
models <- models %>% train(.)
That said, going by the modelgrid and recipes documentation, the tidymodels packages are probably the most straightforward way to do this (https://www.tidymodels.org/).
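For reference, here is a minimal tidymodels sketch of the same idea. The PCA threshold is marked with tune() and searched with tune_grid(). The sketch assumes that the tidymodels and discrim packages are installed, and it again uses iris as a stand-in. Only the threshold is tuned here; dimen is not part of this grid.

```r
library(tidymodels)  # recipes, parsnip, workflows, rsample, tune, yardstick
library(discrim)     # discrim_linear()

# Centre/scale, then PCA keeping enough components for a tunable variance threshold
rec <- recipe(Species ~ ., data = iris) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_predictors(), threshold = tune())

# Plain LDA fitted with MASS::lda
lda_spec <- discrim_linear() %>% set_engine("MASS")

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(lda_spec)

set.seed(42)
folds <- vfold_cv(iris, v = 10, repeats = 3, strata = Species)

# Candidate thresholds; the column name must match the tune() parameter id
grid <- tibble(threshold = c(0.80, 0.90, 0.95, 0.99))

res <- tune_grid(wf, resamples = folds, grid = grid,
                 metrics = metric_set(accuracy))

show_best(res, metric = "accuracy")
```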