Linear SVM and extracting the weights
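I am practicing SVMs in R with the iris dataset, and I want to get the feature weights/coefficients from my model. I think I may have misunderstood something, because my output gives me 32 support vectors. Since I am analyzing four variables, I assumed I would get four values. I know there is a way to do this when using the svm() function, but I am trying to build my SVM with the train() function from caret.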
library(caret)
# Define fitControl
fitControl <- trainControl(## 5-fold CV
method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = twoClassSummary )
# Define Tune
grid<-expand.grid(C=c(2^-5,2^-3,2^-1))
##########
df <- iris
head(df)
df<-df[df$Species!='setosa',]
df$Species<-as.character(df$Species)
df$Species<-as.factor(df$Species)
# set random seed and run the model
set.seed(321)
svmFit1 <- train(x = df[-5],
y=df$Species,
method = "svmLinear",
trControl = fitControl,
preProc = c("center","scale"),
metric="ROC",
tuneGrid=grid )
svmFit1
I thought this would simply be svmFit1$finalModel@coef, but I get 32 values when I expected 4. Why is that?
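So coef is not the weight vector W. It holds one coefficient per support vector, which is why you get 32 values. Here is the relevant part of the docs for the ksvm class: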
coef
The corresponding coefficients times the training labels.
To get what you are looking for, you need to do the following:
coefs <- svmFit1$finalModel@coef[[1]]
mat <- svmFit1$finalModel@xmatrix[[1]]
coefs %*% mat
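Why this product gives one weight per feature can be sketched with the standard linear SVM identity. The notation is an assumption added here for illustration: $\alpha_i$ are the dual coefficients, $y_i \in \{-1, +1\}$ the training labels, $\mathbf{x}_i$ the support vectors, and $b$ a generic intercept (the sign convention kernlab uses for its offset may differ, so check the package docs before relying on it).

```latex
\begin{aligned}
f(\mathbf{x})
  &= \sum_{i \in SV} \alpha_i y_i \,\langle \mathbf{x}_i, \mathbf{x} \rangle + b
   = \Big\langle \sum_{i \in SV} \alpha_i y_i \,\mathbf{x}_i ,\; \mathbf{x} \Big\rangle + b , \\
\mathbf{w}
  &= \sum_{i \in SV} \alpha_i y_i \,\mathbf{x}_i .
\end{aligned}
```

Read against the code above, coefs holds the 32 products $\alpha_i y_i$ and the rows of mat are the 32 support vectors, so coefs %*% mat is this sum written as a (1 x 32) by (32 x 4) matrix product. Because the model was trained with preProc = c("center","scale"), the resulting weights refer to the centered and scaled features.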
See the reproducible example below.
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 3.5.2
# Define fitControl
fitControl <- trainControl(
method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = twoClassSummary
)
# Define Tune
grid <- expand.grid(C = c(2^-5, 2^-3, 2^-1))
##########
df <- iris
df<-df[df$Species != 'setosa', ]
df$Species <- as.character(df$Species)
df$Species <- as.factor(df$Species)
# set random seed and run the model
set.seed(321)
svmFit1 <- train(x = df[-5],
y=df$Species,
method = "svmLinear",
trControl = fitControl,
preProc = c("center","scale"),
metric="ROC",
tuneGrid=grid )
coefs <- svmFit1$finalModel@coef[[1]]
mat <- svmFit1$finalModel@xmatrix[[1]]
coefs %*% mat
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> [1,] -0.1338791 -0.2726322 0.9497457 1.027411
Created on 2019-06-11 by the reprex package (v0.2.1.9000)
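As a sanity check on what the matrix product does, here is a minimal base R sketch on made-up numbers. The values in mat and coefs are hypothetical and are not taken from the fitted model; they only stand in for @xmatrix[[1]] and @coef[[1]]. The sketch shows that the product equals the explicit sum over support vectors, and that for a linear kernel the support-vector expansion and the weight-vector form give the same decision value (intercept omitted).

```r
# Hypothetical toy values, not taken from the fitted model above.
# Rows of `mat` stand in for the support vectors (@xmatrix[[1]]);
# `coefs` stands in for the coefficients times labels (@coef[[1]]).
mat <- matrix(
  c( 1.0,  0.5,
    -0.5,  1.5,
     2.0, -1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("x1", "x2"))
)
coefs <- c(0.5, -0.25, -0.25)

# Weight vector as one matrix product: (1 x 3) %*% (3 x 2) gives 1 x 2
w <- coefs %*% mat

# The same quantity as an explicit sum over the support vectors
w_loop <- rep(0, ncol(mat))
for (i in seq_len(nrow(mat))) {
  w_loop <- w_loop + coefs[i] * mat[i, ]
}

# With a linear kernel, both forms score a new point identically
x_new    <- c(0.3, -0.7)
f_dual   <- sum(coefs * (mat %*% x_new))  # sum_i coef_i * <x_i, x_new>
f_primal <- as.numeric(w %*% x_new)       # <w, x_new>

print(w)
print(all.equal(as.numeric(w), as.numeric(w_loop)))
print(all.equal(f_dual, f_primal))
```

This identity only holds for the linear kernel; with a non-linear kernel there is no finite weight vector in the input space to extract this way.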
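As more people move from Caret to Tidymodels, I thought I would provide a Tidymodels version of the above solution, current as of August 2020, since I have not seen much discussion of this so far and it is not that straightforward to do.
The main steps are outlined here, but see the links at the end for more detail on why it is done this way.
1. Get the final model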
set.seed(2020)
# Assuming kernlab linear SVM
# Grid Search Parameters
tune_rs <- tune_grid(
model_wf,
train_folds,
grid = param_grid,
metrics = classification_measure,
control = control_grid(save_pred = TRUE)
)
# Finalise workflow with the parameters for best accuracy
best_accuracy <- select_best(tune_rs, "accuracy")
svm_wf_final <- finalize_workflow(
model_wf,
best_accuracy
)
# Fit on your final model on all available data at the end of experiment
final_model <- fit(svm_wf_final, data)
# fit takes a model spec and executes the model fit routine (Parsnip)
# model_spec, formula and data to fit upon
2. Extract the KSVM object, pull out the required information, and compute variable importance
ksvm_obj <- pull_workflow_fit(final_model)$fit
# Pull_workflow_fit returns the parsnip model fit object
# $fit returns the object produced by the fitting fn (which is what we need! and is dependent on the engine)
coefs <- ksvm_obj@coef[[1]]
# first bit of info we need are the coefficients from the linear fit
mat <- ksvm_obj@xmatrix[[1]]
# xmatrix that we need to matrix multiply against
var_impt <- coefs %*% mat
# var importance
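One possible way to read the result as a ranking is to sort the features by the absolute value of their weights. The sketch below is an assumption about how you might want to use var_impt, not part of the original answer. It rebuilds a 1 x 4 matrix from the weights printed in the caret example earlier in this thread, so it runs without a fitted model. Comparing magnitudes like this is only meaningful when the features are on a common scale, for example after centering and scaling.

```r
# Stand-in for `var_impt`: the weights printed in the caret example,
# stored as a 1 x 4 matrix with feature names as column names.
var_impt <- matrix(
  c(-0.1338791, -0.2726322, 0.9497457, 1.027411),
  nrow = 1,
  dimnames = list(NULL, c("Sepal.Length", "Sepal.Width",
                          "Petal.Length", "Petal.Width"))
)

# Drop to a named vector and rank features by absolute weight
importance <- sort(abs(var_impt[1, ]), decreasing = TRUE)

# Keep the sign alongside the magnitude; it shows which class a feature favors
ranking <- data.frame(
  feature    = names(importance),
  abs_weight = unname(importance),
  weight     = unname(var_impt[1, names(importance)])
)
print(ranking)
```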
References:
Extracting the weights of the support vectors using caret:
Variable importance (last section of the post): http://www.rebeccabarter.com/blog/2020-03-25_machine_learning/#finalize-the-workflow