R train, svmRadial "Cannot scale data"
I am using R with the breastCancer data frame. I want to use the train
function from the caret
package, but it fails with the error below. However, the function works when I use a different data frame.
library(mlbench)
library(caret)
data("breastCancer")
BC = na.omit(breastCancer[,-1])
a = train(Class~., data = as.matrix(BC), method = "svmRadial")
This is the error:
error : In .local(x, ...) : Variable(s) `' constant. Cannot scale data.
Your code contains a couple of typos: the package name is caret
, not caren
, and the dataset name is BreastCancer
, not breastCancer
. The following code runs without the error:
library(mlbench)
library(caret)
data(BreastCancer)
BC = na.omit(BreastCancer[,-1])
a = train(Class~., data = as.matrix(BC), method = "svmRadial")
It returns:
#> Support Vector Machines with Radial Basis Function Kernel
#>
#> 683 samples
#> 9 predictor
#> 2 classes: 'benign', 'malignant'
#>
#> No pre-processing
#> Resampling: Bootstrapped (25 reps)
#> Summary of sample sizes: 683, 683, 683, 683, 683, 683, ...
#> Resampling results across tuning parameters:
#>
#> C Accuracy Kappa
#> 0.25 0.9550137 0.9034390
#> 0.50 0.9585504 0.9107666
#> 1.00 0.9611485 0.9161541
#>
#> Tuning parameter 'sigma' was held constant at a value of 0.02349173
#> Accuracy was used to select the optimal model using the largest value.
#> The final values used for the model were sigma = 0.02349173 and C = 1.
We can start with the data you have:
library(mlbench)
library(caret)
data(BreastCancer)
BC = na.omit(BreastCancer[,-1])
str(BC)
'data.frame': 683 obs. of 10 variables:
$ Cl.thickness : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 5 5 3 6 4 8 1 2 2 4 ...
$ Cell.size : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 1 1 2 ...
$ Cell.shape : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 2 1 1 ...
$ Marg.adhesion : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 5 1 1 3 8 1 1 1 1 ...
$ Epith.c.size : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 2 7 2 3 2 7 2 2 2 2 ...
$ Bare.nuclei : Factor w/ 10 levels "1","2","3","4",..: 1 10 2 4 1 10 10 1 1 1 ...
$ Bl.cromatin : Factor w/ 10 levels "1","2","3","4",..: 3 3 3 3 3 9 3 3 1 2 ...
$ Normal.nucleoli: Factor w/ 10 levels "1","2","3","4",..: 1 2 1 7 1 7 1 1 1 1 ...
$ Mitoses : Factor w/ 9 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 5 1 ...
$ Class : Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 1 1 1 ...
BC
is a data.frame, and you can see that all the predictors are categorical or ordinal. You are trying to fit svmRadial, i.e. an SVM with a radial basis function kernel. Computing a Euclidean distance between categorical features is not straightforward, and if you look at the distribution of the categories:
sapply(BC,table)
$Cl.thickness
1 2 3 4 5 6 7 8 9 10
139 50 104 79 128 33 23 44 14 69
$Cell.size
1 2 3 4 5 6 7 8 9 10
373 45 52 38 30 25 19 28 6 67
$Cell.shape
1 2 3 4 5 6 7 8 9 10
346 58 53 43 32 29 30 27 7 58
$Marg.adhesion
1 2 3 4 5 6 7 8 9 10
393 58 58 33 23 21 13 25 4 55
When you train the model, the default resampling is the bootstrap, so some of your training resamples will miss the poorly represented levels, for example category 9 of Marg.adhesion
in the table above. The dummy column for that level then becomes all zeros in that resample, which is why the error is thrown. These rare levels most likely do not affect the overall result much (precisely because they are rare).
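A quick back-of-the-envelope check (base R only, no caret needed; the counts are taken from the table above) shows how often a bootstrap resample misses all 4 rows carrying the rare level:

```r
# A bootstrap resample draws n rows with replacement, so each draw misses a
# given row with probability (n-1)/n. With only 4 rows carrying the rare
# level, the chance that a resample contains none of them is noticeable:
n <- 683          # rows in BC after na.omit
rare_rows <- 4    # rows with Marg.adhesion == "9" (see the table above)

p_miss <- ((n - rare_rows) / n)^n
p_miss            # ~0.018, close to exp(-4)

# Empirical check over many simulated bootstrap resamples:
set.seed(1)
misses <- replicate(10000, !any(sample(n, replace = TRUE) <= rare_rows))
mean(misses)      # agrees with p_miss
```

With 25 bootstrap reps (caret's default), the chance that at least one rep drops the level entirely is roughly 1 - (1 - 0.018)^25, about 36%, which is why the error appears only sometimes.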
One solution is to use cross-validation (it is unlikely that all the rare observations end up in the held-out fold). Also note that when you have a data.frame containing factors or characters, you should not convert it with as.matrix()
. caret can handle the data.frame directly:
train(Class ~.,data=BC,method="svmRadial",trControl=trainControl(method="cv"))
Support Vector Machines with Radial Basis Function Kernel
683 samples
9 predictor
2 classes: 'benign', 'malignant'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 614, 615, 615, 615, 616, 615, ...
Resampling results across tuning parameters:
C Accuracy Kappa
0.25 0.9575654 0.9101995
0.50 0.9619346 0.9190284
1.00 0.9633838 0.9220161
Tuning parameter 'sigma' was held constant at a value of 0.01841092
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.01841092 and C = 1.
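As a side note on why as.matrix() is harmful here, a toy example:

```r
# as.matrix() on a data.frame that contains any factor column coerces the
# whole result to a character matrix, so even genuinely numeric columns
# lose their type:
df <- data.frame(x = factor(c("1", "2")), y = c(0.5, 1.5))
m <- as.matrix(df)
is.character(m)   # TRUE: everything, including y, is now text
```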
If you do want to use the bootstrap instead of cross-validation, another option is to drop the observations with these rare levels, or to combine them with other levels.
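If you go the level-combining route, merging a rare level into a neighbouring one can be done by reassigning factor levels. A minimal sketch on a toy factor (apply the same idea to e.g. BC$Marg.adhesion, merging "9" into "8"):

```r
# Toy factor with a rare level "9"; merging it into "8" means no resample
# can ever make its dummy column constant.
f <- factor(c("1", "1", "8", "9"), levels = as.character(1:10))
levels(f)[levels(f) == "9"] <- "8"   # assigning a duplicate name merges levels
table(f)["8"]                        # 2: the former "9" row now counts as "8"
```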