Plotting ROC for support vector machine
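I'm trying to plot a ROC curve for an SVM by following one of the examples at https://rpubs.com/JanpuHou/359286, but I keep getting an error near the end of my code. Here is the head of my dataset:
head(data)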
growth LogSales Age LogTA CoAge CoAge2 Reg DigMkt
1 No 15.87283 45 15.32751 8 64 0 1
2 Yes 16.05044 44 15.27176 7 49 0 1
3 Yes 15.36307 32 15.20180 3 9 1 0
4 Yes 15.09644 31 14.97866 2 4 1 0
5 Yes 16.90655 59 16.58810 11 121 1 0
6 Yes 16.45457 58 15.95558 10 100 1 0
My code:
library(caTools)  # sample.split()
library(e1071)    # svm()
library(caret)    # confusionMatrix()
split = sample.split(data, SplitRatio = 0.70)
training = subset(data, split==T)
testing = subset(data, split==F)
###Making growth last to allow for variable importance
###Fitting model
svm_Lin = svm(growth~., data = training,
kernel = "linear", cost =1, scale = T,
probability = TRUE)
##Prediction
pred = predict(svm_Lin, testing)
table(predict = pred, truth = testing$growth)
confusionMatrix(table(pred, testing$growth))
###ROC Curve
library(ROCR)
p<- predict(svm_Lin,testing, type="decision")
pr<-prediction(p, testing$growth)
pref <- performance(pr, "tpr", "fpr")
plot(pref)
When I run this line: pr<-prediction(p, testing$growth)
I get the following error message:
Error: Format of predictions is invalid. It couldn't be coerced to a list.
Any help on how to fix this would be appreciated.
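I suggest the following approach. The main problem is that the predictions from svm are a factor, and the ROCR functions cannot work with a factor. (predict() for an e1071 svm model has no type argument, so type="decision" is ignored and you get the predicted class labels.) I have modified your workflow slightly. Your target is binary, so you can encode it as a factor with two levels. Then, in the ROCR part, you convert the factors to numeric values, and your code works.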
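Here is a minimal base-R sketch of the difference, using a made-up vector of predictions. It rests on an assumption drawn from my reading of ROCR's input checks, not from the question: prediction() accepts a plain vector, a matrix, a data frame or a list. A factor is none of these, because its levels and class attributes make is.vector() return FALSE. The numeric 0/1 version is a plain vector.

```r
# Predicted classes from predict() on an svm model come back as a factor like this
pred_class <- factor(c("No", "Yes", "Yes", "No"), levels = c("No", "Yes"))

# A factor carries levels and class attributes, so is.vector() is FALSE
is_vec_factor <- is.vector(pred_class)

# as.numeric() drops the attributes; subtracting 1 maps codes 1/2 to 0/1
pred_num <- as.numeric(pred_class) - 1
is_vec_numeric <- is.vector(pred_num)

print(is_vec_factor)   # FALSE
print(is_vec_numeric)  # TRUE
print(pred_num)        # 0 1 1 0
```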
Also, the sampling method from the caTools package was generating NAs, so I added a similar split using the rsample package. Here is the code.
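If you would rather keep caTools, note that sample.split() expects the vector of class labels (for example data$growth), not the whole data frame. Passing the data frame is a likely cause of the mismatched rows. Below is a sketch with made-up labels. The data frame df and its column x are hypothetical.

```r
library(caTools)

# Hypothetical labels standing in for data$growth
growth <- c("Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes")
df <- data.frame(growth = growth, x = seq_along(growth))

set.seed(123)
# sample.split() takes the label vector and returns one TRUE/FALSE flag per row
# (TRUE = training), keeping the class ratio similar in both parts
split <- sample.split(df$growth, SplitRatio = 0.70)

training <- subset(df, split == TRUE)
testing  <- subset(df, split == FALSE)
```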
library(ROCR)
library(e1071)
library(rsample)
#Data
data <- structure(list(growth = c("Yes", "Yes", "Yes", "Yes", "Yes",
"Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes",
"Yes", "Yes", "Yes", "Yes", "No", "No"), LogSales = c(15.36307,
15.36307, 16.05044, 16.45457, 16.90655, 16.05044, 16.05044, 16.45457,
16.05044, 16.90655, 15.87283, 15.87283, 16.90655, 16.45457, 16.90655,
16.90655, 16.05044, 16.05044, 15.87283, 15.87283), Age = c(32L,
32L, 44L, 58L, 59L, 44L, 44L, 58L, 44L, 59L, 45L, 45L, 59L, 58L,
59L, 59L, 44L, 44L, 45L, 45L), LogTA = c(15.2018, 15.2018, 15.27176,
15.95558, 16.5881, 15.27176, 15.27176, 15.95558, 15.27176, 16.5881,
15.32751, 15.32751, 16.5881, 15.95558, 16.5881, 16.5881, 15.27176,
15.27176, 15.32751, 15.32751), CoAge = c(3L, 3L, 7L, 10L, 11L,
7L, 7L, 10L, 7L, 11L, 8L, 8L, 11L, 10L, 11L, 11L, 7L, 7L, 8L,
8L), CoAge2 = c(9L, 9L, 49L, 100L, 121L, 49L, 49L, 100L, 49L,
121L, 64L, 64L, 121L, 100L, 121L, 121L, 49L, 49L, 64L, 64L),
Reg = c(1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L,
1L, 1L, 1L, 0L, 0L, 0L, 0L), DigMkt = c(0L, 0L, 1L, 0L, 0L,
1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L
)), row.names = c("3", "3.1", "2", "6", "5", "2.1", "2.2",
"6.1", "2.3", "5.1", "1", "1.1", "5.2", "6.2", "5.3", "5.4",
"2.4", "2.5", "1.2", "1.3"), class = "data.frame")
Now we format the target variable:
#Format the target variable as a two-level factor (0 = No, 1 = Yes)
data$growth[data$growth=='No']<-0
data$growth[data$growth=='Yes']<-1
data$growth <- factor(data$growth,levels = c(0,1),labels = c(0,1))
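The same recoding can also be written in one line. The sketch below checks this on a hypothetical sample of the character column and assumes growth contains only "Yes" and "No".

```r
# Hypothetical sample of the original character column
growth_chr <- c("Yes", "No", "Yes", "Yes", "No")

# Step-by-step recoding as above (0 and 1 are stored as "0"/"1" in the character vector)
a <- growth_chr
a[a == "No"]  <- 0
a[a == "Yes"] <- 1
a <- factor(a, levels = c(0, 1), labels = c(0, 1))

# Compact version: Yes -> 1, anything else -> 0, with level order fixed to 0, 1
b <- factor(ifelse(growth_chr == "Yes", 1, 0), levels = c(0, 1))

print(identical(a, b))
print(levels(b))
```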
The split uses rsample:
#Split
split <- initial_split(data, prop = 0.7,
strata = 'growth')
#Create training and test set
training <- training(split)
testing <- testing(split)
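To check that strata = 'growth' keeps the class balance, you can compare class proportions in the two parts. The sketch below uses a hypothetical toy data frame with 14 ones and 6 zeros. The data are made up; only initial_split(), training() and testing() come from the code above.

```r
library(rsample)

# Hypothetical imbalanced toy data: 14 "1"s and 6 "0"s
toy <- data.frame(
  growth = factor(rep(c(1, 0), times = c(14, 6)), levels = c(0, 1)),
  x = seq_len(20)
)

set.seed(42)
# Stratified split: each class is sampled separately at roughly 70/30
toy_split <- initial_split(toy, prop = 0.7, strata = growth)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Both parts should have class proportions close to the full data (0.3 for "0")
print(prop.table(table(toy_train$growth)))
print(prop.table(table(toy_test$growth)))
```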
We fit the model:
###Fitting model
svm_Lin = svm(growth~., data = training,
kernel = "linear", cost =1, scale = T,
probability = TRUE,type="C-classification")
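We predict on the test set:
###Predict for ROC Curve
# predict() for an e1071 svm returns the predicted classes as a factor;
# it has no 'type' argument, so none is passed here
testing$p <- predict(svm_Lin, testing)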
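By default, e1071's predict() method for svm objects returns class labels. Any extra argument such as type is absorbed by ... and ignored. To get continuous scores instead, use the probability and decision.values arguments. The sketch below uses the built-in iris data as a stand-in for the growth data.

```r
library(e1071)

# Two-class toy problem from the built-in iris data (stands in for growth)
two_class <- droplevels(subset(iris, Species != "setosa"))

set.seed(1)
fit <- svm(Species ~ ., data = two_class, kernel = "linear", cost = 1,
           probability = TRUE)

# Default: predicted class labels, returned as a factor
pred <- predict(fit, two_class)
print(class(pred))

# Probabilities (the model must be fitted with probability = TRUE) are
# attached as a matrix with one column per class
pred_prob <- predict(fit, two_class, probability = TRUE)
probs <- attr(pred_prob, "probabilities")
print(head(probs))

# Decision values are attached as a one-column matrix for a two-class model
pred_dv <- predict(fit, two_class, decision.values = TRUE)
dv <- attr(pred_dv, "decision.values")
print(head(dv))
```

Either attribute can then be passed to prediction() in place of the hard labels. For probabilities, use the column for the class you treat as positive.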
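Now we format the output variables to prepare them for the ROCR functions.
Because factor codes start at 1, class 1 is stored as 2 and class 0 as 1. You can recover the 0/1 values by converting to numeric and subtracting 1.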
#Format variables
testing$growth <- as.numeric(testing$growth)-1
testing$p <- as.numeric(testing$p)-1
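The base-R check below confirms the factor-code explanation on a made-up vector. It also shows that as.numeric(as.character(...)) is a safer alternative when the level order is not guaranteed.

```r
# A 0/1 factor built the same way as growth
f <- factor(c(1, 0, 0, 1), levels = c(0, 1))

codes     <- as.numeric(f)                # internal codes: 2 1 1 2
minus_one <- as.numeric(f) - 1            # back to 0/1: 1 0 0 1
via_chr   <- as.numeric(as.character(f))  # uses the labels: 1 0 0 1

# If the levels were in the opposite order, "minus one" would flip the classes
g <- factor(c(1, 0, 0, 1), levels = c(1, 0))
flipped <- as.numeric(g) - 1              # 0 1 1 0

print(codes)
print(minus_one)
print(via_chr)
print(flipped)
```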
Finally, we build the ROC curve:
#Build ROCR scheme
pr<-prediction(testing$p, testing$growth)
pref <- performance(pr, "tpr", "fpr")
plot(pref)
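Converting hard class labels to 0/1 gives ROCR only one threshold. The resulting "curve" is therefore two straight segments through a single operating point. The sketch below compares hard labels with continuous scores. The labels and scores are made up for illustration and do not come from the question's data. In practice, the scores could be the probabilities attribute from predict(..., probability = TRUE) on an svm fitted with probability = TRUE.

```r
library(ROCR)

# Made-up labels and scores, for illustration only
labels <- c(0, 0, 1, 1, 0, 1, 1, 0)
scores <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.60, 0.55)
hard   <- as.numeric(scores > 0.5)   # hard 0/1 class predictions

pr_hard  <- prediction(hard, labels)
pr_score <- prediction(scores, labels)

# Hard labels: one cutoff per distinct value (0, 1) plus Inf, so 3 ROC points
perf_hard  <- performance(pr_hard, "tpr", "fpr")
# Continuous scores: one ROC point per distinct score plus Inf
perf_score <- performance(pr_score, "tpr", "fpr")

n_hard  <- length(perf_hard@x.values[[1]])
n_score <- length(perf_score@x.values[[1]])

# Area under each curve
auc_hard  <- performance(pr_hard, "auc")@y.values[[1]]
auc_score <- performance(pr_score, "auc")@y.values[[1]]

plot(perf_score, col = "blue")
plot(perf_hard, col = "red", add = TRUE)
```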
Output: [plot of the resulting ROC curve; image not available]