How to input a caret-trained random forest model into predict() and performance() functions?
I want to use performance() to create a precision-recall curve, but I don't know how to input my data. I followed this example.
library(ROCR)
data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred,"prec","rec")
plot(perf)
I am trying to reproduce this for my caret-trained RF model, specifically on the training data (I know there are various examples of how to use predict on newdata). I tried this:
pred <- prediction(rf_train_model$pred$case, rf_train_model$pred$pred)
perf <- performance(pred,"prec","rec")
plot(perf)
Below is my model. I tried the approach above because it seemed to match the ROCR.simple data.
# create model
ctrl <- trainControl(method = "cv",
                     number = 5,
                     savePredictions = TRUE,
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE)
set.seed(3949)
rf_train_model <- train(outcome ~ ., data = df_train,
                        method = "rf",
                        ntree = 1500,
                        tuneGrid = data.frame(mtry = 33),
                        trControl = ctrl,
                        preProc = c("center", "scale"),
                        metric = "ROC",
                        importance = TRUE)
> head(rf_train_model$pred)
     pred     obs      case   control rowIndex mtry Resample
1 control control 0.3173333 0.6826667        4   33    Fold1
2 control control 0.3666667 0.6333333        7   33    Fold1
3 control control 0.2653333 0.7346667       16   33    Fold1
4 control control 0.1606667 0.8393333       18   33    Fold1
5 control control 0.2840000 0.7160000       20   33    Fold1
6    case    case 0.6206667 0.3793333       25   33    Fold1
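This is wrong, because my precision-recall curve goes in the wrong direction. I am interested in more than just the PRAUC curve, although this is a good source showing how to make it, so I would like to fix this error. What mistake did I make?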
If you read the vignette for performance:
it has to be declared which class label denotes the negative, and
which the positive class. Ideally, labels should be supplied as
ordered factor(s), the lower level corresponding to the negative
class, the upper level to the positive class. If the labels are
factors (unordered), numeric, logical or characters, ordering of the
labels is inferred from R's built-in < relation (e.g. 0 < 1, -1 < 1,
'a' < 'b', FALSE < TRUE).
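The ordering rule quoted above can be checked in base R. The sketch below uses a made-up label vector (obs and is_case are illustrative names, not the asker's data) to show that "control" ends up as the upper level, which ROCR would then treat as the positive class, and that comparing against the intended positive class gives an unambiguous logical vector.

```r
# Hypothetical labels, for illustration only
obs <- c("control", "case", "control", "case", "case")

# Character labels are ordered alphabetically, so "case" sorts before "control"
"case" < "control"
levels(factor(obs))   # the upper (second) level is "control"

# Recoding as logical makes the positive class explicit, since FALSE < TRUE
is_case <- obs == "case"
is_case
```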
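In your case, when you supply rf_train_model$pred$pred as the labels, the upper level is still "control", so ROCR treats "control" as the positive class. The simplest fix is to turn the labels into TRUE/FALSE. You should also supply the actual labels, rf_train_model$pred$obs, rather than the predicted labels. See the example below: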
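library(caret)
library(ROCR)

# simulate 100 observations with 100 random predictors and a random outcome
set.seed(100)
df = data.frame(matrix(runif(100*100), ncol = 100))
df$outcome = factor(ifelse(runif(100) > 0.5, "case", "control"))
df_train = df[1:80, ]
df_test = df[81:100, ]

# same trainControl as in the question
ctrl <- trainControl(method = "cv",
                     number = 5,
                     savePredictions = TRUE,
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE)

rf_train_model <- train(outcome ~ ., data = df_train,
                        method = "rf",
                        ntree = 1500,
                        tuneGrid = data.frame(mtry = 33),
                        trControl = ctrl,
                        preProc = c("center", "scale"),
                        metric = "ROC",
                        importance = TRUE)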
levels(rf_train_model$pred$pred)
[1] "case" "control"
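# label: observed classes; positive_class: level to treat as positive;
# prob: predicted probability of the positive class
plotCurve = function(label, positive_class, prob){
  pred = prediction(prob, label == positive_class)
  perf <- performance(pred, "prec", "rec")
  plot(perf)
}
# cross-validated (held-out) predictions on the training data
plotCurve(rf_train_model$pred$obs, "case", rf_train_model$pred$case)
# predictions on the test set, using the probability column for "case"
plotCurve(df_test$outcome, "case", predict(rf_train_model, df_test, type = "prob")[, "case"])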
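An alternative to recoding the labels as TRUE/FALSE is the label.ordering argument of ROCR's prediction(), which names the negative class first and the positive class second. A minimal sketch on synthetic data follows; obs and prob_case are made up here and only stand in for rf_train_model$pred$obs and rf_train_model$pred$case.

```r
library(ROCR)

# Synthetic stand-ins (assumptions, not the asker's data)
set.seed(1)
obs <- factor(rep(c("case", "control"), times = c(20, 30)))
prob_case <- ifelse(obs == "case", runif(50, 0.3, 1), runif(50, 0, 0.7))

# Default ordering: the upper level "control" is taken as the positive class
pred_default <- prediction(prob_case, obs)

# Explicit ordering: negative class first, positive class second
pred <- prediction(prob_case, obs, label.ordering = c("control", "case"))

pred_default@n.pos[[1]]  # counts the "control" rows
pred@n.pos[[1]]          # counts the "case" rows

perf <- performance(pred, "prec", "rec")
plot(perf)
```

Comparing the n.pos slot of the two prediction objects is a quick way to confirm which class ROCR is treating as positive before plotting.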