What is the loss function of `varImp` in the `R` package `caret`?
I am using the varImp function from the R package caret to get variable importance. Here is my code:
library(caret)
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 20,
                       search = "grid", summaryFunction = youdenSumary)
classifier <- train(form = Target ~ ., data = training_set, method = "rpart",
                    parms = list(split = "information"), trControl = trctrl,
                    tuneLength = 10, metric = "j")
importance <- varImp(classifier, scale = FALSE)
Here is the resulting variable importance:
rpart variable importance
Overall
nh 532.218
nRT 488.922
wdSu 482.582
av_t 390.266
nc 317.725
o 303.738
dt 291.488
wdMo 103.200
wdSa 49.690
ne 46.707
wdWe 41.642
nl 26.463
wdTu 9.506
wdTh 2.669
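The code runs a recursive partitioning algorithm and tracks how much each split reduces the loss function. But which loss function is it in this case? Rdocumentation says: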
The reduction in the loss function (e.g. mean squared error)
attributed to each variable at each split is tabulated and the sum is
returned. Also, since there may be candidate variables that are
important but are not used in a split, the top competing variables are
also tabulated at each split. This can be turned off using the
maxcompete argument in rpart.control. This method does not currently
provide class-specific measures of importance when the response is a
factor.
It mentions mean squared error. Is that the loss function used in this package? (I am not sure how to read the "e.g." in the parentheses.)
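Mean squared error is used for regression. See the long intro for rpart: because you are doing classification, there are two impurity functions to choose from, Gini and information entropy.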
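As a rough illustration of those two impurity functions, here is a minimal sketch in base R. The function names (`gini`, `entropy`, `impurity_reduction`) and the class counts are made up for this example, and the natural logarithm is an assumption; rpart also applies its own scaling to the reported improvement, so these numbers show the idea rather than reproduce rpart's output.

```r
# Gini impurity: sum of p * (1 - p), i.e. 1 - sum(p^2), over class proportions p.
gini <- function(p) 1 - sum(p^2)

# Information (entropy) impurity: -sum(p * log(p)), treating 0 * log(0) as 0.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Impurity reduction of a binary split: parent impurity minus the
# size-weighted impurities of the two children. Arguments are class counts.
impurity_reduction <- function(parent, left, right, impurity) {
  n <- sum(parent)
  impurity(parent / n) -
    (sum(left) / n) * impurity(left / sum(left)) -
    (sum(right) / n) * impurity(right / sum(right))
}

# Hypothetical class counts, for illustration only.
parent <- c(50, 50)
left   <- c(40, 10)
right  <- c(10, 40)

print(impurity_reduction(parent, left, right, gini))
print(impurity_reduction(parent, left, right, entropy))
```

Both criteria reward the same split here. They differ only in how strongly they penalise mixed nodes.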
You specified:
parms = list(split = "information")
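This means your tree is split on information entropy, so in your case the reduction is the reduction in information entropy. You can inspect the function caret uses with: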
caret:::varImpDependencies("rpart")$varImp
It essentially sums the improvement in information entropy over the splits, which you can roughly check for your model with:
classifier$finalModel$splits
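To make that check concrete, here is a sketch that uses only rpart and the built-in iris data rather than the model from the question. It assumes that the rows of `fit$splits` are stored node by node in the order primary, competing, surrogate, with the counts taken from `fit$frame`, and that surrogate rows should be skipped because their `improve` column holds an agreement measure. This mirrors my reading of what caret does by default, and it is not a copy of caret's code.

```r
library(rpart)

# Fit a small classification tree, splitting on information entropy
# as in the question (iris stands in for the real training set).
fit <- rpart(Species ~ ., data = iris, parms = list(split = "information"))

# Label each row of fit$splits as primary, competing, or surrogate.
# Assumption: rows follow the non-leaf nodes of fit$frame in that order.
nodes <- fit$frame[fit$frame$var != "<leaf>", ]
type <- unlist(lapply(seq_len(nrow(nodes)), function(i) {
  c("primary",
    rep("competing", nodes$ncompete[i]),
    rep("surrogate", nodes$nsurrogate[i]))
}))

splits <- data.frame(var     = rownames(fit$splits),
                     improve = unname(fit$splits[, "improve"]),
                     type    = type)

# Sum the improvement per variable over primary and competing splits.
keep <- splits$type != "surrogate"
manual_importance <- sort(tapply(splits$improve[keep], splits$var[keep], sum),
                          decreasing = TRUE)
print(manual_importance)
```

If caret is installed, comparing `manual_importance` against `varImp(fit)` should show whether the assumptions above hold for your version.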