What is the loss function of `varImp` in the `R` package `caret`?

Question

I am using the varImp function from the R package caret to get variable importance. Here is my code:

library(caret)
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 20,
                       search = "grid", summaryFunction = youdenSumary)

classifier <- train(form = Target ~ ., data = training_set, method = "rpart",
                    parms = list(split = "information"), trControl = trctrl,
                    tuneLength = 10, metric = "j")

importance <- varImp(classifier, scale = FALSE)

This is the resulting variable importance:

rpart variable importance

     Overall
nh   532.218
nRT  488.922
wdSu 482.582
av_t 390.266
nc   317.725
o    303.738
dt   291.488
wdMo 103.200
wdSa  49.690
ne    46.707
wdWe  41.642
nl    26.463
wdTu   9.506
wdTh   2.669

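The code runs a recursive partitioning algorithm and tracks how much each split reduces the loss function. But what is the loss function in this case? RDocumentation says: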

The reduction in the loss function (e.g. mean squared error) attributed to each variable at each split is tabulated and the sum is returned. Also, since there may be candidate variables that are important but are not used in a split, the top competing variables are also tabulated at each split. This can be turned off using the maxcompete argument in rpart.control. This method does not currently provide class-specific measures of importance when the response is a factor.

It mentions mean squared error. Is that the loss function used in this package? (The "e.g." in the parentheses leaves me unsure.)

Answer 1

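Mean squared error is used for regression. Have a look at the long intro for rpart: since you are doing classification, there are two impurity functions to choose from, Gini and information entropy.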
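For reference, here is a sketch of those two impurity functions and of the quantity a split reduces, written from the definitions in the rpart long intro. The symbols are my own labelling: $p_{iA}$ is the proportion of class $i$ in node $A$, $C$ is the number of classes, $p(A)$ is the probability of reaching node $A$, and $A_L$, $A_R$ are its two children. How rpart scales the reported value is not covered here.

$$
I(A) = \sum_{i=1}^{C} f(p_{iA}), \qquad
f(p) = \begin{cases} p\,(1 - p) & \text{Gini} \\ -p \log p & \text{information} \end{cases}
$$

$$
\Delta I = p(A)\, I(A) - p(A_L)\, I(A_L) - p(A_R)\, I(A_R)
$$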

You specified:

parms = list(split = "information")

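This means your tree is split on information entropy, so in your case the "reduction" is the reduction in information entropy. You can inspect the function caret uses with: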

caret:::varImpDependencies("rpart")$varImp

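It essentially sums the improvement in information entropy over the splits for each variable. You can roughly check this for your own model by looking at: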

classifier$finalModel$splits
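
As a rough sketch of that check, the snippet below sums the improve column of the splits table per variable. It rests on several assumptions: the built-in iris data stands in for training_set, rpart is called directly rather than through train, and surrogate splits are switched off, because for surrogate rows the improve column holds the agreement with the primary split rather than an improvement. The name manual_imp is only illustrative.

```r
library(rpart)

# Assumption: iris replaces the asker's training_set; Species replaces Target.
fit <- rpart(Species ~ ., data = iris,
             parms = list(split = "information"),
             control = rpart.control(maxsurrogate = 0))

# With surrogates disabled, every row of fit$splits is a primary or
# competing split; the row names are the variable names.
splits <- fit$splits

# Sum the "improve" column per variable and sort from most to least important.
manual_imp <- sort(tapply(splits[, "improve"], rownames(splits), sum),
                   decreasing = TRUE)
print(manual_imp)
```

On a model trained through caret, the same summation can be applied to classifier$finalModel$splits. If surrogates were kept when fitting, the totals will only roughly agree with varImp unless the surrogate rows are excluded first.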

r

rpart

r-caret