h2o 随机森林中 "importance" 的度量是什么

Question

这是我的代码：

set.seed(1)

#Boruta on the HouseVotes84 data from mlbench
library(mlbench) #has HouseVotes84 data
library(h2o)     #has rf

#spin up h2o
myh20 <- h2o.init(nthreads = -1)

#read in data, throw some away
data(HouseVotes84)
hvo <- na.omit(HouseVotes84)

#move from R to h2o
mydata <- as.h2o(x=hvo,
                 destination_frame= "mydata")

#RF columns (input vs. output)
idxy <- 1
idxx <- 2:ncol(hvo)

#split data
splits <- h2o.splitFrame(mydata,           
                         c(0.8,0.1))     

train <- h2o.assign(splits[[1]], key="train")   
valid <- h2o.assign(splits[[2]], key="valid") 

# make random forest
my_imp.rf<- h2o.randomForest(y=idxy,x=idxx,
                      training_frame = train,
                      validation_frame = valid,
                      model_id = "my_imp.rf",
                      ntrees=200)

# find importance
my_varimp <- h2o.varimp(my_imp.rf)
my_varimp

我得到的输出是 "variable importance"。

经典的措施是"mean decrease in accuracy"和"mean decrease in gini coefficient"。

我的结果是：

> my_varimp
Variable Importances: 
   variable relative_importance scaled_importance percentage
1        V4         3255.193604          1.000000   0.410574
2        V5         1131.646484          0.347643   0.142733
3        V3          921.106567          0.282965   0.116178
4       V12          759.443176          0.233302   0.095788
5       V14          492.264954          0.151224   0.062089
6        V8          342.811554          0.105312   0.043238
7       V11          205.392654          0.063097   0.025906
8        V9          191.110046          0.058709   0.024105
9        V7          169.117676          0.051953   0.021331
10      V15          135.097076          0.041502   0.017040
11      V13          114.906586          0.035299   0.014493
12       V2           51.939777          0.015956   0.006551
13      V10           46.716656          0.014351   0.005892
14       V6           44.336708          0.013620   0.005592
15      V16           34.779987          0.010684   0.004387
16       V1           32.528778          0.009993   0.004103

据我所知，"Vote #4" aka V4 的相对重要性是 ~3255.2。

问题： 那是什么单位？它是如何推导出来的？

我尝试查看文档，但没有找到答案。我尝试了帮助文档。我尝试使用 Flow 查看参数以查看其中是否有任何指示。在其中的 none 中，我找到了 "gini" 或 "decrease accuracy"。我应该看哪里？

Answer 1

答案在docs.

[ 在左窗格中，单击 "Algorithms"，然后单击 "Supervised"，然后单击 "DRF"。 FAQ 部分回答了这个问题。 ]

为方便起见，把答案也复制粘贴在这里：

"How is variable importance calculated for DRF? Variable importance is determined by calculating the relative influence of each variable: whether that variable was selected during splitting in the tree building process and how much the squared error (over all trees) improved as a result."

h2o 随机森林中 "importance" 的度量是什么

What is the measure used for "importance" in the h2o random Forest

random-forest

h2o

gini