In R's randomForest package, do factors have to be explicitly labeled as factors?

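Or will the package recognize that they are not continuous and treat them as factors? I know that for classification, the response being classified does need to be a factor. But what about the predictor features? I ran it on a couple of toy datasets, and I get slightly different results depending on whether the categorical features are numeric or factors. The algorithm is random, though, so I don't know whether the differences in my results are meaningful.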



For categorical data (this is actually a nice answer on CrossValidated):

A split on a factor with N levels is actually a selection of one of the (2^N)−2 possible combinations. So, the algorithm will check all the possible combinations and choose the one that produces the best split.
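To make the counting concrete, here is a minimal sketch in Python (not R, and not the randomForest internals) that enumerates the two-way splits of a factor's levels. The function name `binary_partitions` is hypothetical. Note that 2^N − 2 counts every nonempty proper subset of levels; since a subset and its complement describe the same partition, that corresponds to 2^(N−1) − 1 distinct splits, which is what the sketch enumerates.

```python
from itertools import combinations


def binary_partitions(levels):
    """Yield each distinct two-way partition (left, right) of the levels once.

    Hypothetical helper for illustration only.
    """
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    # Pinning the first level to the left side avoids counting a subset
    # and its complement as two different splits.
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            left = (first,) + extra
            right = tuple(lv for lv in rest if lv not in extra)
            if right:  # skip the trivial case where every level goes left
                yield left, right


# A factor with 4 levels has 2^(4-1) - 1 distinct two-way splits.
for left, right in binary_partitions(["a", "b", "c", "d"]):
    print(left, "|", right)
```

The number of candidate splits grows exponentially with the number of levels, so an exhaustive search like this is only practical for factors with few levels.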


Numerical predictors are sorted; then, for every candidate value, the Gini impurity or entropy is calculated, and the threshold that gives the best split is chosen.
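The threshold search described above can be sketched as follows. This is an illustration of the general idea in Python under simple assumptions (Gini impurity, midpoints between adjacent distinct values as candidate thresholds); the names `gini` and `best_threshold` are hypothetical, and the code is not taken from the randomForest package.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best single cut point."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


threshold, impurity = best_threshold(
    [1, 2, 3, 10, 11, 12], ["x", "x", "x", "y", "y", "y"]
)
print(threshold, impurity)  # 6.5 0.0
```

A search of this kind only considers cuts of the form "value below threshold versus value above it". If category codes are stored as numbers, the splits are therefore limited to groupings that respect the numeric order of the codes, whereas a factor can be split into arbitrary subsets of levels. That is one plausible reason for results to differ between the two encodings.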
