Python scikit-learn random forest predict_proba outputs rounded values

Question

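I am using a random forest in scikit-learn for classification, and to get the class probabilities I used the predict_proba function. However, it outputs probabilities rounded to one decimal place.

I tried it with the sample iris dataset: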

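import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Randomly mark roughly 75% of the rows as training data
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
# iris.target holds integer codes, so map them to species names with from_codes
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.head()

train, test = df[df['is_train']], df[~df['is_train']]

features = df.columns[:4]
clf = RandomForestClassifier(n_jobs=2)
y, _ = pd.factorize(train['species'])
clf.fit(train[features], y)
clf.predict_proba(train[features])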

Output probabilities:

   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  0.8,  0.2],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],

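Is this the default output? Is it possible to increase the number of decimal places?

Note: I found the solution. The default number of trees is 10; after increasing the number of trees to 100, the precision of the probabilities increased.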

Answer 1

There is evidently a default of ten trees, and your code uses that default:

Parameters: 
n_estimators : integer, optional (default=10)
The number of trees in the forest.

Try something like this, increasing the number of trees to 25 or some other number larger than 10:

RandomForestClassifier(n_estimators=25, n_jobs=2)

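If you are only getting the vote proportions of the 10 default trees, that would very likely produce the probabilities you are seeing.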
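The granularity can be checked directly. The sketch below is an illustration, not part of the original answer: it fits forests of 10 and 100 trees on iris and lists the distinct probability values each one produces. It assumes fully grown trees (the default `max_depth=None`), so every leaf is pure and each tree effectively casts a 0-or-1 vote. The helper name `distinct_probabilities` and `random_state=0` are my own choices. `n_estimators` is set explicitly because scikit-learn 0.22 and later default to 100 trees rather than 10.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)


def distinct_probabilities(n_trees):
    """Fit a forest with n_trees trees and return the sorted distinct
    probability values it predicts on the training data."""
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    clf.fit(X, y)
    return np.unique(clf.predict_proba(X))


p10 = distinct_probabilities(10)    # values are multiples of 1/10
p100 = distinct_probabilities(100)  # values are multiples of 1/100

print(p10)
print(p100)
```

With 10 trees the output can only contain steps of 0.1, which matches the "rounded" values in the question; with 100 trees the step size shrinks to 0.01.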

You may also be running into this because the iris dataset is very small. If I remember correctly, it has fewer than 200 observations.

The documentation for predict_proba() reads:

The predicted class probabilities of an input sample is computed as the
mean predicted class probabilities of the trees in the forest. The class
probability of a single tree is the fraction of samples of the same 
class in a leaf.
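That averaging can be reproduced by hand. As a rough check, assuming the fitted forest exposes its individual trees through the `estimators_` attribute, the sketch below averages the per-tree `predict_proba` outputs and compares the result with the forest's own output. The variable names and `random_state=0` are illustrative choices, not taken from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Each fitted tree reports its own class probabilities for every sample.
# Resulting shape: (n_trees, n_samples, n_classes).
per_tree = np.stack([tree.predict_proba(X) for tree in clf.estimators_])

# The forest's probability is the mean of the per-tree probabilities.
manual_mean = per_tree.mean(axis=0)
forest_proba = clf.predict_proba(X)

# Report whether the hand-computed mean matches the forest's output.
print(np.allclose(manual_mean, forest_proba))
```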

I could not find any parameter in the documentation for adjusting the decimal precision of the predicted probabilities.
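The returned array is an ordinary NumPy float array, so the number of digits shown on screen is a display setting on the NumPy side rather than something the classifier controls. A small sketch, using a made-up probability array as a stand-in for real `predict_proba` output:

```python
import numpy as np

# Hypothetical probabilities, standing in for clf.predict_proba(...) output.
proba = np.array([[0.0, 0.8, 0.2],
                  [0.03, 0.96, 0.01]])

# Format with a chosen number of decimals; the stored values are unchanged.
short_text = np.array2string(proba, precision=1, floatmode="fixed")
long_text = np.array2string(proba, precision=4, floatmode="fixed")

print(short_text)
print(long_text)
```

Changing the display does not add information. If the forest has only 10 fully grown trees, the underlying values are still multiples of 0.1 however many digits are printed.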

Tags: python, machine-learning, random-forest, scikit-learn