Decision Tree produces different outputs

Question

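I am currently using a decision tree (from scikit-learn) to predict some values. The problem I am facing is that the algorithm's output is inconsistent. Is this a property of decision trees? Across multiple runs (with no change to the data or the algorithm) I get different results.

I started with scikit-learn's decision tree class without changing any of the defaults: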

svr = DecisionTreeRegressor()

Then, to remove any 'randomness', I changed it to:

svr = DecisionTreeRegressor(splitter='best', random_state=None)

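What causes the different results, and how can I prevent them?

Two of the results (plotted for simplicity): red is the DTR output, blue is the test set.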

Answer 1

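The documentation says the following:

random_state : int, RandomState instance or None, optional (default=None). If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Maybe numpy generates a new RandomState each time DecisionTreeRegressor is called?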
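A small sketch of how those three cases behave, using scikit-learn's public helper `check_random_state`. It assumes that the tree resolves its `random_state` argument through this helper. The seed values 0, 7 and 3 are arbitrary choices for illustration. With `None`, the helper hands back the global generator behind `np.random` rather than a new one, so the draws depend on whatever state that global generator happens to be in.

```python
import numpy as np
from sklearn.utils import check_random_state

# Case 1: None -> the global RandomState used by np.random.
# Re-seeding the global generator makes the next draw repeat.
np.random.seed(0)
a = check_random_state(None).randint(1000)
np.random.seed(0)
b = check_random_state(None).randint(1000)
same_after_reseed = (a == b)

# Case 2: int -> a fresh RandomState seeded with that int,
# so every call starts from the same state.
c = check_random_state(7).randint(1000)
d = check_random_state(7).randint(1000)
int_seed_repeats = (c == d)

# Case 3: RandomState instance -> returned unchanged.
rs = np.random.RandomState(3)
instance_passthrough = check_random_state(rs) is rs

print(same_after_reseed, int_seed_repeats, instance_passthrough)
```

If the global generator is never seeded, it starts from a different state in each new Python process, which would be consistent with results changing from run to run.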

Answer 2

From the docs:

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

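With None it will use np.random, which is itself random (it is not seeded to a fixed value). To make a reproducible example, you need to pass an int as random_state. For example: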

svr = DecisionTreeRegressor(random_state=1)

In your case, you are doing:

svr = DecisionTreeRegressor(splitter='best', random_state=None)

This is the same as the default behavior of random_state, so it does not remove the randomness.
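A minimal sketch of the fix on synthetic data. The data, the helper name `fit_predict` and the seed value 1 are made up for illustration and are not from the question. It only demonstrates that two fits with the same integer seed give identical predictions. It does not claim that fits with `random_state=None` will always differ, since that depends on the data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.RandomState(0)
X_train = rng.rand(200, 5)
y_train = np.sin(4 * X_train[:, 0]) + 0.1 * rng.randn(200)
X_test = rng.rand(50, 5)

def fit_predict(random_state):
    """Fit a fresh tree with the given random_state and predict on X_test."""
    tree = DecisionTreeRegressor(splitter="best", random_state=random_state)
    tree.fit(X_train, y_train)
    return tree.predict(X_test)

# Same integer seed on both fits -> identical predictions.
pred_a = fit_predict(1)
pred_b = fit_predict(1)
print(np.array_equal(pred_a, pred_b))
```

Any fixed integer works as the seed. What matters is that the same value is used on every run.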

machine-learning

data-mining

decision-tree

scikit-learn