XGBRegressor: changing random_state has no effect

xgboost.XGBRegressor seems to produce the same results even when it is given a new random seed.

According to the xgboost documentation for xgboost.XGBRegressor:

seed : int Random number seed. (Deprecated, please use random_state)

random_state : int Random number seed. (replaces seed)

random_state is the one to use. However, no matter which random_state or seed I use, the model produces the same results. Is this a bug?

from xgboost import XGBRegressor
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
import numpy as np
from itertools import product

def xgb_train_predict(random_state=0, seed=None):
    X, y = load_boston(return_X_y=True)
    xgb = XGBRegressor(random_state=random_state, seed=seed)
    xgb.fit(X, y)
    y_ = xgb.predict(X)
    return y_

check = xgb_train_predict()

random_state = [1, 42, 58, 69, 72]
seed = [None, 2, 24, 85, 96]

for r, s in product(random_state, seed):
    y_ = xgb_train_predict(r, s)
    assert np.equal(y_, check).all()
    print('CHECK! \t random_state: {} \t seed: {}'.format(r, s))

[Out]:
    CHECK!   random_state: 1     seed: None
    CHECK!   random_state: 1     seed: 2
    CHECK!   random_state: 1     seed: 24
    CHECK!   random_state: 1     seed: 85
    CHECK!   random_state: 1     seed: 96
    CHECK!   random_state: 42    seed: None
    CHECK!   random_state: 42    seed: 2
    CHECK!   random_state: 42    seed: 24
    CHECK!   random_state: 42    seed: 85
    CHECK!   random_state: 42    seed: 96
    CHECK!   random_state: 58    seed: None
    CHECK!   random_state: 58    seed: 2
    CHECK!   random_state: 58    seed: 24
    CHECK!   random_state: 58    seed: 85
    CHECK!   random_state: 58    seed: 96
    CHECK!   random_state: 69    seed: None
    CHECK!   random_state: 69    seed: 2
    CHECK!   random_state: 69    seed: 24
    CHECK!   random_state: 69    seed: 85
    CHECK!   random_state: 69    seed: 96
    CHECK!   random_state: 72    seed: None
    CHECK!   random_state: 72    seed: 2
    CHECK!   random_state: 72    seed: 24
    CHECK!   random_state: 72    seed: 85
    CHECK!   random_state: 72    seed: 96

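It seems (I didn't know this myself before I started digging for an answer :)) that xgboost uses the random generator only for subsampling; see Laurae's comment on a similar GitHub issue. Otherwise the behavior is deterministic.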

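If you do use sampling, there is an issue with how the current sklearn API in xgboost handles seed/random_state. seed is indeed marked as deprecated, but it seems that if it is provided, it still takes precedence over random_state, as can be seen in the code. This remark is only relevant when seed is not None.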