LGBM not varying predictions with random state

I'm trying to compute prediction intervals for a classifier.

I'm training the model with sklearn. Even after setting a new random_state parameter in my pipeline, refitting the data doesn't seem to change my results. What can I do?

Here is the relevant snippet of the code I'm using:

from sklearn.pipeline import Pipeline
from lightgbm import LGBMClassifier

# preprocessor, train and test are defined elsewhere in my code
SEED_VALUE = 3
t_clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('lgbm', LGBMClassifier(class_weight="balanced",
                            random_state=SEED_VALUE, max_depth=20,
                            min_child_samples=20, num_leaves=31)),
])
states = [0, 1, 2, 3]

for state in states:
    train_temp = train.copy()
    # re-seed the LightGBM step of the pipeline, then refit
    t_clf.set_params(lgbm__random_state=state)
    t_clf.fit(train_temp, train_temp['label'])
    probs = t_clf.predict_proba(test)

# output from predict_proba doesn't change with varying states

The same thing happens when I try changing the shuffle order or the bagging seed.

In case it helps, these are my current parameters:

LGBMClassifier(bagging_seed=2, boosting_type='gbdt', class_weight='balanced',
               colsample_bytree=1.0, importance_type='split', learning_rate=0.1,
               max_depth=50, min_child_samples=1, min_child_weight=0.001,
               min_data_in_leaf=10, min_split_gain=0.0, n_estimators=100,
               n_jobs=-1, num_leaves=30, objective=None, random_state=1,
               reg_alpha=0.0, reg_lambda=0.0, silent=True, subsample=1.0,
               subsample_for_bin=200000, subsample_freq=0)

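The reason you get the same results regardless of the random seed is that your model specification does not perform random sampling at any stage. For example, if you set colsample_bytree to a value less than 1, you will see different predicted probabilities for different random seeds. The same goes for bagging: with subsample=1.0 and subsample_freq=0, no rows are ever sampled, so changing bagging_seed has no effect either. Bagging is only active when subsample < 1 and subsample_freq > 0.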

from sklearn.datasets import make_classification
from lightgbm import LGBMClassifier

# generate some data
X, y = make_classification(n_samples=1000, n_features=50, random_state=100)

# loop over several random states
for state in [0, 1, 2, 3]:

    # instantiate the classifier
    clf = LGBMClassifier(
        class_weight='balanced',
        max_depth=20,
        min_child_samples=20,
        num_leaves=31,
        random_state=state,
        colsample_bytree=0.1,
    )

    # fit the classifier
    clf.fit(X, y)

    # predict the class probabilities
    y_pred = clf.predict_proba(X)

    # print the predicted probability of the 
    # first class for the first sample 
    print([state, format(y_pred[0, 0], '.4%')])

    # [0, '97.8132%']
    # [1, '97.4980%']
    # [2, '98.3729%']
    # [3, '98.0737%']