How to use tqdm in cross-validation when using folds.split(train.values, target.values)
I am running cross-validation on my LightGBM model as shown below. I would like to use tqdm in the for loop so that I can monitor its progress.
folds = KFold(n_splits=num_folds, random_state=2319)
oof = np.zeros(len(train))
getVal = np.zeros(len(train))
predictions = np.zeros(len(target))
feature_importance_df = pd.DataFrame()

print('Light GBM Model')
for fold_, (trn_idx, val_idx) in enumerate(folds.split(train.values, target.values)):
    X_train, y_train = train.iloc[trn_idx][features], target.iloc[trn_idx]
    X_valid, y_valid = train.iloc[val_idx][features], target.iloc[val_idx]

    print("Fold idx:{}".format(fold_ + 1))
    trn_data = lgb.Dataset(X_train, label=y_train, categorical_feature=categorical_features)
    val_data = lgb.Dataset(X_valid, label=y_valid, categorical_feature=categorical_features)

    clf = lgb.train(param, trn_data, 1000000, valid_sets=[trn_data, val_data],
                    verbose_eval=5000, early_stopping_rounds=4000)
    oof[val_idx] = clf.predict(train.iloc[val_idx][features], num_iteration=clf.best_iteration)
    getVal[val_idx] += clf.predict(train.iloc[val_idx][features], num_iteration=clf.best_iteration) / folds.n_splits

    fold_importance_df = pd.DataFrame()
    fold_importance_df["feature"] = features
    fold_importance_df["importance"] = clf.feature_importance()
    fold_importance_df["fold"] = fold_ + 1
    feature_importance_df = pd.concat([feature_importance_df, fold_importance_df], axis=0)

    predictions += clf.predict(test[features], num_iteration=clf.best_iteration) / folds.n_splits

print("CV score: {:<8.5f}".format(roc_auc_score(target, oof)))
I have tried tqdm(enumerate(folds.split(train.values, target.values))) and enumerate(tqdm(folds.split(train.values, target.values))), but neither worked. I suspect they fail because folds.split() returns a generator, which has no length, so tqdm cannot determine the total. How can I use tqdm in this situation? Can anyone help me? Thanks in advance.
To get a progress bar over the k-fold iterations, pass total= so tqdm knows the number of splits (the desc argument is optional):

from tqdm import tqdm

for train, test in tqdm(kfold.split(x, y), total=kfold.get_n_splits(), desc="k-fold"):
    # Your code here

The output will look like this:

k-fold: 100%|██████████| 10/10 [02:26<00:00, 16.44s/it]
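Since the question also uses enumerate to get the fold index, here is a minimal self-contained sketch (with made-up dummy data) showing that enumerate can simply wrap the tqdm object, while total= still comes from get_n_splits():

```python
import numpy as np
from sklearn.model_selection import KFold
from tqdm import tqdm

# Dummy data just for illustration: 50 samples, 2 features.
x = np.arange(100).reshape(50, 2)
y = np.arange(50)

kfold = KFold(n_splits=5)
fold_sizes = []
# enumerate(tqdm(...)) keeps the fold index and the progress bar;
# total= is required because kfold.split() yields a generator with no len().
for fold_, (trn_idx, val_idx) in enumerate(
        tqdm(kfold.split(x, y), total=kfold.get_n_splits(), desc="k-fold")):
    fold_sizes.append((len(trn_idx), len(val_idx)))

print(fold_sizes)  # each fold holds out 10 of the 50 rows
```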