How can I extract the newly added rows after SMOTE (imblearn module)?
Is it possible to extract the newly added rows from the pandas DataFrame created by imblearn's SMOTE function?
I think I figured it out. Apparently they are appended to the end of the DataFrame returned by fit_resample.
My target column is "DIED":
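import numpy as np
import pandas as pd
from imblearn.combine import SMOTETomek
from imblearn.over_sampling import SMOTENC

# Columns 10 and 11 are categorical; X_train and y_train are the original training split
smotez = SMOTENC(categorical_features=[10, 11], random_state=555, k_neighbors=10)
smote_tomek = SMOTETomek(random_state=555, smote=smotez, n_jobs=-1)
X_train_new, y_train_new = smote_tomek.fit_resample(X_train, y_train)
train_data_new = pd.concat([X_train_new, y_train_new], axis=1)
# Rows past the original length are the ones appended by the resampler
smote_data = train_data_new.iloc[len(X_train):]
print("Y_train_smote:\n", np.asarray(np.unique(smote_data['DIED'], return_counts=True)).T, smote_data['DIED'].mean())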
As you can see, all of the rows belong to the minority class ("DIED"):
Y_train_smote:
[[1 91936]] 1.0
As a sanity check, the following expression should return 0:
print(len(smote_data) + len(X_train) - len(X_train_new))
0
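The length check above is satisfied by any tail slice, so it may be worth also confirming that the head of the resampled frame really is the original data. Here is a minimal sketch of that idea using only pandas. The helper name split_resampled and the toy frames are hypothetical, and the hand-built X_res / y_res merely stand in for what fit_resample would return. It assumes the resampler keeps the original rows first, in order, and appends the synthetic rows after them.

```python
import pandas as pd


def split_resampled(X_orig, X_res, y_res, target_name):
    """Split resampler output into (original rows, appended rows, head_matches).

    Hypothetical helper. Assumes the original rows come first, in their
    original order, and the synthetic rows are appended after them.
    """
    n_orig = len(X_orig)
    features = X_res.reset_index(drop=True)
    target = pd.Series(y_res, name=target_name).reset_index(drop=True)
    resampled = pd.concat([features, target], axis=1)

    # True only if the first n_orig feature rows are exactly the input rows
    head_matches = features.iloc[:n_orig].equals(X_orig.reset_index(drop=True))

    return resampled.iloc[:n_orig], resampled.iloc[n_orig:], head_matches


# Toy data: three majority rows (DIED = 0) and one minority row (DIED = 1)
X_orig = pd.DataFrame({"age": [30, 40, 50, 60], "bp": [1.0, 2.0, 3.0, 4.0]})
y_orig = pd.Series([0, 0, 0, 1], name="DIED")

# Hand-made stand-in for fit_resample output: originals plus two synthetic rows
X_res = pd.concat(
    [X_orig, pd.DataFrame({"age": [61, 59], "bp": [4.1, 3.9]})],
    ignore_index=True,
)
y_res = pd.concat([y_orig, pd.Series([1, 1])], ignore_index=True)

original_part, synthetic_part, head_matches = split_resampled(
    X_orig, X_res, y_res, target_name="DIED"
)
print(synthetic_part)
print("head matches original:", head_matches)
```

If head_matches comes back False, the tail slice should not be trusted. As far as I understand SMOTETomek, its Tomek-link cleaning step runs after SMOTE and can drop rows, which shifts positions, so in that case the tail may contain only a subset of the synthetic rows. With plain SMOTE or SMOTENC the check should pass.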