Stratified GroupShuffleSplit in Scikit-learn
I would like to ask whether it is possible to perform a "Stratified GroupShuffleSplit" in scikit-learn, in other words a combination of GroupShuffleSplit and StratifiedShuffleSplit.
Here is a sample of the code I am using:
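from sklearn.model_selection import GridSearchCV, GroupShuffleSplit
from sklearn.svm import SVC

# Group-aware shuffle split: samples that share a group stay on the same side
cv = GroupShuffleSplit(n_splits=n_splits, test_size=test_size,
                       train_size=train_size, random_state=random_state).split(
    allr_sets_nor[:, :2], allr_labels, groups=allr_groups)
opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol),
                   param_grid=param_grid, scoring=scoring, n_jobs=n_jobs,
                   cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_labels)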
Here I applied GroupShuffleSplit, but I would still like to add stratification according to allr_labels.
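One way to see whether a given split is balanced is to compare the class fractions inside a test set with those of the full label array. The following is only a minimal sketch: `class_fractions` is a hypothetical helper, not part of scikit-learn, and the small arrays are made-up illustration data.

```python
import numpy as np

def class_fractions(labels, indices):
    """Return {class: fraction} for the samples selected by `indices`."""
    classes, counts = np.unique(np.asarray(labels)[indices], return_counts=True)
    total = counts.sum()
    return {c.item(): float(n / total) for c, n in zip(classes, counts)}

# Made-up data: 8 samples, two balanced classes
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
test_idx = np.array([0, 1, 4])          # an unbalanced test set
fractions = class_fractions(labels, test_idx)
overall = class_fractions(labels, np.arange(len(labels)))
```

If `fractions` differs noticeably from `overall` across splits, the splitter is not stratifying on the labels.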
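I solved the problem by applying StratifiedShuffleSplit to the groups and then finding the training and testing set indices manually, since they are tied to the group indices (in my case each group contains 6 consecutive sets, from 6*index to 6*index+5), as shown below: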
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

# Stratified splitting over groups only: all_groups has one entry per group
# and all_labels is assumed to hold the label of each group
sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
    train_size=train_size, random_state=random_state).split(all_groups, all_labels)
cv = []
for train_groups, test_groups in sss:
    # Map each group index g to its sample indices 6*g .. 6*g+5
    train_idx = np.concatenate([train_groups * 6 + k for k in range(6)])
    test_idx = np.concatenate([test_groups * 6 + k for k in range(6)])
    cv.append((train_idx, test_idx))
# cv is the final cross-validation iterable: a list of n_splits tuples,
# each holding two numpy arrays with training and testing indices respectively
opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol), param_grid=param_grid,
                   scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_labels)
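The index arithmetic above relies on every group having exactly 6 consecutive samples. A possible generalization for arbitrary group ids and sizes is sketched below. This is only an illustration under assumptions: `stratified_group_shuffle_split` is a hypothetical helper, not a scikit-learn function, it assumes every sample in a group carries the same label, and the toy `groups` and `labels` arrays are invented for the example.

```python
import numpy as np

def stratified_group_shuffle_split(groups, labels, n_splits=5, test_size=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs that keep groups intact and draw
    test groups separately per class. Assumes one label per group."""
    groups = np.asarray(groups)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    # Take the label of the first sample of each group as the group label
    group_ids, first = np.unique(groups, return_index=True)
    group_labels = labels[first]
    for _ in range(n_splits):
        test_groups = []
        for cls in np.unique(group_labels):
            members = rng.permutation(group_ids[group_labels == cls])
            # Aim for at least one test group per class, but never all of them
            n_test = min(max(1, round(test_size * len(members))), len(members) - 1)
            test_groups.extend(members[:n_test])
        test_mask = np.isin(groups, test_groups)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy data: 10 groups of 6 samples; groups 0-4 are class 0, groups 5-9 are class 1
groups = np.repeat(np.arange(10), 6)
labels = np.repeat([0, 1], 30)
cv = list(stratified_group_shuffle_split(groups, labels, n_splits=3, test_size=0.2))
```

The resulting `cv` list has the same shape as the one built above (tuples of training and testing index arrays), so it could be passed to GridSearchCV in the same way. Note also that scikit-learn 1.0 and later provide StratifiedGroupKFold, which covers the K-fold variant of this idea, though not the shuffle-split one.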