How to do RFECV in scikit-learn with KFold, not StratifiedKFold?

from sklearn.cross_validation import StratifiedKFold, KFold
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

rfecv = RFECV(estimator=LogisticRegression(), step=1, cv=StratifiedKFold(y, 10),
              scoring='accuracy')
rfecv.fit(X, y)

is an example of doing RFECV with StratifiedKFold. The question is: how do you do RFECV with a plain KFold?

cv=KFold(y, 10) is not the answer, because KFold and StratifiedKFold take and return entirely different values.

You can manually create your own CV strategy that mimics everything KFold does:

def createCV():
    '''Return a list of (train, test) index tuples, something like:

    custom_cv = [([0, 1, 2, 3, 4, 5, 6], [7]),
                 ([0, 1, 2, 3, 4, 5], [6]),
                 ([0, 1, 2, 3, 4], [5]),
                 ([0, 1, 2, 3], [4]),
                 ([0, 1, 2], [3])]

    where the first element of each tuple is the training set
    and the second is the test set.
    '''
    return [(list(range(i)), [i]) for i in range(7, 2, -1)]

manual_cv = createCV()
rfecv = RFECV(estimator=LogisticRegression(), step=1, cv=manual_cv,
              scoring='accuracy')

You can even take what KFold would give you, rearrange it inside createCV, and tailor it to your cross-validation needs.
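The manual strategy above can be sketched without scikit-learn at all. The helper below (`kfold_indices` is a hypothetical name, not part of any library) builds contiguous (train, test) index pairs the way an unshuffled KFold would, so the result can be passed straight to `cv=`:

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous K-fold splits,
    mimicking what an unshuffled KFold produces."""
    # Distribute samples as evenly as possible across folds.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds

custom_cv = kfold_indices(10, 5)
# Each tuple is (training indices, test indices), like the manual list above.
```

Because it is just a list of index tuples, you can reorder or filter the folds before handing them to RFECV.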

KFold(len(y), n_folds=n_folds) is the answer. So, for 10 folds, it would look like:

rfecv = RFECV(estimator=LogisticRegression(), step=1, cv=KFold(len(y), n_folds=10),
              scoring='accuracy')
</rfecv>
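Note that the calls above use the pre-0.18 `sklearn.cross_validation` API, which has since been removed. In modern scikit-learn (0.18+) the splitters live in `sklearn.model_selection`, take `n_splits` instead of the data, and you pass the splitter object itself (a sketch under that assumption):

```python
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFECV

# KFold no longer takes y or len(y); the data is supplied at split time.
rfecv = RFECV(estimator=LogisticRegression(), step=1,
              cv=KFold(n_splits=10), scoring='accuracy')
# rfecv.fit(X, y)  # X, y are your data, as in the question
```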