How to do RFECV in scikit-learn with KFold, not StratifiedKFold?

from sklearn.cross_validation import StratifiedKFold, KFold
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

rfecv = RFECV(estimator=LogisticRegression(), step=1, cv=StratifiedKFold(y, 10),
              scoring='accuracy')
rfecv.fit(X, y)

is an example of doing RFECV with StratifiedKFold. The question is: how do you do RFECV with a plain KFold?

cv=KFold(y, 10) is not the answer, because KFold and StratifiedKFold take and return entirely different values.

You can manually create your own CV strategy that mimics everything KFold does:

def createCV():
    '''Return a list of (train, test) index tuples, something like:

    custom_cv = [([0, 1, 2, 3, 4, 5, 6], [7]),
                 ([0, 1, 2, 3, 4, 5], [6]),
                 ([0, 1, 2, 3, 4], [5]),
                 ([0, 1, 2, 3], [4]),
                 ([0, 1, 2], [3])]

    where the first element of each tuple is the training set
    and the second is the test set.
    '''
    return [(list(range(i)), [i]) for i in range(7, 2, -1)]

manual_cv = createCV()
rfecv = RFECV(estimator=LogisticRegression(), step=1, cv=manual_cv,
              scoring='accuracy')

You can even take what KFold would give you, rearrange it inside createCV, and tailor it to your cross-validation needs.
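The manual strategy above can be sketched without scikit-learn at all. The helper below (`kfold_indices` is a hypothetical name, not part of any library) builds contiguous (train, test) index pairs the way an unshuffled KFold would, so the result can be passed straight to `cv=`:

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous K-fold splits,
    mimicking what an unshuffled KFold produces."""
    # Distribute samples as evenly as possible across folds.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds

custom_cv = kfold_indices(10, 5)
# Each tuple is (training indices, test indices), like the manual list above.
```

Because it is just a list of index tuples, you can reorder or filter the folds before handing them to RFECV.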

KFold(len(y), n_folds=n_folds) is the answer. So, for 10 folds, it would look like:

rfecv = RFECV(estimator=LogisticRegression(), step=1, cv=KFold(len(y), n_folds=10),
              scoring='accuracy')
</rfecv>
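Note that the calls above use the pre-0.18 `sklearn.cross_validation` API, which has since been removed. In modern scikit-learn (0.18+) the splitters live in `sklearn.model_selection`, take `n_splits` instead of the data, and you pass the splitter object itself (a sketch under that assumption):

```python
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFECV

# KFold no longer takes y or len(y); the data is supplied at split time.
rfecv = RFECV(estimator=LogisticRegression(), step=1,
              cv=KFold(n_splits=10), scoring='accuracy')
# rfecv.fit(X, y)  # X, y are your data, as in the question
```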