In cross_val_score, how is the parameter cv being used differently?

Question

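I am trying to figure out how to do k-fold cross-validation. I was hoping someone could tell me the difference between my two print statements. They give me quite different numbers, and I thought they would be the same.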

from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# train is my training data (the feature matrix),
# target is my target, my binary class.

dtc = DecisionTreeClassifier()
kf = KFold(n_splits=10)
print(cross_val_score(dtc, train, target, cv=kf, scoring='accuracy'))
print(cross_val_score(dtc, train, target, cv=10, scoring='accuracy'))

Answer 1

DecisionTreeClassifier derives from ClassifierMixin, so, as stated in the documentation (emphasis mine):

Computing cross-validated metrics

When the cv argument is an integer, cross_val_score uses the KFold or StratifiedKFold strategies by default, the latter being used if the estimator derives from ClassifierMixin.
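The dispatch described in that quote can be checked directly with `sklearn.model_selection.check_cv`, the helper that turns an integer `cv` into a splitter object. The sketch below assumes a recent scikit-learn and uses a made-up binary target `y` purely for illustration; it roughly mirrors what cross_val_score does internally when it resolves `cv`.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.model_selection import KFold, check_cv
from sklearn.tree import DecisionTreeClassifier

# Hypothetical binary target, used only for illustration
y = np.array([0] * 50 + [1] * 50)

# A DecisionTreeClassifier counts as a classifier (it derives from ClassifierMixin)
print(is_classifier(DecisionTreeClassifier()))  # True

# Integer cv + classifier=True + binary/multiclass y -> StratifiedKFold
cv_clf = check_cv(10, y, classifier=True)
# Same integer with classifier=False (e.g. for a regressor) -> plain KFold
cv_reg = check_cv(10, y, classifier=False)
# A splitter object such as KFold is passed through unchanged
kf = KFold(n_splits=10)
cv_kf = check_cv(kf, y, classifier=True)

print(type(cv_clf).__name__, type(cv_reg).__name__, cv_kf is kf)  # StratifiedKFold KFold True
```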

So here, when you pass cv=10 you are using the StratifiedKFold strategy, whereas when you pass cv=kf you are using the regular KFold strategy.
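One way to confirm that the splitter is the only thing that changed is to compare cv=10 against an explicit StratifiedKFold(n_splits=10). This sketch swaps in scikit-learn's bundled breast-cancer dataset for the question's `train`/`target` (an assumption, since the original data isn't shown) and fixes `random_state` on the tree so each fit is deterministic.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the question's train/target are not available
X, y = load_breast_cancer(return_X_y=True)
dtc = DecisionTreeClassifier(random_state=0)

scores_int = cross_val_score(dtc, X, y, cv=10, scoring="accuracy")
scores_skf = cross_val_score(dtc, X, y, cv=StratifiedKFold(n_splits=10), scoring="accuracy")
scores_kf = cross_val_score(dtc, X, y, cv=KFold(n_splits=10), scoring="accuracy")

# For a classifier, cv=10 builds the same folds as StratifiedKFold(n_splits=10)
print(np.allclose(scores_int, scores_skf))  # True
# Plain KFold splits the rows differently, so its scores need not match
print(scores_int)
print(scores_kf)
```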

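In classification, stratification generally tries to ensure that each test fold has approximately equal class representation. For more, see Understanding stratified cross-validation on Cross Validated.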
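To see what stratification changes in practice, the sketch below measures the share of class 1 in each test fold. The target `y` is hypothetical and deliberately sorted by class; with the default `shuffle=False`, KFold then produces test folds made up of a single class, while StratifiedKFold keeps 40% class 1 in every fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical binary target, sorted by class: 60 zeros followed by 40 ones
y = np.array([0] * 60 + [1] * 40)
X = np.zeros((len(y), 1))  # features do not affect how these folds are built

results = {}
for cv in (KFold(n_splits=5), StratifiedKFold(n_splits=5)):
    # Fraction of class 1 in each test fold
    fractions = [float(y[test].mean()) for _, test in cv.split(X, y)]
    results[type(cv).__name__] = fractions
    print(type(cv).__name__, fractions)
```

If the real `target` happens to be ordered by class, this is one plausible reason the two print statements disagree so much. Passing `KFold(n_splits=10, shuffle=True, random_state=0)` is one way to break that ordering, although the folds are still not stratified.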
