How does KFold work when n_samples % n_splits is non-zero

Question

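Given the number of samples n_samples and n_splits, when n_samples % n_splits == 0 we can perform a well-defined k-fold cross-validation.

Surprisingly, when I accidentally set n_samples = 40 and n_splits = 14, KFold still worked. Here is my code: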

from sklearn.model_selection import KFold
import numpy as np

kf_test = KFold(n_splits=14)
test_x = np.random.rand(40)
pointer = 0
for item_t, item_v in kf_test.split(test_x):
    if pointer == 0:
        print(item_t.shape)
        print(item_v.shape)
        print(len(item_v) / 40)
    pointer += 1
pointer, test_x

How does KFold work when n_samples % n_splits != 0? I tried different values but could not find a pattern.

Answer 1

The documentation says:

The first n_samples % n_splits folds have size n_samples // n_splits + 1, other folds have size n_samples // n_splits, where n_samples is the number of samples.

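In this case, 40 % 14 = 12 and 40 // 14 = 2, so the dataset is split into 14 folds: the first 12 folds have 3 samples each and the remaining 2 folds have 2 samples each (12 * 3 + 2 * 2 = 40).

You can see this in your code if you simply remove the pointer variable.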

from sklearn.model_selection import KFold
import numpy as np

kf_test = KFold(n_splits=14)
test_x = np.random.rand(40)
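for item_t, item_v in kf_test.split(test_x):
    print(item_t.shape)      # number of training indices in this split
    print(item_v.shape)      # number of validation indices in this split
    print(len(item_v) / 40)  # fraction of the data held out in this split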
# test_x
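
The rule quoted from the documentation can also be checked by working out the expected fold sizes by hand and comparing them with what KFold actually yields. This is only a minimal sketch, assuming the default shuffle=False; the helper name expected_fold_sizes is made up for illustration and is not part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import KFold


def expected_fold_sizes(n_samples, n_splits):
    # Hypothetical helper: applies the documented rule directly.
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds each get one additional sample.
    return [base + 1] * extra + [base] * (n_splits - extra)


n_samples, n_splits = 40, 14
x = np.random.rand(n_samples)

# Size of the validation part of each split, in the order KFold yields them.
actual = [len(val_idx) for _, val_idx in KFold(n_splits=n_splits).split(x)]
expected = expected_fold_sizes(n_samples, n_splits)

print(expected)            # [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2]
print(actual == expected)  # True
```

Trying other values of n_samples and n_splits in the same way should make the pattern visible: the fold sizes never differ by more than one, and the larger folds always come first.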

python

scikit-learn

cross-validation