How to display all 4 splits in an array for KFold at n=4?
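Each tuple in the list below should contain a list of train_indices and a list of test_indices, holding the indices of the training/testing data points for that particular Kth split.
Here is what we want to achieve with the dataset: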
data_indices = [(list_of_train_indices_for_split_1, list_of_test_indices_for_split_1),
                (list_of_train_indices_for_split_2, list_of_test_indices_for_split_2),
                (list_of_train_indices_for_split_3, list_of_test_indices_for_split_3),
                ...
                ...
                (list_of_train_indices_for_split_K, list_of_test_indices_for_split_K)]
Here is my current function:
def sklearn_kfold_split(data, K):
    kf = KFold(n_splits=K, shuffle=False, random_state=None)
    result = next(kf.split(data), None)
    return [result]
The output of this function:
sklearn_kfold_split(data,4)
[(array([15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57]),
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]))]
I'm not sure what I should add or change to get the output below:
[(array([15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57]),
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])),
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57]),
array([15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])),
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57]),
array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43])),
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43]),
array([44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57]))]
Any help or suggestions on how I can change my function would be appreciated.
The simplest way to solve this is to use a list comprehension to iterate over the result of KFold.split:
from sklearn.model_selection import KFold

def sklearn_kfold_split(data, K):
    # Without shuffling, KFold takes the test folds as consecutive blocks, in order
    kf = KFold(n_splits=K, shuffle=False, random_state=None)
    # kf.split(data) is a generator; iterate over it to collect every (train, test) pair
    result = [(train_index, test_index) for train_index, test_index in kf.split(data)]
    return result

data = list(range(12))
K = 4
sklearn_kfold_split(data, K)
Output:
[(array([ 3, 4, 5, 6, 7, 8, 9, 10, 11]), array([0, 1, 2])),
(array([ 0, 1, 2, 6, 7, 8, 9, 10, 11]), array([3, 4, 5])),
(array([ 0, 1, 2, 3, 4, 5, 9, 10, 11]), array([6, 7, 8])),
(array([0, 1, 2, 3, 4, 5, 6, 7, 8]), array([ 9, 10, 11]))]
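The original function returned a single split because KFold.split is a generator and next() pulls only its first item. To make that concrete, here is a rough plain-Python sketch of how unshuffled K-fold indices can be produced. The helper name kfold_indices is hypothetical and this is not scikit-learn's implementation. It only assumes the documented behaviour that the first n_samples % n_splits folds get one extra sample, and it uses 58 samples to match the question.

```python
def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds.

    Hypothetical helper for illustration only; not scikit-learn's code.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + 1 if fold < extra else base
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# next() consumes only the first item of the generator: one split.
first_only = next(kfold_indices(58, 4))

# list() (or a list comprehension) drains the generator: all four splits.
all_splits = list(kfold_indices(58, 4))

print(len(all_splits))                         # 4
print([len(test) for _, test in all_splits])   # [15, 15, 14, 14]
```

With the real class the same idea applies: list(kf.split(data)) should give the same list of tuples as the comprehension above.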