Creating K dataframes from the train_index, test_index of KFold cross-validation in Python using sklearn.cross_validation.KFold()
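I am running 5-fold cross-validation in Python, using sklearn.cross_validation.KFold(), to check my model's performance. It performs well on 4 of the folds and very badly on one particular fold. As I am new to data science, I would like to know how to retrieve the data from one specific fold, so that I can look at the data in that set and figure out how to fix it.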
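It's simple. The sklearn documentation for K-Folds has an example of exactly this (note that KFold now lives in sklearn.model_selection; the sklearn.cross_validation module was removed in scikit-learn 0.20):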
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])  # feature array with 4 samples
y = np.array([1, 2, 3, 4])  # target array
kf = KFold(n_splits=2)  # define the split: 2 folds
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    # index the arrays with the positions returned for this fold
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

Output (Python 3):
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
You also have to print the performance computed at each step, so that you can tell which fold is the bad one.
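As a rough sketch of that idea, the loop below records a score for every fold and then pulls out the test rows of the weakest one. The toy data, the LinearRegression model, and the names fold_scores, fold_test_indices and worst_fold_data are assumptions made for illustration; substitute your own data, estimator and metric.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Hypothetical toy data standing in for the asker's dataset.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 2)), columns=["x0", "x1"])
y = pd.Series(3 * X["x0"] - 2 * X["x1"] + rng.normal(scale=0.1, size=50), name="target")

kf = KFold(n_splits=5)
fold_scores = {}        # fold number -> score on that fold's test rows
fold_test_indices = {}  # fold number -> positional indices of the test rows

for fold, (train_index, test_index) in enumerate(kf.split(X), start=1):
    model = LinearRegression().fit(X.iloc[train_index], y.iloc[train_index])
    # For regressors, score() returns the R^2 on the given data.
    fold_scores[fold] = model.score(X.iloc[test_index], y.iloc[test_index])
    fold_test_indices[fold] = test_index
    print("Fold %d: R^2 = %.3f" % (fold, fold_scores[fold]))

# Pick the fold with the lowest score and rebuild its test set for inspection.
worst_fold = min(fold_scores, key=fold_scores.get)
worst_index = fold_test_indices[worst_fold]
worst_fold_data = X.iloc[worst_index].assign(target=y.iloc[worst_index])
print(worst_fold_data.describe())
```

Comparing worst_fold_data.describe() against the same summary for the full dataset is one way to see whether that fold holds outliers or an unusual target distribution.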
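import pandas as pd
from sklearn.model_selection import KFold

# X50 (features) and Y (targets) are the poster's own pandas objects; they are not defined here.
# The original called kf.split(X2) but indexed X50; one name is used here, assuming both have the same rows.
kf = KFold(n_splits=3)
# The context manager writes the file on exit; writer.save() was removed in pandas 2.0.
with pd.ExcelWriter('Kfoldcrossvalidation.xlsx') as writer:
    for fold, (train_index, test_index) in enumerate(kf.split(X50), start=1):
        print("Fold: %s" % fold)
        X_train, X_test = X50.iloc[train_index], X50.iloc[test_index]
        y_train, y_test = Y.iloc[train_index], Y.iloc[test_index]
        print(y_test)
        # write this fold's test targets to their own worksheet
        y_test.to_excel(writer, sheet_name='sheet ' + str(fold))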
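If you only want the K dataframes in memory rather than in an Excel file, a minimal sketch is to collect them in a dictionary keyed by fold number. The DataFrame df and the name fold_frames are hypothetical and stand in for your own data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical DataFrame standing in for the asker's data.
df = pd.DataFrame({"feature": np.arange(10), "target": np.arange(10) % 2})

kf = KFold(n_splits=5)
fold_frames = {}  # fold number -> {"train": DataFrame, "test": DataFrame}

for fold, (train_index, test_index) in enumerate(kf.split(df), start=1):
    # train_index and test_index are positional, so .iloc is the matching indexer.
    fold_frames[fold] = {
        "train": df.iloc[train_index],
        "test": df.iloc[test_index],
    }

# Inspect the test rows of one particular fold, e.g. fold 3.
print(fold_frames[3]["test"])
```

Without shuffle=True, KFold assigns consecutive blocks of rows to each fold, so a single bad fold can simply reflect the order in which the data was stored.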