How to plot the data and model fit for each fold after kfold cross validation?
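I am trying to predict a label variable from a single feature. The two appear to be highly linearly correlated, so I chose a linear regression model to describe the data. The output of my code shows the R2 scores for the training and test data. My model performs well, except for one fold of the test samples, where R2 is negative.
I would like to plot the data and the model fit for each fold to understand what went wrong. However, I cannot figure out how to do this from a Python coding perspective.
Can anyone help?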
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

# Feature_X and label_X are NumPy arrays defined elsewhere
Test_scores = list()
Train_scores = list()
n_splits = 5
kfold = KFold(n_splits=n_splits, shuffle=False)
for train_ix, test_ix in kfold.split(Feature_X):
    # Split features and labels into this fold's train and test parts
    Train_Feature_X, Test_Feature_X = Feature_X[train_ix], Feature_X[test_ix]
    Train_label_X, Test_label_X = label_X[train_ix], label_X[test_ix]
    # Fit a fresh model on the training part
    model = LinearRegression()
    model.fit(Train_Feature_X, Train_label_X)
    # R2 on the training part
    pred_label_train = model.predict(Train_Feature_X)
    acc_train = r2_score(Train_label_X, pred_label_train)
    # R2 on the held-out part
    pred_label_test = model.predict(Test_Feature_X)
    acc_test = r2_score(Test_label_X, pred_label_test)
    Test_scores.append(acc_test)
    Train_scores.append(acc_train)
    print('> ', 'Train:' + str(acc_train), "Test:" + str(acc_test))
# Summarize the scores over all folds
Test_mean, Test_std = np.mean(Test_scores), np.std(Test_scores)
Train_mean, Train_std = np.mean(Train_scores), np.std(Train_scores)
print('Mean of test: %.3f, Standard Deviation: %.3f' % (Test_mean, Test_std))
print('Mean of train: %.3f, Standard Deviation: %.3f' % (Train_mean, Train_std))
Code output:
> Train:0.9948113361306588 Test:0.9715872368615199
> Train:0.9905854864161807 Test:0.9917503220348162
> Train:0.9888929852977923 Test:-4.996610921978263
> Train:0.990942242777374 Test:0.9960355777732937
> Train:0.9925744355834707 Test:0.9458246438971184
Mean of test: -0.218, Standard Deviation: 2.389
Mean of train: 0.992, Standard Deviation: 0.002
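You can add the plotting inside the loop.
On each iteration you have access to the train and test folds and to the predictions, so just before printing the values, i.e. before print('> ', 'Train:'+ str(acc_train), "Test:"+ str(acc_test)),
you can do something like this: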
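fig, ax = plt.subplots(nrows=1, ncols=5)
curr_split = 1
for train_ix, test_ix in kfold.split(Feature_X):
    # ... fit the model and predict, as in the question ...
    plt.subplot(1, 5, curr_split)  # select the subplot for this fold
    plt.plot(x, y)  # x, y: placeholders for the fold's data you want to plot
    curr_split += 1
plt.show()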
This will draw 5 subplots, one for each fold.
Note that this is only an outline of what you should do; see the documentation for plt.subplots().
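To make the outline above concrete, here is a minimal sketch of one way to fill it in. It assumes Feature_X is a 2-D NumPy array with a single column and label_X is a 1-D array. The helper name plot_folds and the synthetic data are made up for illustration and are not the asker's data. Each subplot shows the training points, the held-out points, and the fitted line, with the test R2 in the title, which should make the problematic fold easy to spot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold


def plot_folds(Feature_X, label_X, n_splits=5):
    """Fit one LinearRegression per fold and draw each fold in its own subplot."""
    kfold = KFold(n_splits=n_splits, shuffle=False)
    fig, axes = plt.subplots(nrows=1, ncols=n_splits,
                             figsize=(4 * n_splits, 4), squeeze=False)
    test_scores = []
    # Line endpoints spanning the full range of the feature
    x_line = np.linspace(Feature_X.min(), Feature_X.max(), 100).reshape(-1, 1)
    for ax, (train_ix, test_ix) in zip(axes[0], kfold.split(Feature_X)):
        model = LinearRegression().fit(Feature_X[train_ix], label_X[train_ix])
        acc_test = r2_score(label_X[test_ix], model.predict(Feature_X[test_ix]))
        test_scores.append(acc_test)

        # Training points, held-out points, and the line fitted on the training part
        ax.scatter(Feature_X[train_ix, 0], label_X[train_ix], s=10, label="train")
        ax.scatter(Feature_X[test_ix, 0], label_X[test_ix], s=10, label="test")
        ax.plot(x_line[:, 0], model.predict(x_line), color="k", label="fit")
        ax.set_title("Fold %d, test R2 = %.3f" % (len(test_scores), acc_test))
        ax.legend()
    fig.tight_layout()
    return fig, test_scores


# Synthetic stand-in data (an assumption for illustration only)
rng = np.random.default_rng(0)
Feature_X = rng.uniform(0, 10, size=100).reshape(-1, 1)
label_X = 3.0 * Feature_X[:, 0] + rng.normal(scale=0.5, size=100)

fig, test_scores = plot_folds(Feature_X, label_X)
```

With your own arrays, call plot_folds(Feature_X, label_X) and then fig.savefig(...) or, with an interactive backend, plt.show(). In the fold with the negative R2, check whether the held-out points cover a very narrow range of label values or sit away from the fitted line; either can drive R2 below zero even when the fit looks good overall.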