GridSearchCV loss doesn't equal model.fit() loss values

I am confused about which metric GridSearchCV uses in its parameter search. My understanding was that my model object supplies it with a metric, and that this is what is used to determine the "best_params". But that doesn't appear to be the case. I believe scoring=None is the default, so the first metric given in the metrics option of model.compile() would be used. So in my case the scoring function used should be mean_squared_error. My walkthrough of the issue is below.
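
For what it's worth, here is a minimal sketch of how I could pin the scorer down explicitly instead of relying on scoring=None. It uses scikit-learn's built-in 'neg_mean_squared_error' scorer and the model / param_grid objects defined further down; I haven't re-run the grid this way.

from sklearn.model_selection import GridSearchCV

# Sketch only: same grid as below, but with the scorer pinned explicitly.
# 'neg_mean_squared_error' is a built-in scikit-learn scorer (MSE with the sign
# flipped so that higher is better), so best_score_ is unambiguous.
grid = GridSearchCV(estimator=model,        # the KerasRegressor defined below
                    param_grid=param_grid,  # {'activation_fn': [...]} defined below
                    scoring='neg_mean_squared_error',
                    n_jobs=1, cv=3)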

Here is what I'm doing. I simulated some regression data using sklearn, with 10,000 observations and 10 features. I'm playing around with keras because I've typically used pytorch in the past and am only now really dipping into keras. I noticed a difference between the loss function output from my GridSearchCV call and from the model.fit() call once I have the best set of parameters. Now I know I could just use refit=True and not re-fit the model again, but I'm trying to get a feel for the output of keras and of sklearn's GridSearchCV function.
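
(For completeness, a sketch of the refit route I mentioned, using the same grid and model objects defined below; refit=True is actually the GridSearchCV default.)

# Sketch of the refit route: with refit=True (the GridSearchCV default), the best
# configuration is re-fit on the full training set after the search, so there is
# no need to call create_model() again by hand.
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, refit=True)
grid_result = grid.fit(X_train, y_train)
y_pred = grid_result.best_estimator_.predict(X_test)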

To be clear, here is the difference I'm seeing. I simulated some data using sklearn as follows:

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

I created a create_model function to tune the activation function I'm using (again, this is a simple proof-of-concept example).

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                 kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error','mae'])
    return model

Performing the grid search, I get the following output:

model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
Best: -21.163454 using {'activation_fn': 'linear'}

OK, so the best score is a mean squared error of 21.16 (I know the sign is flipped to turn it into a maximization problem). But when I then fit the model with activation_fn = 'linear', I get a completely different MSE.

best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1)
.....
.....
Epoch 49/50
8000/8000 [==============================] - 0s 48us/step - loss: 344.1636 - mean_squared_error: 344.1636 - mean_absolute_error: 12.2109
Epoch 50/50
8000/8000 [==============================] - 0s 48us/step - loss: 326.4524 - mean_squared_error: 326.4524 - mean_absolute_error: 11.9250
history.history['mean_squared_error']
Out[723]: 
[10053.778002929688,
 9826.66806640625,
  ......
  ......
 344.16363830566405,
 326.45237121582034]

The difference is 326.45 vs. 21.16. Any insight into what I'm misunderstanding would be greatly appreciated. I would be more comfortable if they were in a reasonable neighborhood of each other, given that one is the error from a single fold while the other is from the whole training set. But 21 is nowhere near 326. Thanks!
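
For reference, this is roughly how I have been reading the (sign-flipped) cross-validation MSEs back out of the grid object; just a sketch using the grid_result from above.

import numpy as np

# Sketch: cv_results_['mean_test_score'] holds the per-candidate scores averaged
# over the 3 folds; flipping the sign turns them back into positive MSEs.
cv_mse = -np.array(grid_result.cv_results_['mean_test_score'])
for params, mse in zip(grid_result.cv_results_['params'], cv_mse):
    print(params, 'CV MSE:', mse)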

The full code is below.

import pandas as pd
import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
from keras.constraints import maxnorm
from sklearn import preprocessing 
from sklearn.preprocessing import scale
from sklearn.datasets import make_regression
from matplotlib import pyplot as plt

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                 kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error','mae'])
    return model

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# create model
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)

# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1)

best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1)

history.history.keys()
plt.plot(history.history['mean_absolute_error'])

# summarize results
grid_result.cv_results_
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

The large loss reported in your output (326.45237121582034) is the training loss. If you want a metric to compare against grid_result.best_score_ (from GridSearchCV) and the MSE from best_model.fit, you have to request the validation loss (see the code below).

Now the question becomes: why is the validation loss lower than the training loss? In your case it is essentially because of dropout, which is applied during training but not during validation/test; that is why the gap between training and validation loss disappears when you remove dropout. You can find a detailed explanation of the possible reasons for a lower validation loss here.
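
If it helps to see the mechanism in isolation, here is a minimal sketch (assuming TF 2.x / tf.keras) of what Dropout does in training versus inference mode; evaluate() and predict() run in inference mode, where the layer is effectively an identity:

import numpy as np
import tensorflow as tf

# Sketch: Dropout(0.2) zeroes roughly 20% of the activations (and rescales the
# survivors by 1/0.8) when training=True, and passes the input through unchanged
# when training=False.
drop = tf.keras.layers.Dropout(0.2)
x = np.ones((1, 10), dtype='float32')
print(drop(x, training=True).numpy())   # typically some entries zeroed, rest scaled up
print(drop(x, training=False).numpy())  # identical to x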

In short, the performance (MSE) of your model is given by grid_result.best_score_ (21.163454 in your example).

import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.datasets import make_regression
import tensorflow as tf

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
tf.random.set_seed(42)

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                 kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error','mae'])
    return model

# create model
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)

# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1, validation_data=(X_test, y_test))

best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1, validation_data=(X_test, y_test))

history.history.keys()
# plt.plot(history.history['mae'])

# summarize results
print(grid_result.cv_results_)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))