Can I log training loss via a hook with a LinearRegressor?

I'm new to TensorFlow. I'm doing a 'simple' linear regression with TF 1.8. The output of the exercise is the set of linear weights that best fits the data, rather than a predictive model. So I'd like to track and log the current minimum loss during training, together with the corresponding weight values.

I'm trying to use a LinearRegressor:

tf.logging.set_verbosity(tf.logging.INFO)

model = tf.estimator.LinearRegressor(
    feature_columns = make_feature_cols(),
    model_dir = TRAINING_OUTDIR
)

# --------------------------------------------v
logger = tf.train.LoggingTensorHook({"loss": ???}, every_n_iter=10)
trainHooks = [logger]

model.train(
    input_fn = make_train_input_fn(df, num_epochs = nEpochs),
    hooks = trainHooks
)

The model doesn't seem to expose a loss variable.

Can I use LoggingTensorHook somehow? If so, how would I define the loss tensor in this case?

I've also tried implementing my own hook. Examples suggest logging the loss by returning a SessionRunArgs from before_run, but I run into the same problem.

Thanks!!

I agree with @jdehesa that loss is not directly available without writing a custom model_fn. However, with LoggingTensorHook you can get the feature estimates at every step and compute any loss or other training metric yourself. I suggest using a formatter to process the tensor values available to the hook. In the example below, I use a LoggingTensorHook with a custom formatter to output the feature estimates and the current MSE loss.

import numpy as np
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.INFO)

"""prepare inputs - generate sample data"""
num_features = 5
features = ['f'+str(i) for i in range(num_features)]
X = np.random.randint(-1000, 1000, (10000, num_features))
a = np.random.randint(2, 30, size=(num_features)) / 10
b = np.random.randint(-99, 99) / 10
y = np.matmul(X, a) + b
noise = np.random.randn(*X.shape)
X = X + (noise * 1)
X.shape, y.shape, a, b
>> ((10000, 5), (10000,), array([2.1, 2. , 1.7, 0.5, 0.9]), 1.8)

""" create model """
feature_cols = [tf.feature_column.numeric_column(k) for k in features]
X_dict = {features[i]: X[:, i] for i in range(num_features)}

TRAINING_OUTDIR = '.'
model = tf.estimator.LinearRegressor(
    model_dir = TRAINING_OUTDIR,
    feature_columns = feature_cols)

input_fn = tf.estimator.inputs.numpy_input_fn(
    X_dict, y, batch_size=512, num_epochs=50, shuffle=True,
    queue_capacity=1000, num_threads=1)

input_fn_predict = tf.estimator.inputs.numpy_input_fn(
    X_dict, batch_size=X.shape[0], shuffle=False)

"""create hook and formatter"""
feature_var_names = [f"linear/linear_model/{f}/weights" for f in features]
hook_vars_list = ['global_step', 'linear/linear_model/bias_weights'] + feature_var_names

def hooks_formatter(tensor_dict):
    step = tensor_dict['global_step']
    a_hat = [tensor_dict[feat][0][0] for feat in feature_var_names]
    b_hat = tensor_dict['linear/linear_model/bias_weights'][0]
    y_pred = np.dot(X, np.array(a_hat).T) + b_hat
    mse_loss = np.mean((y - y_pred)**2)   # MSE
    line = f"step:{step}; MSE_loss: {mse_loss:.4f}; bias:{b_hat:.3f};"
    for f, w in zip(features, a_hat):
        line += f" {f}:{w:.3f};"
    return line

hook1 = tf.train.LoggingTensorHook(hook_vars_list, every_n_iter=10, formatter=hooks_formatter)

"""train"""
model.train(input_fn=input_fn, steps=100, hooks=[hook1])
>>>
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into ./model.ckpt.
INFO:tensorflow:step:1; MSE_loss: 3183865.8670; bias:0.200; f0:0.200; f1:0.200; f2:0.200; f3:0.200; f4:0.200;
INFO:tensorflow:loss = 1924836100.0, step = 1
INFO:tensorflow:step:11; MSE_loss: 1023556.4537; bias:0.359; f0:0.936; f1:0.944; f2:0.903; f3:0.521; f4:0.802;
INFO:tensorflow:step:21; MSE_loss: 468665.2052; bias:0.269; f0:1.294; f1:1.276; f2:1.202; f3:0.437; f4:0.857;
INFO:tensorflow:step:31; MSE_loss: 232310.3535; bias:0.292; f0:1.513; f1:1.491; f2:1.379; f3:0.528; f4:0.893;
INFO:tensorflow:step:41; MSE_loss: 118843.3051; bias:0.278; f0:1.671; f1:1.633; f2:1.491; f3:0.472; f4:0.898;
INFO:tensorflow:step:51; MSE_loss: 62416.4437; bias:0.272; f0:1.782; f1:1.735; f2:1.563; f3:0.505; f4:0.903;
INFO:tensorflow:step:61; MSE_loss: 32799.2320; bias:0.277; f0:1.865; f1:1.808; f2:1.611; f3:0.487; f4:0.899;
INFO:tensorflow:step:71; MSE_loss: 17619.6118; bias:0.270; f0:1.924; f1:1.861; f2:1.641; f3:0.510; f4:0.904;
INFO:tensorflow:step:81; MSE_loss: 9423.0092; bias:0.283; f0:1.970; f1:1.899; f2:1.661; f3:0.494; f4:0.900;
INFO:tensorflow:step:91; MSE_loss: 5062.2780; bias:0.285; f0:2.003; f1:1.927; f2:1.675; f3:0.503; f4:0.901;
INFO:tensorflow:Saving checkpoints for 100 into ./model.ckpt.
INFO:tensorflow:Loss for final step: 1693422.1.
<tensorflow.python.estimator.canned.linear.LinearRegressor at 0x7f90a590f240>
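As a side note, the formatter logic itself is plain numpy, so you can sanity-check it in isolation with a hand-built tensor_dict. A minimal sketch with made-up values (the shapes mirror what LoggingTensorHook hands the formatter: each feature weight is a (1, 1) array, the bias a (1,) array, which is why the `[0][0]` / `[0]` indexing is needed):

```python
import numpy as np

# Tiny stand-ins for the real training data (values are made up)
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([5.0, 11.0])
features = ['f0', 'f1']
feature_var_names = [f"linear/linear_model/{f}/weights" for f in features]

# Fake tensor_dict in the same shape the hook delivers
tensor_dict = {
    'global_step': 7,
    'linear/linear_model/bias_weights': np.array([0.0]),
    'linear/linear_model/f0/weights': np.array([[1.0]]),
    'linear/linear_model/f1/weights': np.array([[2.0]]),
}

a_hat = [tensor_dict[name][0][0] for name in feature_var_names]
b_hat = tensor_dict['linear/linear_model/bias_weights'][0]
y_pred = np.dot(X, np.array(a_hat)) + b_hat
mse_loss = np.mean((y - y_pred) ** 2)
print(mse_loss)  # 0.0 -- these weights reproduce y exactly
```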