TensorFlow evaluation frequency

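I'm using the train_and_evaluate function in TensorFlow and want to run the eval step more often (based on either the global step or elapsed time). Here is my code (the model function is not shown).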

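import tensorflow as tf
import input  # presumably the asker's own module providing batch_dataset() (not shown)

def get_classifier(batch_size):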
    config = tf.estimator.RunConfig(
        model_dir="models/shape_model_cnn_3",
        save_checkpoints_secs=300,
        save_summary_steps=100)

    params = tf.contrib.training.HParams(
        batch_size=batch_size,
        num_conv=[48,64,96], # Sizes of each convolutional layer
        conv_len=[2,3,4], # Kernel size of each convolutional layer
        num_nodes=128, # Number of LSTM nodes for each LSTM layer
        num_layers=3, # Number of LSTM layers
        num_classes=7, # Number of classes in final layer
        learning_rate=0.0001,
        gradient_clipping_norm=9.0,
        dropout=0.3)

    classifier = tf.estimator.Estimator(
        model_fn=my_model,
        config=config,
        params=params
    )

    return classifier

classifier = get_classifier(8)

train_spec = tf.estimator.TrainSpec(
    input_fn=lambda:input.batch_dataset("dataset/shape-train-???.tfrecords", tf.estimator.ModeKeys.TRAIN, 8),
    max_steps=100000
)

eval_spec = tf.estimator.EvalSpec(
    input_fn=lambda:input.batch_dataset("dataset/shape-eval-???.tfrecords", tf.estimator.ModeKeys.EVAL, 8)
)

tf.estimator.train_and_evaluate(classifier, train_spec, eval_spec)

I've tried using the start_delay_secs parameter in my EvalSpec. I'm not sure that's what it is for, but it doesn't seem to have any effect.

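You can set max_steps to a lower number so that evaluation happens sooner.

This will reset the input function. Currently, there is no way to pause the input function and resume it from the same state with an Estimator. We are considering adding this feature.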
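A minimal sketch of the "lower max_steps and repeat" approach, assuming a TF 1.x tf.estimator.Estimator. Instead of rebuilding a TrainSpec for train_and_evaluate on every pass, it calls Estimator.train(max_steps=...) and Estimator.evaluate() directly; the helper name train_with_periodic_eval and its eval_every argument are hypothetical. Each train() call resumes from the latest checkpoint in model_dir but calls the input function again, so the dataset restarts on every pass.

```python
def train_with_periodic_eval(estimator, train_input_fn, eval_input_fn,
                             total_steps, eval_every):
    """Alternate training and evaluation by raising max_steps in a loop.

    `estimator` is assumed to be a TF 1.x tf.estimator.Estimator (hypothetical
    helper, not part of TensorFlow). Returns the list of evaluation results.
    """
    history = []
    # Targets: eval_every, 2 * eval_every, ..., capped at total_steps.
    for max_steps in range(eval_every, total_steps + eval_every, eval_every):
        max_steps = min(max_steps, total_steps)
        # Trains until the global step reaches max_steps; the input
        # pipeline is rebuilt from scratch on every call.
        estimator.train(input_fn=train_input_fn, max_steps=max_steps)
        # Evaluates the checkpoint that train() just wrote.
        history.append(estimator.evaluate(input_fn=eval_input_fn))
    return history
```

With the question's max_steps=100000 and eval_every=1000, this evaluates 100 times.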

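I found that EvalSpec has a parameter, throttle_secs, which sets the minimum number of seconds between evaluations (default 600); an evaluation also only runs once a new checkpoint is available. Alternatively, if you want to evaluate every N steps, you can use a for loop and increase max_steps gradually, as @Kathy Wu suggested.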
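To see how throttle_secs and the checkpoint interval interact, here is a small pure-Python model of the rule stated in the EvalSpec documentation: an evaluation is only considered after a new checkpoint is written, and it runs only if at least throttle_secs have passed since the previous evaluation started. This is a simplification under stated assumptions (evenly spaced checkpoints, evaluation time ignored, the first checkpoint always evaluates), not TensorFlow's implementation; eval_times is a hypothetical helper.

```python
def eval_times(checkpoint_interval_secs, throttle_secs, total_secs):
    """Return the times (seconds) at which an evaluation would start.

    Simplified model: checkpoints are written every checkpoint_interval_secs;
    after each one, evaluate only if no evaluation has run yet or at least
    throttle_secs have elapsed since the last evaluation started.
    """
    if checkpoint_interval_secs <= 0:
        raise ValueError("checkpoint_interval_secs must be positive")
    evals = []
    last_eval = None
    t = checkpoint_interval_secs
    while t <= total_secs:
        if last_eval is None or t - last_eval >= throttle_secs:
            evals.append(t)
            last_eval = t
        t += checkpoint_interval_secs
    return evals


if __name__ == "__main__":
    # Question's setup: save_checkpoints_secs=300, default throttle_secs=600.
    print(eval_times(300, 600, 3000))  # [300, 900, 1500, 2100, 2700]
    # Lower throttle_secs: every 300 s checkpoint is evaluated.
    print(eval_times(300, 60, 3000))   # [300, 600, 900, 1200, 1500, 1800, 2100, 2400, 2700, 3000]
    # Frequent checkpoints but default throttle: still roughly every 600 s.
    print(eval_times(30, 600, 3000))   # [30, 630, 1230, 1830, 2430]
```

Under this model, lowering only one of the two settings is not enough: to evaluate more often, both the checkpoint interval (RunConfig) and throttle_secs (EvalSpec) have to be reduced.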

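Use tf.contrib.learn.Experiment instead; its min_eval_frequency argument sets the minimum number of training steps between evaluations.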

For example:

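import tensorflow as tf
from tensorflow.contrib.learn import learn_runner

# train_input_fn, eval_input_fn, train_input_hook, eval_input_hook, run_config
# and params are defined elsewhere (not shown); params must also contain
# train_steps and min_eval_frequency.

def experiment_fn(run_config, hparams):
    # learn_runner.run() calls this with the run_config and hparams passed below
    estimator = tf.estimator.Estimator(
        model_fn=my_model, config=run_config, params=hparams)
    return tf.contrib.learn.Experiment(
        estimator=estimator,  # Estimator
        train_input_fn=train_input_fn,  # First-class function
        eval_input_fn=eval_input_fn,  # First-class function
        train_steps=hparams.train_steps,  # Minibatch steps
        min_eval_frequency=hparams.min_eval_frequency,  # Min. steps between evals
        train_monitors=[train_input_hook],  # Hooks for training
        eval_hooks=[eval_input_hook],  # Hooks for evaluation
        eval_steps=None)  # Use the evaluation input until it is exhausted

learn_runner.run(
    experiment_fn=experiment_fn,  # Must be a function, not an Experiment instance
    run_config=run_config,  # RunConfig
    schedule="train_and_evaluate",  # What to run
    hparams=params)  # HParams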

When I set save_checkpoints_steps, evaluation runs after the specified number of steps. Configuration:

tf.estimator.RunConfig(save_summary_steps=5, log_step_count_steps=3, save_checkpoints_steps=40)

This gives an evaluation every 40 steps.