Cannot do incremental training with DNNRegressor

Question

I am trying to write a learning example based on Google's course, which uses DNNRegressor to set up a neural network (intro_to_neural_nets). However, I get an error when I run the script:

...
File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 662, in iterations
    raise RuntimeError("Cannot set `iterations` to a new Variable after "
RuntimeError: Cannot set `iterations` to a new Variable after the Optimizer weights have been created

In my code, following the example, I split the training steps into several periods, like this:

def training(learning_rate, steps, batch_size, hidden_units, samples, targets, test_samples, test_targets, periods = 10):
  steps_per_period = steps / periods

  #create DNNRegressor Object
  my_optimizer = tf.optimizers.SGD(learning_rate=learning_rate, momentum=0.9, clipnorm=5.0)
  dnn_regressor = tf.estimator.DNNRegressor(
    feature_columns = construct_feature_columns(samples),
    hidden_units = hidden_units,
    optimizer = my_optimizer
  )

  # Create input functions.
  training_input_fn = lambda: input_fn(samples, 
                                          targets, 
                                          batch_size=batch_size)
  predict_training_input_fn = lambda: input_fn(samples, 
                                                  targets, 
                                                  num_epochs=1, 
                                                  shuffle=False)
  predict_validation_input_fn = lambda: input_fn(test_samples, 
                                                    test_targets, 
                                                    num_epochs=1, 
                                                    shuffle=False)
  # Train the model, but do so inside a loop so that we can periodically assess
  # loss metrics.
  print("Training model...")
  print("RMSE (on training data):")
  training_rmse = []
  validation_rmse = []
  for period in range (0, periods):
    # Train the model, starting from the prior state.
    print("Period[%s]" % (period+1))
    dnn_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
...

The first period runs successfully; the second call fails with the error above and the script exits.

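To check whether some other step was causing the problem, I added a second train call immediately after the first one. It confirmed that the problem is right here, in calling train again: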

#changed code
    print("Period[%s]" % (period+1))
    dnn_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
    print("--- again")
    dnn_regressor.train(
        input_fn=training_input_fn
    )

This produces the following output:

Training model...
RMSE (on training data):
Period[1]
WARNING:tensorflow:From /~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From /~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:550: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/ops/clip_ops.py:172: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/model_fn.py:337: scalar (from tensorflow.python.framework.tensor_shape) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.TensorShape([]).
2019-09-26 10:27:41.728179: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-26 10:27:41.742511: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe4f6546af0 executing computations on platform Host. Devices:
2019-09-26 10:27:41.742564: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
--- again
Traceback (most recent call last):
  File "/~/Documents/workspace/tensorflow/intro_to_neural_nets.py", line 174, in <module>
    test_targets=test_Y)
  File "/~/Documents/workspace/tensorflow/intro_to_neural_nets.py", line 123, in training
    input_fn=training_input_fn
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/canned/dnn.py", line 1166, in _model_fn
    batch_norm=batch_norm)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/canned/dnn.py", line 580, in dnn_model_fn_v2
    optimizer.iterations = training_util.get_or_create_global_step()
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 561, in __setattr__
    super(OptimizerV2, self).__setattr__(name, value)
  File "/~/.tf-env/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 662, in iterations
    raise RuntimeError("Cannot set `iterations` to a new Variable after "
RuntimeError: Cannot set `iterations` to a new Variable after the Optimizer weights have been created

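I don't know why this error occurs or how to fix it. Any help is appreciated. By the way, if someone can tell me how to avoid or eliminate these warnings, I would appreciate that too.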

Answer 1

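I don't think you are doing anything wrong. I edited an example from the TensorFlow documentation that uses canned estimators, and I could not get training to work across multiple estimator.train(...) calls with any optimizer from tf.keras.optimizers. Your code will probably run if you do not specify an optimizer, but I am not sure which learning rate or optimizer is used in that case...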

I have just opened this as an issue on the TF GitHub. See here for updates: https://github.com/tensorflow/tensorflow/issues/33358

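If you want to get going right away, you can downgrade your code to TF 1.x, which roughly matches the version used by Google's Machine Learning Crash Course.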

If you are more ambitious, the TF team recommends starting to learn TensorFlow with Keras. From the documentation page on premade estimators:

Note that in TensorFlow 2.0, the Keras API can accomplish many of these same tasks, and is believed to be an easier API to learn. If you are starting fresh, we would recommend you start with Keras. For more information about the available high level APIs in TensorFlow 2.0, see Standardizing on Keras.

Edit: One low-effort option for monitoring training is TensorBoard. The changes to your code are:

Remove the loop.
Add a model_dir argument so the logs can be found.

dnn_regressor = tf.estimator.DNNRegressor(
    feature_columns = construct_feature_columns(samples),
    hidden_units = hidden_units,
    optimizer = my_optimizer,
    model_dir = '/tmp/log_dir'  # the path must be passed as a string
  )

Open TensorBoard (the reload_multifile option may not be needed):

%load_ext tensorboard
%tensorboard --logdir '/tmp/log_dir' --reload_multifile=true

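TensorBoard refreshes every 30 seconds by default, but it can be made to refresh more often if you want to follow training more closely. It is also a great tool if you want to explore what the model looks like in more detail!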

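Edit 2: A simple workaround was suggested to me on GitHub. Instead of passing an optimizer instance to the Estimator, pass a callable that creates one. With a callable, a new optimizer instance is created every time Estimator.train() is called, which avoids the attempt to set iterations on an existing Optimizer.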

from functools import partial

# Pass the optimizer class plus its arguments, not an instance.
my_optimizer = partial(tf.optimizers.SGD, learning_rate=learning_rate, momentum=0.9, clipnorm=5.0)
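
To make the difference between an instance and a callable concrete, here is a toy model in plain Python. It does not use TensorFlow and is not how the Estimator or the Keras optimizer is actually implemented; ToyOptimizer, ToyEstimator and train_twice are hypothetical names made up for this sketch. It only mimics the two behaviours visible in the traceback: each train() call assigns a new step counter to optimizer.iterations, and the optimizer refuses that assignment once its weights exist.

```python
from functools import partial


class ToyOptimizer:
    """Hypothetical stand-in: `iterations` cannot be rebound after first use."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self._iterations = None
        self._weights_created = False

    @property
    def iterations(self):
        return self._iterations

    @iterations.setter
    def iterations(self, variable):
        # Mimics the check from the traceback: refuse to rebind the step
        # counter once the optimizer has created its weights.
        if self._weights_created:
            raise RuntimeError(
                "Cannot set `iterations` to a new Variable after "
                "the Optimizer weights have been created")
        self._iterations = variable

    def apply_gradients(self):
        # The first update creates the optimizer's internal state.
        self._weights_created = True


class ToyEstimator:
    """Hypothetical stand-in: every train() call builds a new step counter."""

    def __init__(self, optimizer):
        self._optimizer = optimizer

    def train(self):
        # An instance is reused as-is; a callable is invoked to build a new one.
        if callable(self._optimizer):
            opt = self._optimizer()
        else:
            opt = self._optimizer
        opt.iterations = object()  # a fresh "global step" for this call
        opt.apply_gradients()
        return opt


def train_twice(optimizer):
    """Call train() twice and report whether the second call succeeded."""
    estimator = ToyEstimator(optimizer)
    estimator.train()
    try:
        estimator.train()
    except RuntimeError as err:
        return "failed: %s" % err
    return "ok"


instance_result = train_twice(ToyOptimizer(learning_rate=0.01))
callable_result = train_twice(partial(ToyOptimizer, learning_rate=0.01))
print(instance_result)
print(callable_result)
```

With the shared instance, the second train() call hits the same kind of RuntimeError as in the question. With the partial, each call gets its own fresh optimizer, so nothing is ever rebound.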

Answer 2

I tried:

from functools import partial

my_optimizer = partial(SGD, learning_rate=leraning_rate, momentum=0.9, clipnorm=5.0)

This is what worked for me:

from functools import partial
from tensorflow.keras import optimizers  # assumed import; not shown in the original answer

my_optimizer = partial(optimizers.SGD, learning_rate=learning_rate, momentum=0.9, clipnorm=5.0)

tensorflow2.0