Updating internal state for TensorFlow custom metrics (aka using non-update_state vars in metric calculation)

Question

Versions: Python 3.8.2 (I've also tried 3.6.8, but I don't think the Python version matters here), TensorFlow 2.3.0, NumPy 1.18.5

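I am training a model for a classification problem using sparse label tensors. How would I go about defining a metric that counts the number of times the "0" label has appeared up to that point? What I'm trying to do in the code example below is store all the labels the metric has seen in an array, and keep concatenating the existing array with the new y_true every time update_state is called. (I know I could just store a count variable and use +=, but in the actual use case concatenating is ideal and memory is not an issue.) Here is minimal code to reproduce the problem: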

import tensorflow as tf

class ZeroLabels(tf.keras.metrics.Metric):
    """Accumulates a list of all y_true sparse categorical labels (ints) and calculates the number of times the '0' label has appeared."""
    def __init__(self, *args, **kwargs):
        super(ZeroLabels, self).__init__(name="ZeroLabels")
        self.labels = self.add_weight(name="labels", shape=(), initializer="zeros", dtype=tf.int32)

    def update_state(self, y_true, y_pred, sample_weight=None):
        """I'm using sparse categorical crossentropy, so labels are 1D array of integers."""
        if self.labels.shape == (): # if this is the first time update_state is being called
            self.labels = y_true
        else:
            self.labels = tf.concat((self.labels, y_true), axis=0)

    def result(self):
        return tf.reduce_sum(tf.cast(self.labels == 0, dtype=tf.int32))

    def reset_states(self):
        self.labels = tf.constant(0, dtype=tf.int32)

This code runs on its own, but it throws the following error when I try to train a model with this metric:

TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2

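I think this may have something to do with the fact that self.labels is not directly part of the graph when update_state is called. Here are some other things I've tried: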

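Storing a tf.int32, shape=() count variable and incrementing it instead of concatenating new labels
Using .numpy() to convert everything to numpy and concatenating those (I was hoping to force TensorFlow not to use a graph)
Using try and except blocks together with the numpy conversion above
Creating an entirely new class (instead of subclassing tf.keras.metrics.Metric) that uses numpy exclusively where possible, but this approach causes loading problems even when I pass custom_objects to tf.keras.models.load_model
Using the @tf.autograph.experimental.do_not_convert decorator on all methods
Modifying a global variable instead of an attribute, using the global keyword
Using non-TensorFlow attributes (not using self.labels = self.add_weight...)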

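If it helps, here is a more general version of this question: how can we incorporate tensors that are not passed as arguments to update_state into the update_state calculation? Any help would be greatly appreciated. Thanks in advance!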

Answer 1

The main problem is the assignment on the first iteration, when there is no initial value yet:

if self.labels.shape == ():
    self.labels = y_true
else:
    self.labels = tf.concat((self.labels, y_true), axis=0)

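Inside the if block, the variable 'labels' that you defined in the constructor disappears and is replaced by a tf.Tensor object (y_true). So you have to use the tf.Variable methods (assign, assign_add) to modify its contents while keeping the object. Furthermore, to be able to change the shape of a tf.Variable, you have to create it in a way that allows an undefined shape, in this case (None,), because you are concatenating along axis=0.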
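The difference between rebinding the attribute and assigning into the variable can be sketched outside of any metric class. This is a minimal illustration, not the answerer's code: the module-level variable `labels`, the helper `append`, and the choice of int32 labels are assumptions made for the example.

```python
import tensorflow as tf

# An empty Python list would default to float32, so the dtype is set explicitly.
# shape=(None,) leaves the length unspecified, so later assigns may change it.
labels = tf.Variable(tf.zeros([0], dtype=tf.int32), shape=(None,), trainable=False)


@tf.function
def append(batch):
    # assign() writes into the existing variable; the Python name `labels`
    # still refers to the same tf.Variable after the call. Rebinding the name
    # to the result of tf.concat would instead leave a graph tensor behind.
    labels.assign(tf.concat([labels.read_value(), batch], axis=0))


append(tf.constant([0, 1, 0], dtype=tf.int32))
append(tf.constant([2, 0], dtype=tf.int32))

# Count the zeros accumulated across both calls.
zero_count = tf.reduce_sum(tf.cast(tf.equal(labels, 0), tf.int32))
print(labels.numpy(), int(zero_count))
```

Because the state is only ever modified through assign, the same object exists before and after tracing, which is what a metric's update_state needs.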

So:

class ZeroLabels(tf.keras.metrics.Metric):
    def __init__(self, *args, **kwargs):
        super(ZeroLabels, self).__init__(name="ZeroLabels")

        # Define a variable with an unknown shape (validate_shape=False) so that its size can change dynamically
        self.labels = tf.Variable([], shape=(None,), validate_shape=False)

    def update_state(self, y_true, y_pred, sample_weight=None):
        # On update, assign the previous value concatenated with y_true as the new value
        self.labels.assign(tf.concat([self.labels.value(), y_true[:,0]], axis=0))

    def result(self):
        return tf.reduce_sum(tf.cast(self.labels.value() == 0, dtype=tf.int32))

    def reset_states(self):
        # To reset the metric, assign again an empty tensor
        self.labels.assign([])

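However, if you are only going to count the 0s in the dataset, I suggest keeping a single integer variable that counts them: the label array grows after every batch, so summing all of its elements takes longer and longer and slows down your training.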

class ZeroLabels_2(tf.keras.metrics.Metric):
    """Keeps a running integer count of the number of times the '0' label has appeared in y_true."""
    def __init__(self, *args, **kwargs):
        super(ZeroLabels_2, self).__init__(name="ZeroLabels")

        # Define an integer variable
        self.labels = tf.Variable(0, dtype=tf.int32)

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Increase variable with every batch
        self.labels.assign_add(tf.reduce_sum(tf.cast(y_true == 0, dtype=tf.int32)))

    def result(self):
        # Simply return variable's content
        return self.labels.value()

    def reset_states(self):
        self.labels.assign(0)
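
A counter metric of this kind can be exercised without a model by calling update_state directly. The sketch below is a hypothetical variant, not the class above: the name `ZeroLabelCount`, the attribute `count`, and the dummy predictions are assumptions for illustration. It defines reset_state and aliases reset_states to it, since newer TensorFlow releases use the former name while 2.3 uses the latter.

```python
import tensorflow as tf


class ZeroLabelCount(tf.keras.metrics.Metric):
    """Hypothetical variant of the counter metric: counts labels equal to 0."""

    def __init__(self, name="zero_label_count", **kwargs):
        super().__init__(name=name, **kwargs)
        # State lives in a tf.Variable created once, in the constructor.
        self.count = tf.Variable(0, dtype=tf.int32, trainable=False)

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Mutate the variable in place; never rebind self.count to a tensor.
        zeros_in_batch = tf.reduce_sum(tf.cast(tf.equal(y_true, 0), tf.int32))
        self.count.assign_add(zeros_in_batch)

    def result(self):
        return self.count.read_value()

    def reset_state(self):
        self.count.assign(0)

    # Older releases (including 2.3) call reset_states instead.
    reset_states = reset_state


metric = ZeroLabelCount()
# y_pred is unused by this metric, so dummy predictions are enough here.
metric.update_state(tf.constant([0, 1, 0]), tf.zeros([3, 2]))
metric.update_state(tf.constant([2, 0]), tf.zeros([2, 2]))
total = int(metric.result())
print(total)
```

The same pattern addresses the more general question: any tensor state that is not passed to update_state can be kept in a variable created in the constructor and changed only through assign or assign_add.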

I hope this helps (and apologies for my English).
