Why does the branched output of a TensorFlow model yield from only one branch?

Question

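I'm a TensorFlow beginner. I'm trying to reproduce the "TF classification through a retrieval model" approach explained here in Python, because the blog only provides C++ code.

The model architecture appears to have been reproduced successfully, as shown in model architecture. I use a for loop with tf.nn.embedding_lookup() to create one "branch" per class, aggregate each branch with tf.reduce_max, and combine the branches into the final output layer. The problem is that every output always holds the value of a single class.

Here is my code: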

input = Input([None, None, 3], dtype=tf.uint8)
preprocess_layer = tf.cast(input, tf.float32)
preprocess_layer = tf.keras.applications.mobilenet.preprocess_input(preprocess_layer)

x = MobNetSmall(preprocess_layer)
x = Flatten()(x)

x = Lambda(lambda x: tf.nn.l2_normalize(x), name='l2_norm_layer')(x)
retrieval_output = Dense(
        num_instances,
        kernel_initializer=weights_matrix,
        activation="linear",
        trainable=False,
        name='retrieval_layer')(x)

labels = [fn.split('-')[0]+'-'+fn.split('-')[1] for fn in filenames]
class_id = set(labels)
selection_layer_output = list()

for ci in class_id:
    class_index = [i for i, x in enumerate(labels) if x == ci]
    class_index = tf.cast(class_index, tf.int32)
    x = Lambda(lambda x: tf.nn.embedding_lookup(x[0], class_index), name=f'{ci}_selection_layer')(retrieval_output)
    x = Lambda(lambda x: tf.reduce_max(x), name=f'{ci}_aggregate_max')(x)
    selection_layer_output.append(x)

concatenated_ouput = tf.stack(selection_layer_output, axis=0)

model = Model(inputs=preprocess_layer, outputs=concatenated_ouput)
model.summary()

This is the output when I try to predict on a test image:

root = tk.Tk()
root.update()
filename = askopenfilename(filetypes=[("images", ["*.jpg", "*.jpeg", "*.png"])])
img = cv2.imread(filename)
root.destroy()

query_imgarr = preprocess_img(img)
model_output = model.predict(query_imgarr)
model_output

>>> array([0.92890763, 0.92890763, 0.92890763, 0.92890763, 0.92890763],
      dtype=float32)

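When I run the embedding lookup and the aggregation separately, the output is correct. As shown below, the model only yields the value of the 4th class (counting from the top).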

labels = [fn.split('-')[0]+'-'+fn.split('-')[1] for fn in filenames]
class_id = set(labels)

for ci in class_id:
    class_index = [i for i, x in enumerate(labels) if x == ci]
    class_predictions = tf.nn.embedding_lookup(model_output[0], class_index)
    output_ = tf.reduce_max(class_predictions)
    print(output_)

>>> tf.Tensor(0.49454707, shape=(), dtype=float32)
>>> tf.Tensor(0.6946863, shape=(), dtype=float32)
>>> tf.Tensor(0.62603784, shape=(), dtype=float32)
>>> tf.Tensor(0.92890763, shape=(), dtype=float32)
>>> tf.Tensor(0.59326285, shape=(), dtype=float32)

Any help would be greatly appreciated, thanks!

Answer 1

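After looking around and consulting this thread, I found that the "proper" way to use TF operations (in my case tf.nn.embedding_lookup and tf.reduce_max) is to wrap them in a Layer subclass, that is, to write a custom layer.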

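class AggregationLayer(tf.keras.layers.Layer):
    def __init__(self, class_index, **kwargs):
        # Initialize the base Layer before setting attributes on it.
        super().__init__(**kwargs)
        # Each instance keeps the indices of its own class.
        self.class_index = class_index

    def call(self, inputs, **kwargs):
        # inputs[0] is the retrieval score vector of the first batch item.
        x = tf.nn.embedding_lookup(inputs[0], self.class_index)
        # Reduce the selected scores to a single maximum for this class.
        x = tf.reduce_max(x)
        return x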
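
One possible explanation for the original behavior, offered as an assumption rather than something confirmed in the thread: a Python lambda created inside a loop looks up `class_index` when it is called, not when it is defined. If Keras calls the Lambda functions again after the loop has finished (for example while tracing the model for `predict`), every branch would see the same indices, whereas a Layer subclass stores its own copy in `self.class_index`. The plain-Python sketch below shows the effect without TensorFlow; `build_branches`, `run_branches`, and the toy numbers are hypothetical.

```python
def build_branches(class_indices, bind_early):
    """Builds one max-aggregation function per group of indices."""
    branches = []
    for idx in class_indices:
        if bind_early:
            # The default argument binds the current idx to this lambda.
            branches.append(lambda scores, idx=idx: max(scores[i] for i in idx))
        else:
            # idx is resolved only when the lambda runs, after the loop has ended.
            branches.append(lambda scores: max(scores[i] for i in idx))
    return branches


def run_branches(branches, scores):
    """Applies every branch to the same list of scores."""
    return [branch(scores) for branch in branches]


# Hypothetical toy data: six scores split into three classes.
toy_scores = [0.1, 0.9, 0.3, 0.5, 0.2, 0.7]
toy_groups = [[0, 1], [2, 3], [4, 5]]

print(run_branches(build_branches(toy_groups, bind_early=False), toy_scores))
print(run_branches(build_branches(toy_groups, bind_early=True), toy_scores))
```

With this toy data, the late-bound branches all return the maximum of the last group, while binding `idx` as a default argument gives each branch its own result.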

This solution fixed my problem.
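
For completeness, here is a minimal sketch of how such a layer might replace the two Lambda layers in the loop. It is not the exact code from my model: `aggregate_per_class`, `toy_labels`, and `toy_scores` are hypothetical names, the layer is called eagerly on a small tensor instead of on `retrieval_output`, and the class names are sorted only to make the branch order reproducible.

```python
import tensorflow as tf


class AggregationLayer(tf.keras.layers.Layer):
    """Selects the scores of one class and reduces them to their maximum."""

    def __init__(self, class_index, **kwargs):
        super().__init__(**kwargs)
        # Stored per instance, so each branch keeps its own indices.
        self.class_index = tf.constant(class_index, dtype=tf.int32)

    def call(self, inputs):
        # inputs[0] is the score vector of the first (and only) batch item.
        x = tf.nn.embedding_lookup(inputs[0], self.class_index)
        return tf.reduce_max(x)


def aggregate_per_class(retrieval_scores, labels):
    """Returns the class names and the stacked per-class maxima."""
    class_names = sorted(set(labels))  # sorted for a reproducible branch order
    branch_outputs = []
    for name in class_names:
        index = [i for i, label in enumerate(labels) if label == name]
        # One layer instance per class replaces the two Lambda layers.
        branch_outputs.append(AggregationLayer(index)(retrieval_scores))
    return class_names, tf.stack(branch_outputs, axis=0)


# Hypothetical toy data: six retrieval scores for one image, three classes.
toy_labels = ["a-1", "a-1", "b-2", "b-2", "c-3", "c-3"]
toy_scores = tf.constant([[0.1, 0.9, 0.3, 0.5, 0.2, 0.7]], dtype=tf.float32)
names, per_class_max = aggregate_per_class(toy_scores, toy_labels)
print(names, per_class_max.numpy())
```

Inside the functional model, the same loop would apply each `AggregationLayer` to `retrieval_output` and collect the results in `selection_layer_output`, as in the question.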

python

data-retrieval

tensorflow

embedding-lookup