Keras: connecting two layers from different models to create a new model

What I want to do:
I want to connect arbitrary layers from different models to create a new Keras model.

What I have found so far:
https://github.com/keras-team/keras/issues/4205: using the Model's call method to change the input of another model. I ran into problems with this approach.

https://github.com/keras-team/keras/issues/3465: adding new layers to a base model by using any output of the base model. The problem here:

  1. While it is possible to use any layer from the base model, which means I can slice layers off the base model, I cannot load the encoder as a Keras model. The top model always has to be created anew.
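For reference, here is a compact sketch of the two referenced approaches, assuming the base and encoder models that are defined in the code further down:

from keras.layers import Dense, Flatten
from keras.models import Model

# keras-team/keras#4205: call the encoder model on the base model's output
wrapped = Model(inputs=base.input, outputs=encoder(base.output))

# keras-team/keras#3465: build fresh top layers on top of base.output
x = Flatten()(base.output)
x = Dense(10, activation="softmax")(x)
extended = Model(inputs=base.input, outputs=x)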

What I tried:
My approach to connecting arbitrary layers of different models (see the miniature sketch after this list; the full implementation is the connect_layers() function in the code further down):

  1. Clear the inbound nodes of the layer that receives the new input (to_layer).
  2. Call that layer's call() method with the output tensor of the preceding layer (from_tensor).
  3. Clean up the outbound nodes of the old output tensor by swapping in the newly created output tensor.
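In miniature, the three steps look like this (a sketch assuming from_tensor and to_layer are already at hand):

old_output = to_layer.output          # the tensor downstream nodes still reference
to_layer.inbound_nodes = []           # step 1: clear the inbound nodes
new_output = to_layer(from_tensor)    # step 2: call() with the new input tensor
for node in to_layer.outbound_nodes:  # step 3: patch downstream input tensors
    for i, t in enumerate(node.input_tensors):
        if t == old_output:
            node.input_tensors[i] = new_output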

In the beginning I was really optimistic, because summary() and plot_model() gave me exactly what I wanted, so the node graph should be fine, right? But when I ran the training I got an error. While the approaches from the "What I found so far" section train fine, my approach throws an error. This is the error message:

  File "C:\Anaconda\envs\dlpipe\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 508, in apply_op
    (input_name, err))
ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.

Maybe an important piece of information: I am using TensorFlow as the backend. I was able to trace down the root of this error: it seems there is an error when the gradients are calculated. Usually there is a gradient computation for each node, but with my approach all the nodes of the base network have "None". So basically it happens in keras/optimizers.py, get_updates(), when the gradients are computed (grad = self.get_gradients(loss, params)).
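A quick way to reproduce this observation (a diagnostic sketch, assuming e.g. new_model_3 from the code below):

import keras.backend as K

new_model_3.compile(optimizer="sgd", loss="categorical_crossentropy")
grads = K.gradients(new_model_3.total_loss, new_model_3.trainable_weights)
for weight, grad in zip(new_model_3.trainable_weights, grads):
    print(weight.name, "->", grad)  # the base network's weights print "None"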

Here is the code (without the training) that implements all three approaches:

from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.models import Model
from keras.utils import plot_model

def create_base():
    in_layer = Input(shape=(32, 32, 3), name="base_input")
    x = Conv2D(32, (3, 3), padding='same', activation="relu", name="base_conv2d_1")(in_layer)
    x = Conv2D(32, (3, 3), padding='same', activation="relu", name="base_conv2d_2")(x)
    x = MaxPooling2D(pool_size=(2, 2), name="base_maxpooling_2d_1")(x)
    x = Dropout(0.25, name="base_dropout")(x)

    x = Conv2D(64, (3, 3), padding='same', activation="relu", name="base_conv2d_3")(x)
    x = Conv2D(64, (3, 3), padding='same', activation="relu", name="base_conv2d_4")(x)
    x = MaxPooling2D(pool_size=(2, 2), name="base_maxpooling2d_2")(x)
    x = Dropout(0.25, name="base_dropout_2")(x)

    return Model(inputs=in_layer, outputs=x, name="base_model")

def create_encoder():
    in_layer = Input(shape=(8, 8, 64))
    x = Flatten(name="encoder_flatten")(in_layer)
    x = Dense(512, activation="relu", name="encoder_dense_1")(x)
    x = Dropout(0.5, name="encoder_dropout_2")(x)
    x = Dense(10, activation="softmax", name="encoder_dense_2")(x)
    return Model(inputs=in_layer, outputs=x, name="encoder_model")

def extend_base(input_model):
    x = Flatten(name="custom_flatten")(input_model.output)
    x = Dense(512, activation="relu", name="custom_dense_1")(x)
    x = Dropout(0.5, name="custom_dropout_2")(x)
    x = Dense(10, activation="softmax", name="custom_dense_2")(x)
    return Model(inputs=input_model.input, outputs=x, name="custom_edit")

def connect_layers(from_tensor, to_layer, clear_inbound_nodes=True):
    try:
        # Layer.output raises an AttributeError for shared layers
        # (more than one inbound node), which this approach cannot handle
        tmp_output = to_layer.output
    except AttributeError:
        raise ValueError("Connecting to shared layers is not supported!")

    if clear_inbound_nodes:
        # step 1: drop the old inbound connection entirely
        to_layer.inbound_nodes = []
    else:
        # keep the existing input tensors and append the new one
        # (for layers with multiple inputs)
        tensor_list = to_layer.inbound_nodes[0].input_tensors
        tensor_list.append(from_tensor)
        from_tensor = tensor_list
        to_layer.inbound_nodes = []
    # step 2: rewire the layer by calling it on the new input tensor
    new_output = to_layer(from_tensor)
    # step 3: swap the old output tensor for the new one in all downstream nodes
    for out_node in to_layer.outbound_nodes:
        for i, in_tensor in enumerate(out_node.input_tensors):
            if in_tensor == tmp_output:
                out_node.input_tensors[i] = new_output


if __name__ == "__main__":
    base = create_base()
    encoder = create_encoder()

    #new_model_1 = Model(inputs=base.input, outputs=encoder(base.output))
    #plot_model(new_model_1, to_file="plots/new_model_1.png")

    new_model_2 = extend_base(base)
    plot_model(new_model_2, to_file="plots/new_model_2.png")
    print(new_model_2.summary())

    base_layer = base.get_layer("base_dropout_2")
    top_layer = encoder.get_layer("encoder_flatten")
    connect_layers(base_layer.output, top_layer)
    new_model_3 = Model(inputs=base.input, outputs=encoder.output)
    plot_model(new_model_3, to_file="plots/new_model_3.png")
    print(new_model_3.summary())
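For completeness, the omitted training step that triggers the error can be as simple as fitting on random dummy data (a sketch; the shapes follow the model definitions above):

import numpy as np

x_train = np.random.rand(16, 32, 32, 3)
y_train = np.random.rand(16, 10)
new_model_3.compile(optimizer="adam", loss="categorical_crossentropy")
new_model_3.fit(x_train, y_train, batch_size=8, epochs=1)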

I know this is a lot of text and code, but I feel it is necessary to explain the problem here.

Edit: I just tried Theano as the backend, and I think its error provides more information:

theano.gradient.DisconnectedInputError:  
Backtrace when that variable is created:

It seems like every layer of the encoder model has some connection to the encoder's input layer via TensorVariables.

So this is what I ended up with for the connect_layers() function:

def connect_layers(from_tensor, to_layer, old_tensor=None):
    # if there is any shared layer after the to_layer, it is not supported
    try:
        tmp_output = to_layer.output
    except AttributeError:
        raise ValueError("Connecting to shared layers is not supported!")
    # check if to_layer has multiple input_tensors, and therefore some sort of merge layer
    if len(to_layer.inbound_nodes[0].input_tensors) > 1:
        tensor_list = to_layer.inbound_nodes[0].input_tensors
        found_tensor = False
        for i, tensor in enumerate(tensor_list):
            # exchange the old tensor with the new created tensor
            if tensor == old_tensor:
                tensor_list[i] = from_tensor
                found_tensor = True
                break
        if not found_tensor:
            tensor_list.append(from_tensor)
        from_tensor = tensor_list
        to_layer.inbound_nodes = []
    else:
        to_layer.inbound_nodes = []

    new_output = to_layer(from_tensor)

    tmp_out_nodes = to_layer.outbound_nodes[:]
    to_layer.outbound_nodes = []
    # recursively connect all layers after the current to_layer 
    for out_node in tmp_out_nodes:
        l = out_node.outbound_layer
        print("Connecting: " + str(to_layer) + " ----> " + str(l))
        connect_layers(new_output, l, tmp_output)

Since every tensor holds all the information about its root tensors via -> owner.inputs -> owner.inputs -> ..., all tensors downstream of new_output have to be updated as well.
Debugging this was a lot easier with the Theano backend than with the TensorFlow backend.
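To illustrate (a small Theano-only sketch): any tensor can be walked back to its roots through owner.inputs, which is how I checked which layers were still tied to the old encoder input.

def walk_owners(tensor, depth=0):
    # recursively print the chain tensor -> owner.inputs -> owner.inputs -> ...
    print("  " * depth + str(tensor))
    if tensor.owner is not None:
        for parent in tensor.owner.inputs:
            walk_owners(parent, depth + 1)

walk_owners(encoder.output)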

I still have to figure out how to deal with shared layers. With the current implementation it is not possible to connect other models that contain a shared layer after the first to_layer.