Can't load optimizer weights after adding layer without parameters

Model A:

ipt = Input(batch_shape=(32, 240, 4))
x1  = Conv1D(16, 20,  strides=200, padding='same')(ipt)
x1  = BatchNormalization()(x1)
x2  = Conv1D(16, 200, strides=120, padding='same')(ipt)
x2  = BatchNormalization()(x2) # ...

Model B:

ipt = Input(batch_shape=(32, 250, 4))
x1  = Conv1D(16, 20,  strides=200)(ipt)
x1  = BatchNormalization()(x1)
x2  = Conv1D(16, 200, strides=120)(ipt)
x2  = BatchNormalization()(x2) # ...


Both have identical weight shapes - however, A's optimizer weights cannot be loaded onto B, because B has a different build order (images and code below).

This is a snippet of a larger model that needs to change its timesteps parameter every X epochs, and ZeroPadding1D appears to change the layer build order whenever it is used; this doesn't break the model weights, since those are mapped via a dictionary - but the optimizer weights are mapped in order, list-to-list.
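The order mismatch can be seen directly; a minimal diagnostic sketch, assuming model_A and model_B from the reproducible code below, each trained on one batch:

shapes_A = [w.shape for w in model_A.optimizer.get_weights()]
shapes_B = [w.shape for w in model_B.optimizer.get_weights()]
print(shapes_A == shapes_B)                                      # False: different order
print(sorted(map(str, shapes_A)) == sorted(map(str, shapes_B)))  # True: same shapes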

Reproducible in both TF1 and TF2, with both keras and tf.keras imports. What's the problem, and how can it be fixed? Relevant Git


Environment: Win-10 OS, CUDA 10.0.130, cuDNN 7.6.0, Python 3.7.4, GTX 1070

Observations:


model_A.summary():

Layer (type)                    Output Shape         Param #     Connected to     
==================================================================================
input_1 (InputLayer)            [(32, 240, 4)]       0                            
__________________________________________________________________________________
conv1d (Conv1D)                 (32, 2, 16)          1296        input_1[0][0]    
__________________________________________________________________________________
conv1d_1 (Conv1D)               (32, 2, 16)          12816       input_1[0][0]    
__________________________________________________________________________________
bn_1 (BatchNormalization)       (32, 2, 16)          64          conv1d[0][0]     
__________________________________________________________________________________
bn_2 (BatchNormalization)       (32, 2, 16)          64          conv1d_1[0][0]   
__________________________________________________________________________________
concatenate (Concatenate)       (32, 2, 32)          0           bn_1[0][0]       
                                                                 bn_2[0][0]       
__________________________________________________________________________________
gap_0 (GlobalAveragePooling1D)  (32, 32)             0           concatenate[0][0]
__________________________________________________________________________________
dense (Dense)                   (32, 1)              33          gap_0[0][0]      

model_B.summary() (note the swapped layers):

input_2 (InputLayer)            [(32, 250, 4)]       0                               
_____________________________________________________________________________________
conv1d_2 (Conv1D)               (32, 2, 16)          1296        input_2[0][0]       
_____________________________________________________________________________________
bn_1 (BatchNormalization)       (32, 2, 16)          64          conv1d_2[0][0]      
_____________________________________________________________________________________
conv1d_3 (Conv1D)               (32, 3, 16)          12816       input_2[0][0]       
_____________________________________________________________________________________
zero_padding1d (ZeroPadding1D)  (32, 3, 16)          0           bn_1[0][0]          
_____________________________________________________________________________________
bn_2 (BatchNormalization)       (32, 3, 16)          64          conv1d_3[0][0]      
_____________________________________________________________________________________
concatenate_1 (Concatenate)     (32, 3, 32)          0           zero_padding1d[0][0]
                                                                 bn_2[0][0]          
_____________________________________________________________________________________
gap_0 (GlobalAveragePooling1D)  (32, 32)             0           concatenate_1[0][0] 
_____________________________________________________________________________________
dense_1 (Dense)                 (32, 1)              33          gap_0[0][0]  

Minimal reproducible code:

# also works with `from keras`
from tensorflow.keras.layers import Input, Conv1D, ZeroPadding1D, concatenate
from tensorflow.keras.layers import BatchNormalization, Dense, GlobalAveragePooling1D
from tensorflow.keras.models import Model
import numpy as np

def make_model(batch_shape):
    ipt = Input(batch_shape=batch_shape)

    x1  = Conv1D(16, 20,  strides=200, padding='same')(ipt)
    x1  = BatchNormalization()(x1)
    x2  = Conv1D(16, 200, strides=120, padding='same')(ipt)
    x2  = BatchNormalization()(x2)

    x1, x2 = zero_pad(x1, x2)
    preout = concatenate([x1, x2])
    preout = GlobalAveragePooling1D()(preout)
    out    = Dense(1)(preout)

    model  = Model(ipt, out)
    model.compile('adam', 'mse')
    return model 

def zero_pad(x1, x2):
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if   diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
    return x1, x2

def make_data(batch_shape):
    return (np.random.randn(*batch_shape), 
            np.random.randint(0, 2, (batch_shape[0], 1)))

batch_shape_A = (32, 240, 4)
batch_shape_B = (32, 250, 4)
batch_shape_C = (32, 240, 4)
model_A  = make_model(batch_shape_A)
model_B  = make_model(batch_shape_B)
model_C  = make_model(batch_shape_C) # 'control group'
x_A, y_A = make_data(batch_shape_A)
x_B, y_B = make_data(batch_shape_B)
x_C, y_C = make_data(batch_shape_C)

model_A.train_on_batch(x_A, y_A)
model_B.train_on_batch(x_B, y_B)
model_C.train_on_batch(x_C, y_C)

optimizer_weights_A = model_A.optimizer.get_weights()

model_C.optimizer.set_weights(optimizer_weights_A)
print("model_C optimizer weights set successfully")

model_B.optimizer.set_weights(optimizer_weights_A)
print("model_B optimizer weights set successfully") # will not print

Outputs:

model_C optimizer weights set successfully

ValueError: Optimizer weight shape (16,) not compatible with provided 
weight shape (200, 4, 16)
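The two shapes in the error come from the first misaligned slot in the two lists; a quick way to locate it (a sketch, run right after the train_on_batch calls above):

wA = model_A.optimizer.get_weights()
wB = model_B.optimizer.get_weights()
for i, (a, b) in enumerate(zip(wA, wB)):
    if a.shape != b.shape:
        print(i, a.shape, b.shape)  # first slot where the lists disagree
        break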

Found a workaround, and a form of an explanation: it isn't about ZeroPadding1D, but about having an extra layer in one 'branch' and not the other - as revealed by plot_model(); see below.

Keras appears to build layers via vertical traversal - note that the numbered layer diagrams match the .summary() order exactly. An order change can still occur at the end of a 'branch' - I suppose the reason is that the layer nodes of both branches should sit at the same depth before merging into a common layer. However, this isn't the full story - see the disclaimer at the bottom.
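The build order can be inspected directly, since model.layers follows it and mirrors .summary(); a sketch using the models from the question's code:

print([l.name for l in model_A.layers])  # matches model_A.summary() order
print([l.name for l in model_B.layers])  # note the swapped conv/bn pair and the extra zero_padding1d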

Workaround: insert a 'pseudolayer' to equalize the number of layers in each branch; I'll stick with zero-padding:

def zero_pad(x1, x2):
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if   diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
        x2 = ZeroPadding1D((0, 0))(x2)  # no-op 'pseudolayer' keeps branch depths equal
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
        x1 = ZeroPadding1D((0, 0))(x1)  # no-op 'pseudolayer' keeps branch depths equal
    return x1, x2
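As a sanity check (a sketch): with the pseudolayer in place, the weight-carrying layers are built in the same relative order in both models, so the optimizer weight lists line up slot for slot:

print([l.name for l in model_A.layers if l.weights])  # conv, conv, bn, bn, dense pattern
print([l.name for l in model_B.layers if l.weights])  # same pattern now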

Running the code from the question:

model_C optimizer weights set successfully
model_B optimizer weights set successfully  # SUCCESS

Model diagrams: via from tensorflow.keras.utils import plot_model; plot_model(model_A) ...
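For example (a sketch; plot_model requires the pydot and graphviz packages to be installed):

from tensorflow.keras.utils import plot_model
plot_model(model_A, to_file='model_A.png', show_shapes=True)
plot_model(model_B, to_file='model_B.png', show_shapes=True)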


Explanation disclaimer: I haven't confirmed it at the exact lines in the source code, and .summary() doesn't always agree with plot_model(); for example, with padding='valid', both model_A and model_B yield the model_B diagram above, yet the summaries show model_A's build order. Furthermore, padding='valid' works without the fix, since both models end up using ZeroPadding1D, so the layer structure is (seemingly) the same.