Can't load optimizer weights after adding layer without parameters
Model A:
ipt = Input(batch_shape=(32, 240, 4))
x1 = Conv1D(16, 20, strides=200, padding='same')(ipt)
x1 = BatchNormalization()(x1)
x2 = Conv1D(16, 200, strides=120, padding='same')(ipt)
x2 = BatchNormalization()(x2) # ...
Model B:
ipt = Input(batch_shape=(32, 250, 4))
x1 = Conv1D(16, 20, strides=200)(ipt)
x1 = BatchNormalization()(x1)
x2 = Conv1D(16, 200, strides=120)(ipt)
x2 = BatchNormalization()(x2) # ...
Both have the same weight shapes - however, A's optimizer weights cannot be loaded onto B, because B has a different build order (image and code below).
This is a small excerpt of a larger model which needs to change its timesteps parameter every X epochs, and ZeroPadding1D appears to change the layer build order whenever it is used; this does not affect the model weights, as they are mapped via dicts - whereas optimizer weights are mapped in order, list-to-list.
Reproducible in TF1 and TF2, with both keras and tf.keras imports. What is the problem, and how can it be fixed? Relevant Git
Environment: Win-10 OS, CUDA 10.0.130, cuDNN 7.6.0, Python 3.7.4, GTX 1070
Observations:
- Can swap out any other layer, not just BatchNormalization - and any number of layers before concatenate; the optimizer weights simply end up swapped, per .get_weights()
- Can change strides instead of batch_shape[1] (see the sketch after this list)
- Can use MaxPooling1D with strides > 1
- padding='valid' results in ZeroPadding1D, but it does not change the build order (don't know why)
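A minimal sketch of that strides observation (the stride values are my own picks, not from the original post; it reuses zero_pad and the imports from the minimal code further down): keep batch_shape fixed and vary only the second stride, so that just one of the two models needs a ZeroPadding1D.

# Sketch only: with padding='same', the output length is ceil(240 / stride),
# so strides_2=120 gives length 2 (no padding needed) while strides_2=115
# gives length 3, forcing zero_pad to insert ZeroPadding1D in the x1 branch.
def make_model_strided(strides_2, batch_shape=(32, 240, 4)):
    ipt = Input(batch_shape=batch_shape)
    x1 = Conv1D(16, 20, strides=200, padding='same')(ipt)
    x1 = BatchNormalization()(x1)
    x2 = Conv1D(16, 200, strides=strides_2, padding='same')(ipt)
    x2 = BatchNormalization()(x2)
    x1, x2 = zero_pad(x1, x2)
    preout = GlobalAveragePooling1D()(concatenate([x1, x2]))
    model = Model(ipt, Dense(1)(preout))
    model.compile('adam', 'mse')
    return model

# model_120 = make_model_strided(120)  # same build order as model_A
# model_115 = make_model_strided(115)  # extra ZeroPadding1D, like model_B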
model_A.summary():
Layer (type) Output Shape Param # Connected to
==================================================================================
input_1 (InputLayer) [(32, 240, 4)] 0
__________________________________________________________________________________
conv1d (Conv1D) (32, 2, 16) 1296 input_1[0][0]
__________________________________________________________________________________
conv1d_1 (Conv1D) (32, 2, 16) 12816 input_1[0][0]
__________________________________________________________________________________
bn_1 (BatchNormalization) (32, 2, 16) 64 conv1d[0][0]
__________________________________________________________________________________
bn_2 (BatchNormalization) (32, 2, 16) 64 conv1d_1[0][0]
__________________________________________________________________________________
concatenate (Concatenate) (32, 2, 32) 0 bn_1[0][0]
bn_2[0][0]
__________________________________________________________________________________
gap_0 (GlobalAveragePooling1D) (32, 32) 0 concatenate[0][0]
__________________________________________________________________________________
dense (Dense) (32, 1) 33 gap_0[0][0]
model_B.summary() (note the swapped layers):
input_2 (InputLayer) [(32, 250, 4)] 0
_____________________________________________________________________________________
conv1d_2 (Conv1D) (32, 2, 16) 1296 input_2[0][0]
_____________________________________________________________________________________
bn_1 (BatchNormalization) (32, 2, 16) 64 conv1d_2[0][0]
_____________________________________________________________________________________
conv1d_3 (Conv1D) (32, 3, 16) 12816 input_2[0][0]
_____________________________________________________________________________________
zero_padding1d (ZeroPadding1D) (32, 3, 16) 0 bn_1[0][0]
_____________________________________________________________________________________
bn_2 (BatchNormalization) (32, 3, 16) 64 conv1d_3[0][0]
_____________________________________________________________________________________
concatenate_1 (Concatenate) (32, 3, 32) 0 zero_padding1d[0][0]
bn_2[0][0]
_____________________________________________________________________________________
gap_0 (GlobalAveragePooling1D) (32, 32) 0 concatenate_1[0][0]
_____________________________________________________________________________________
dense_1 (Dense) (32, 1) 33 gap_0[0][0]
Minimal reproducible code:
# also works with `from keras`
from tensorflow.keras.layers import Input, Conv1D, ZeroPadding1D, concatenate
from tensorflow.keras.layers import BatchNormalization, Dense, GlobalAveragePooling1D
from tensorflow.keras.models import Model
import numpy as np

def make_model(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x1 = Conv1D(16, 20, strides=200, padding='same')(ipt)
    x1 = BatchNormalization()(x1)
    x2 = Conv1D(16, 200, strides=120, padding='same')(ipt)
    x2 = BatchNormalization()(x2)
    x1, x2 = zero_pad(x1, x2)
    preout = concatenate([x1, x2])
    preout = GlobalAveragePooling1D()(preout)
    out = Dense(1)(preout)
    model = Model(ipt, out)
    model.compile('adam', 'mse')
    return model

def zero_pad(x1, x2):
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
    return x1, x2

def make_data(batch_shape):
    return (np.random.randn(*batch_shape),
            np.random.randint(0, 2, (batch_shape[0], 1)))

batch_shape_A = (32, 240, 4)
batch_shape_B = (32, 250, 4)
batch_shape_C = (32, 240, 4)

model_A = make_model(batch_shape_A)
model_B = make_model(batch_shape_B)
model_C = make_model(batch_shape_C)  # 'control group'

x_A, y_A = make_data(batch_shape_A)
x_B, y_B = make_data(batch_shape_B)
x_C, y_C = make_data(batch_shape_C)

model_A.train_on_batch(x_A, y_A)
model_B.train_on_batch(x_B, y_B)
model_C.train_on_batch(x_C, y_C)

optimizer_weights_A = model_A.optimizer.get_weights()
model_C.optimizer.set_weights(optimizer_weights_A)
print("model_C optimizer weights set successfully")
model_B.optimizer.set_weights(optimizer_weights_A)
print("model_B optimizer weights set successfully")  # will not print
Output:
model_C optimizer weights set successfully
ValueError: Optimizer weight shape (16,) not compatible with provided
weight shape (200, 4, 16)
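Continuing in the same session as the code above (an inspection sketch added for clarity, not part of the original post): since the optimizer weights are a plain list of arrays paired up purely by position, dumping the shapes side by side makes the mismatch visible.

# Inspection sketch: differing build orders show up as shape
# mismatches at the same list index.
shapes_A = [w.shape for w in model_A.optimizer.get_weights()]
shapes_B = [w.shape for w in model_B.optimizer.get_weights()]
for i, (sA, sB) in enumerate(zip(shapes_A, shapes_B)):
    print(i, sA, sB, '<-- mismatch' if sA != sB else '')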
Found a workaround, plus a form of an explanation; it isn't about ZeroPadding1D, but about having an extra layer in one 'branch' and not the other - as revealed by plot_model(); see below.
Keras appears to build layers via vertical traversal - note that the numbered layer graph matches the .summary() order exactly. An order change can still occur at the end of a 'branch' - I suppose the reason is that, before merging into a common layer, both branches' layer nodes should be at the same depth. However, that isn't the full story - see the disclaimer at the bottom.
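A quick way to see that build order (a sketch, assuming model_A and model_B from the question are still in scope): model.layers follows the same order as .summary(), so the swapped BatchNormalization and the extra ZeroPadding1D show up directly.

# Build-order sketch: print the topological layer order of both models.
print([l.name for l in model_A.layers])
print([l.name for l in model_B.layers])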
Workaround: insert a 'pseudolayer' to equalize the number of layers in each branch; I'll stick with zero-padding:
def zero_pad(x1, x2):
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
        x2 = ZeroPadding1D((0, 0))(x2)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
        x1 = ZeroPadding1D((0, 0))(x1)
    return x1, x2
Running the code from the question:
model_C optimizer weights set successfully
model_B optimizer weights set successfully # SUCCESS
Model plots: via from tensorflow.keras.utils import plot_model; plot_model(model_A), etc.
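For reference, a hedged sketch of how such plots can be generated (the output file names are my own; pydot and graphviz need to be installed):

from tensorflow.keras.utils import plot_model
plot_model(model_A, to_file='model_A.png', show_shapes=True)
plot_model(model_B, to_file='model_B.png', show_shapes=True)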
Explanation disclaimer: I haven't confirmed this down to the exact lines of the source code, and .summary() does not always agree with plot_model(); for example, with padding='valid', we get the model_B plot above for both model_A and model_B, yet the summary shows model_A's build order. Also, padding='valid' works without the fix, since both models then end up with a ZeroPadding1D, so the layer structure is (seemingly) identical.
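A quick check of that last point (a sketch, not from the original answer; it assumes make_model is switched to padding='valid' while keeping the original zero_pad without the (0, 0) pseudolayer): both models then insert a real ZeroPadding1D in the same branch, so the transfer succeeds even without the fix.

# Sketch: with padding='valid', both models pad the same (shorter) branch,
# so their optimizer weight lists line up despite the different timesteps.
model_A_v = make_model((32, 240, 4))  # make_model edited to use padding='valid'
model_B_v = make_model((32, 250, 4))
model_A_v.train_on_batch(*make_data((32, 240, 4)))
model_B_v.train_on_batch(*make_data((32, 250, 4)))
model_B_v.optimizer.set_weights(model_A_v.optimizer.get_weights())  # no ValueError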