Why does the combination of model subclassing and TFRecord not work?
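Short version of the question
Why does training fail when I try to train a model implemented through subclassing (in Keras) on a dataset that was saved to and loaded from a TFRecord file?
Long version of the question
I have the following model (let's first define it using the functional API):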
# Imports used by all the snippets in this post
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Input, layers, models

def get_model():
    input_layer = Input(shape=(6,), name="input")
    x = input_layer
    x = layers.Dense(128, activation='relu', name="dense_1")(x)
    x = layers.Dense(1024, activation='relu', name="dense_2")(x)
    x = layers.Dense(5120, activation='relu', name="dense_3")(x)
    # Four independent classification heads on top of the shared trunk
    a_out = layers.Dense(17, activation='softmax', name='a_out')(x)
    b_out = layers.Dense(27, activation='softmax', name='b_out')(x)
    c_out = layers.Dense(71, activation='softmax', name='c_out')(x)
    d_out = layers.Dense(29, activation='softmax', name='d_out')(x)
    model = models.Model(input_layer, [a_out, b_out, c_out, d_out])
    model.compile(optimizer='rmsprop',
                  loss=('sparse_categorical_crossentropy',
                        'sparse_categorical_crossentropy',
                        'sparse_categorical_crossentropy',
                        'sparse_categorical_crossentropy'))
    return model
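It accepts a tensor of shape (6,) and produces four outputs: `a_out`, `b_out`, `c_out`, and `d_out`. Each one is an integer (a categorical output). Next, I define a dummy/random dataset to train this model: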
sample_count = 1000
inputs = np.random.rand(sample_count, 6).astype(np.float32)
targets = (
    np.random.randint(low=0, high=16, size=(sample_count,), dtype=np.int64),
    np.random.randint(low=0, high=26, size=(sample_count,), dtype=np.int64),
    np.random.randint(low=0, high=70, size=(sample_count,), dtype=np.int64),
    np.random.randint(low=0, high=28, size=(sample_count,), dtype=np.int64)
)
random_dataset = tf.data.Dataset.from_tensor_slices((inputs, targets))
for rec in random_dataset:
    print(rec)
    break
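If you call the `fit` method of the functional API model and feed it this dataset, it trains fine. The `print` statement in the previous code block outputs something like this: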
(<tf.Tensor: shape=(6,), dtype=float32, numpy=
array([0.326234 , 0.9935627 , 0.65569717, 0.05908937, 0.7490394 ,
0.7929646 ], dtype=float32)>, (<tf.Tensor: shape=(1,), dtype=int64, numpy=array([5])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([5])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([60])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([9])>))
Now, let's save and load the same dataset using TFRecord:
# Saving the random dataset into a TFRecord file
def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    # If the value is an eager tensor BytesList won't unpack a string from an EagerTensor.
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy()
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

file_path = 'random.tfrec'
with tf.io.TFRecordWriter(file_path) as writer:
    for rec in random_dataset:
        feature = {
            'input': _bytes_feature(tf.io.serialize_tensor(rec[0])),
            'a_out': _int64_feature(rec[1][0]),
            'b_out': _int64_feature(rec[1][1]),
            'c_out': _int64_feature(rec[1][2]),
            'd_out': _int64_feature(rec[1][3]),
        }
        example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example_proto.SerializeToString())

# Load the dataset off the file just created
def read_tfrecord(serialized_example):
    feature_description = {
        'input': tf.io.FixedLenFeature((), tf.string),
        'a_out': tf.io.FixedLenFeature((), tf.int64),
        'b_out': tf.io.FixedLenFeature((), tf.int64),
        'c_out': tf.io.FixedLenFeature((), tf.int64),
        'd_out': tf.io.FixedLenFeature((), tf.int64)
    }
    example = tf.io.parse_single_example(serialized_example, feature_description)
    return tf.io.parse_tensor(example['input'], out_type=tf.float32), (
        example["a_out"],
        example["b_out"],
        example["c_out"],
        example["d_out"])

tfrecord_dataset = tf.data.TFRecordDataset(file_path).map(read_tfrecord)
for rec in tfrecord_dataset:
    print(rec)
    break
The final print statement is just a sanity check to make sure the dataset was not distorted by serialization. It outputs something like:
(<tf.Tensor: shape=(6,), dtype=float32, numpy=
array([0.326234 , 0.9935627 , 0.65569717, 0.05908937, 0.7490394 ,
0.7929646 ], dtype=float32)>, (<tf.Tensor: shape=(1,), dtype=int64, numpy=array([5])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([5])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([60])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([9])>))
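This is identical in every respect to the original dataset. If I feed this `tfrecord_dataset` to the functional API model, it still trains fine. Next, I define the same model using inheritance (a.k.a. subclassing):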
class SubclassModel(keras.Model):
    def __init__(self):
        super(SubclassModel, self).__init__()
        self.d1 = layers.Dense(128, activation='relu', name="dense_1")
        self.d2 = layers.Dense(1024, activation='relu', name="dense_2")
        self.d3 = layers.Dense(5120, activation='relu', name="dense_3")
        self.a_out = layers.Dense(17, activation='softmax', name='a_out')
        self.b_out = layers.Dense(27, activation='softmax', name='b_out')
        self.c_out = layers.Dense(71, activation='softmax', name='c_out')
        self.d_out = layers.Dense(29, activation='softmax', name='d_out')
        self.build((None, 6,))
        self.compile(optimizer='rmsprop',
                     loss=('sparse_categorical_crossentropy',
                           'sparse_categorical_crossentropy',
                           'sparse_categorical_crossentropy',
                           'sparse_categorical_crossentropy'))

    def call(self, inputs, training=True):
        x = inputs
        x = self.d1(x)
        x = self.d2(x)
        x = self.d3(x)
        a = self.a_out(x)
        b = self.b_out(x)
        c = self.c_out(x)
        d = self.d_out(x)
        return a, b, c, d
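Here is the punchline. I now have two different ways of creating the model (the functional API and inheritance) and two different datasets (`random_dataset` and `tfrecord_dataset`). That makes four combinations: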
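- Training the functional API model with `random_dataset`: works fine
- Training the functional API model with `tfrecord_dataset`: works fine
- Training `SubclassModel` with `random_dataset`: works fine
- Training `SubclassModel` with `tfrecord_dataset`: fails!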
This is the error I get (truncated):
TypeError: in user code:

    File "/home/mehran/.pyenv/versions/3.8.12/envs/jupyter/lib/python3.8/site-packages/keras/engine/training.py", line 878, in train_function *
        return step_function(self, iterator)
    File "/home/mehran/.pyenv/versions/3.8.12/envs/jupyter/lib/python3.8/site-packages/keras/engine/training.py", line 867, in step_function **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/mehran/.pyenv/versions/3.8.12/envs/jupyter/lib/python3.8/site-packages/keras/engine/training.py", line 860, in run_step **
        outputs = model.train_step(data)
    File "/home/mehran/.pyenv/versions/3.8.12/envs/jupyter/lib/python3.8/site-packages/keras/engine/training.py", line 808, in train_step
        y_pred = self(x, training=True)
    File "/home/mehran/.pyenv/versions/3.8.12/envs/jupyter/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None

    TypeError: Exception encountered when calling layer "subclass_model_1" (type SubclassModel).

    in user code:

        File "/tmp/ipykernel_22298/1542980101.py", line 28, in call *
            a = self.a_out(x)
        File "/home/mehran/.pyenv/versions/3.8.12/envs/jupyter/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler **
            raise e.with_traceback(filtered_tb) from None
        File "/home/mehran/.pyenv/versions/3.8.12/envs/jupyter/lib/python3.8/site-packages/keras/activations.py", line 78, in softmax
            if x.shape.rank > 1:

        TypeError: Exception encountered when calling layer "a_out" (type Dense).

        '>' not supported between instances of 'NoneType' and 'int'

        Call arguments received:
          • inputs=tf.Tensor(shape=<unknown>, dtype=float32)

    Call arguments received:
      • inputs=tf.Tensor(shape=<unknown>, dtype=float32)
      • training=True
Does anyone know what I am doing wrong?
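The last frame of the traceback can be reproduced in isolation. The sketch below is not taken from the code above; it only assumes that the `shape=<unknown>` reported in the error corresponds to a `tf.TensorShape` of unknown rank, whose `rank` attribute is `None`. Comparing that `None` with an integer, as the `if x.shape.rank > 1:` line in the traceback does, raises the same `TypeError`.

```python
import tensorflow as tf

# A shape with unknown rank, like the shape=<unknown> shown in the traceback.
unknown = tf.TensorShape(None)
print(unknown.rank)

# A shape with known rank: a batch of 6-element vectors.
known = tf.TensorShape([None, 6])
print(known.rank)

# Comparing the unknown rank with an integer fails, because the rank is None.
try:
    unknown.rank > 1
    comparison_failed = False
except TypeError as err:
    comparison_failed = True
    print(err)
```

So the error message says less about the softmax itself than about the input tensor reaching the model without any static shape information.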
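For anyone else who might face the same problem, the solution is to reshape the tensors when reading the TFRecords so that they match their expected shape: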
def read_tfrecord(serialized_example):
    feature_description = {
        'input': tf.io.FixedLenFeature((), tf.string),
        'a_out': tf.io.FixedLenFeature((), tf.int64),
        'b_out': tf.io.FixedLenFeature((), tf.int64),
        'c_out': tf.io.FixedLenFeature((), tf.int64),
        'd_out': tf.io.FixedLenFeature((), tf.int64)
    }
    example = tf.io.parse_single_example(serialized_example, feature_description)
    # parse_tensor does not restore the static shape, so set it explicitly
    return tf.reshape(tf.io.parse_tensor(example['input'], out_type=tf.float32), (6,)), (
        example["a_out"],
        example["b_out"],
        example["c_out"],
        example["d_out"])
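One way to see what this fix changes is to inspect the dataset's `element_spec` before and after. The snippet below is a minimal sketch on stand-in data, not the original pipeline: the names `parse_plain` and `parse_ensure` are hypothetical, and `tf.ensure_shape` is used here as an alternative to `tf.reshape` (it attaches the static shape and also checks it at run time, rather than silently reshaping).

```python
import tensorflow as tf

# Stand-in data: three float32 vectors of length 6, serialized the same way
# as the 'input' feature (tf.io.serialize_tensor).
vectors = tf.random.uniform((3, 6), dtype=tf.float32)
serialized = tf.data.Dataset.from_tensor_slices(vectors).map(tf.io.serialize_tensor)

def parse_plain(s):
    # Hypothetical helper: parse without restoring the static shape.
    return tf.io.parse_tensor(s, out_type=tf.float32)

def parse_ensure(s):
    # Hypothetical helper: parse, then declare (and verify) the shape.
    return tf.ensure_shape(parse_plain(s), (6,))

plain_ds = serialized.map(parse_plain)
fixed_ds = serialized.map(parse_ensure)

# The static shape Keras sees when tracing the training step.
print(plain_ds.element_spec)
print(fixed_ds.element_spec)
```

Printing a record eagerly, as in the sanity check earlier, shows the run-time shape, which is why the two datasets looked identical there even though their static shapes differ.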
Why the functional API does not complain about this while the subclassed model does, I cannot understand.
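Separately from that open question, the shape loss can be avoided at the storage level. The following is a sketch under assumptions, not the approach used above: instead of serializing the input with `tf.io.serialize_tensor`, it stores the six floats in a `FloatList` and parses them with `tf.io.FixedLenFeature((6,), tf.float32)`, which carries the static shape. The helpers `make_example` and `read_example` are hypothetical, only one of the four targets is included, and the records are kept in memory rather than written to a file to keep the example short.

```python
import numpy as np
import tensorflow as tf

def make_example(vector, label):
    # Store the input as a list of floats instead of a serialized tensor.
    feature = {
        'input': tf.train.Feature(float_list=tf.train.FloatList(value=vector.tolist())),
        'a_out': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(label)])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

def read_example(serialized_example):
    # A FixedLenFeature with an explicit shape yields a statically shaped tensor.
    feature_description = {
        'input': tf.io.FixedLenFeature((6,), tf.float32),
        'a_out': tf.io.FixedLenFeature((), tf.int64),
    }
    example = tf.io.parse_single_example(serialized_example, feature_description)
    return example['input'], example['a_out']

vectors = np.random.rand(4, 6).astype(np.float32)
labels = np.random.randint(0, 17, size=(4,), dtype=np.int64)
records = [make_example(v, l).SerializeToString() for v, l in zip(vectors, labels)]

dataset = tf.data.Dataset.from_tensor_slices(records).map(read_example)
print(dataset.element_spec)
```

The trade-off is that the feature length is fixed in the parsing schema, which suits this case because every input has exactly six values.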