Error when loading a TFRecord file in TensorFlow 2.0
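I am trying to convert WAV files into TFRecord shards.
In the code below, I use tf.audio.decode_wav to get the audio signal from each WAV file; the label is a list of indices for the sentence.
I then save all the WAV files and labels into train.tfrecord and split it into shards: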
def _write_tfrecord_file(self, shard_data):
    shard_path, indices = shard_data
    with tf.io.TFRecordWriter(shard_path, options='ZLIB') as out:
        for index in indices:
            file_path = self.data_dir + self.df['Filename'][index] + ".wav"
            label = str2index(self.df['Text'][index])
            raw_audio = tf.io.read_file(file_path)
            audio, sample_rate = tf.audio.decode_wav(
                raw_audio,
                desired_channels=1,  # mono
                desired_samples=self.sample_rate * self.duration)
            example = tf.train.Example(features=tf.train.Features(feature={
                'audio': _float_feature(audio.numpy().flatten().tolist()),
                'label': _int64_feature(label)}))
            out.write(example.SerializeToString())
Then I wrote a function to load the records:
def _parse_batch(record_batch, sample_rate, duration):
    n_sample = sample_rate * duration
    feature_description = {
        'audio': tf.io.FixedLenFeature([n_sample], tf.float32),
        'label': tf.io.FixedLenFeature([], tf.int64)
    }
    example = tf.io.parse_example(record_batch, feature_description)
    return example['audio'], example['label']

def get_dataset_from_tfrecords(tfrecords_dir='tfrecords', split='train', batch_size=16, sample_rate=44100, duration=5,
                               n_epochs=10):
    if split not in ('train', 'validate'):
        raise ValueError("Split must be either 'train' or 'validate'")
    pattern = os.path.join(tfrecords_dir, '{}*.tfrecord'.format(split))
    files_ds = tf.data.Dataset.list_files(pattern)
    ignore_order = tf.data.Options()
    ignore_order.experimental_deterministic = False
    files_ds = files_ds.with_options(ignore_order)
    ds = tf.data.TFRecordDataset(files_ds, compression_type='ZLIB')
    ds.batch(batch_size)
    ds = ds.map(lambda x: _parse_batch(x, sample_rate, duration))
    if split == 'train':
        ds.repeat(n_epochs)
    return ds.prefetch(buffer_size=AUTOTUNE)
But I get an error:
ValueError: in converted code:
D:\Natural Language Processing\speech_to_text\utils\load_tfrecord.py:38 None *
ds = ds.map(lambda x: _parse_batch(x, sample_rate, duration))
D:\Natural Language Processing\speech_to_text\utils\load_tfrecord.py:16 _parse_batch *
example = tf.io.parse_example(record_batch, feature_description)
C:\Users\levan\Anaconda3\lib\site-packages\tensorflow_core\python\ops\parsing_ops.py:807 parse_example_v2
dense_types, dense_defaults, dense_shapes, name)
C:\Users\levan\Anaconda3\lib\site-packages\tensorflow_core\python\ops\parsing_ops.py:868 _parse_example_raw
name=name)
C:\Users\levan\Anaconda3\lib\site-packages\tensorflow_core\python\ops\gen_parsing_ops.py:626 parse_example
name=name)
C:\Users\levan\Anaconda3\lib\site-packages\tensorflow_core\python\framework\op_def_library.py:793 _apply_op_helper
op_def=op_def)
C:\Users\levan\Anaconda3\lib\site-packages\tensorflow_core\python\framework\func_graph.py:548 create_op
compute_device)
C:\Users\levan\Anaconda3\lib\site-packages\tensorflow_core\python\framework\ops.py:3429 _create_op_internal
op_def=op_def)
C:\Users\levan\Anaconda3\lib\site-packages\tensorflow_core\python\framework\ops.py:1773 __init__
control_input_ops)
C:\Users\levan\Anaconda3\lib\site-packages\tensorflow_core\python\framework\ops.py:1613 _create_c_op
raise ValueError(str(e))
ValueError: Shape must be rank 1 but is rank 0 for 'ParseExample/ParseExample' (op: 'ParseExample') with input shapes: [], [0], [], [], [], [].
How can I fix this?
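This error occurs when the shape passed to ParseExample does not match between the serialized example and the feature description. As the message says, the op expects the serialized input to be rank 1 (a batch of serialized examples) but receives a rank-0 scalar.
I was able to reproduce your error with the code below (TensorFlow 1.15), where the shapes of serialized_tf_example and feature_configs do not match, so the error is raised.
Code to reproduce the error: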
import tensorflow as tf
sess = tf.InteractiveSession()
serialized_tf_example = tf.placeholder(tf.string, shape=[], name='serialized_tf_example')
feature_configs = {'x': tf.FixedLenFeature(shape=[1], dtype=tf.float32)}
tf_example = tf.parse_example(serialized_tf_example, feature_configs)
feature_dict = {'x': tf.train.Feature(float_list=tf.train.FloatList(value=[25]))}
example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
f = example.SerializeToString()
sess.run(tf_example,feed_dict={serialized_tf_example:[f]})
Output:
/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).
warnings.warn('An interactive session is already active. This can '
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs)
1606 try:
-> 1607 c_op = c_api.TF_FinishOperation(op_desc)
1608 except errors.InvalidArgumentError as e:
InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'ParseExample_16/ParseExample' (op: 'ParseExample') with input shapes: [], [0], [], [0].
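As another example, the error also appears when the data passed has more dimensions than the op expects. For instance, if I change the call to tf_example = tf.parse_example([serialized_tf_example], feature_configs)
then, as expected, we get the error InvalidArgumentError: Shape must be rank 1 but is rank 2 for 'ParseExample_21/ParseExample' (op: 'ParseExample')
(wrapping a placeholder that already has shape [1] in a list adds a second dimension).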
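To make the rank-0, rank-1 and rank-2 cases concrete, here is a minimal sketch in TensorFlow 2.x eager mode. The byte string is only a stand-in for a real serialized example, and the variable names are mine, not from the question.

```python
import tensorflow as tf

# A single serialized record is a scalar string tensor: shape [], rank 0.
serialized = tf.constant(b"stand-in for one serialized example")

# Adding a leading batch dimension gives shape [1], rank 1.
as_batch = tf.expand_dims(serialized, 0)

# Wrapping the rank-1 tensor in another list adds one more dimension:
# shape [1, 1], rank 2.
wrapped_again = tf.stack([as_batch])

for name, tensor in [("serialized", serialized),
                     ("as_batch", as_batch),
                     ("wrapped_again", wrapped_again)]:
    print(name, tensor.shape, "rank", tensor.shape.rank)
```

The rank-1 form is the one the error messages above ask for.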
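When we pass the correct shapes to serialized_tf_example and feature_configs, the error is fixed. In the code below the only change is that the placeholder shape goes from [] to [1].
Fixed code: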
import tensorflow as tf
sess = tf.InteractiveSession()
serialized_tf_example = tf.placeholder(tf.string, shape=[1], name='serialized_tf_example')
feature_configs = {'x': tf.FixedLenFeature(shape=[1], dtype=tf.float32)}
tf_example = tf.parse_example(serialized_tf_example, feature_configs)
feature_dict = {'x': tf.train.Feature(float_list=tf.train.FloatList(value=[25]))}
example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
f = example.SerializeToString()
sess.run(tf_example,feed_dict={serialized_tf_example:[f]})
Output:
{'x': array([[25.]], dtype=float32)}
Hope this answers your question. Happy learning.
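Applied to the tf.data pipeline in the question, the same idea might look like the sketch below. Note that in the question the results of ds.batch(batch_size) and ds.repeat(n_epochs) are never assigned, so the map function most likely receives one scalar record at a time, which would explain the rank-0 input. This is a sketch under assumptions, not a drop-in fix: it uses small in-memory records instead of your TFRecord files, a tiny N_SAMPLE in place of sample_rate * duration, and a scalar label. The helper names make_example, parse_batch and parse_single are hypothetical. Since your labels are index lists, they would probably need something like tf.io.VarLenFeature rather than a scalar FixedLenFeature.

```python
import tensorflow as tf

N_SAMPLE = 4  # stand-in for sample_rate * duration

FEATURES = {
    "audio": tf.io.FixedLenFeature([N_SAMPLE], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def make_example(values, label):
    """Serialize one tf.train.Example (hypothetical helper for this sketch)."""
    feature = {
        "audio": tf.train.Feature(float_list=tf.train.FloatList(value=values)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

# Six in-memory records standing in for the contents of the TFRecord shards.
records = [make_example([0.1 * i] * N_SAMPLE, i) for i in range(6)]
ds = tf.data.Dataset.from_tensor_slices(records)

def parse_batch(record_batch):
    # record_batch is rank 1 here because the dataset was batched first.
    parsed = tf.io.parse_example(record_batch, FEATURES)
    return parsed["audio"], parsed["label"]

# Option 1: batch first, then parse whole batches.
# Dataset methods return a new dataset, so each result has to be reassigned.
batched = ds.batch(2)
batched = batched.map(parse_batch)
batched = batched.repeat(2)

def parse_single(record):
    # record is a scalar string, which is what parse_single_example expects.
    parsed = tf.io.parse_single_example(record, FEATURES)
    return parsed["audio"], parsed["label"]

# Option 2: parse record by record, then batch.
per_record = ds.map(parse_single).batch(2)

audio, label = next(iter(batched))
print(audio.shape, label.shape)
```

Either ordering should give batches of the same shape; parsing after batching usually means fewer calls to the parsing op.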