TFRecords for videos
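I am trying to create TFRecords from a custom video dataset, but I can't quite figure out how to set them up.
To prepare the data for storage, I wrote a script that, for a given video feed, outputs a 3D cube of shape [N_FRAMES, WIDTH, HEIGHT, CHANNEL]. I then create a tfrecord as follows: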
def _int64_feature(self, value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(self, value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def createDataRecord(self, file_name, locations, categories):
    writer = tf.python_io.TFRecordWriter(file_name)
    feature = {}
    for loc, category in zip(locations, categories):
        data = self.video3D(loc)  # the final array of shape [N_FRAMES, WIDTH, HEIGHT, CHANNEL]
        feature['height'] = self._int64_feature(self.height)
        feature['width'] = self._int64_feature(self.width)
        feature['depth'] = self._int64_feature(self.depth)
        feature['data'] = self._bytes_feature(data.tobytes())
        feature['category'] = self._int64_feature(category)
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())
    writer.close()
My current parser function then looks like this:
def readDataRecord(self, record):
    filename_queue = tf.train.string_input_producer([record], num_epochs=1)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    feature = {
        'height': tf.FixedLenFeature([], tf.int64),
        'width': tf.FixedLenFeature([], tf.int64),
        'depth': tf.FixedLenFeature([], tf.int64),
        'data': tf.FixedLenFeature([], tf.string),
        'category': tf.FixedLenFeature([], tf.int64),
    }
    example = tf.parse_single_example(serialized_example, features=feature)
    video3D_buffer = tf.reshape(example['data'], shape=[])
    video3D = tf.decode_raw(video3D_buffer, tf.uint8)
    label = tf.cast(example['category'], tf.int32)
    return video3D, label
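With that said, my questions are:
I know readDataRecord() is wrong, because it is written for single frames. How exactly do I get it to return individual 3D cubes of shape [N_FRAMES, WIDTH, HEIGHT, CHANNEL] along with their respective categories?
Is it even a good idea to simply save the entire 3D cube?
Any help or guidance would be appreciated :)
PS: I have looked into other approaches, including video2tfrecord, but most of them seem to save individual frames for each video, which is not what I want.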
So this is what I ended up doing, without having to encode individual frames.
I flattened the cube and then wrote it out as follows:
def _cube_feature(self, value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

def createDataRecord(self, name, locations, categories):
    writer = tf.python_io.TFRecordWriter(name)
    feature = {}
    for loc, category in zip(locations, categories):
        data = self.video3D(loc)
        # .............
        feature['data'] = self._cube_feature(data.flatten())
        feature['category'] = self._int64_feature(category)
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())
    writer.close()
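
A side note on storage with this writer: a FloatList holds every value as a 32-bit float, so a uint8 cube takes about four times the space it would as raw bytes. The NumPy sketch below only compares the two payload sizes; the cube dimensions are made up for illustration and it does not measure the full size of a serialized Example.

```python
import numpy as np

# Hypothetical toy cube: 4 frames of 8x6 pixels with 3 channels, dtype uint8.
cube = np.zeros((4, 8, 6, 3), dtype=np.uint8)

# Payload a BytesList would carry: one byte per value.
raw_payload = cube.tobytes()

# Payload a FloatList would carry: one 32-bit float per value.
float_payload = cube.flatten().astype(np.float32)

raw_size = len(raw_payload)
float_size = float_payload.nbytes
print(raw_size, float_size)
```

Whether the extra space matters depends on how many videos there are and how long they run.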
The resulting parser is:
def readDataRecord(self, record):
    # ..........
    feature = {
        'height': tf.FixedLenFeature([], tf.int64),
        'width': tf.FixedLenFeature([], tf.int64),
        'depth': tf.FixedLenFeature([], tf.int64),
        'data': tf.FixedLenFeature((NUM_FRAMES, WIDTH, HEIGHT, CHANNEL), tf.float32),
        'category': tf.FixedLenFeature([], tf.int64),
    }
    example = tf.parse_single_example(serialized_example, features=feature)
    cube = tf.cast(example['data'], tf.uint8)
    label = tf.cast(example['category'], tf.int32)
    return cube, label
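
For comparison, here is a rough sketch of how the byte-string layout from the question could be read back as a whole cube: decode the raw bytes, then reshape using dimensions stored in the record itself. It assumes TensorFlow 2.x (tf.io and tf.data rather than the queue-based reader), and the names cube_to_example, parse_cube, the "shape" feature and the toy data are all hypothetical, not part of the original code.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def cube_to_example(cube, category):
    """Pack one uint8 video cube and its label into a tf.train.Example."""
    feature = {
        # Assumption: all four dimensions are stored so the parser can reshape.
        "shape": tf.train.Feature(
            int64_list=tf.train.Int64List(value=list(cube.shape))),
        "data": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[cube.tobytes()])),
        "category": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[category])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def parse_cube(serialized):
    """Turn one serialized Example back into (cube, label)."""
    spec = {
        "shape": tf.io.FixedLenFeature([4], tf.int64),
        "data": tf.io.FixedLenFeature([], tf.string),
        "category": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    flat = tf.io.decode_raw(parsed["data"], tf.uint8)  # 1-D uint8 tensor
    cube = tf.reshape(flat, parsed["shape"])           # back to 4-D
    label = tf.cast(parsed["category"], tf.int32)
    return cube, label


# Toy data standing in for real videos: 2 cubes of 4 frames, 8x6 pixels, RGB.
rng = np.random.default_rng(0)
cubes = [rng.integers(0, 256, size=(4, 8, 6, 3), dtype=np.uint8)
         for _ in range(2)]
labels = [1, 0]

path = os.path.join(tempfile.mkdtemp(), "videos.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for cube, label in zip(cubes, labels):
        writer.write(cube_to_example(cube, label).SerializeToString())

dataset = tf.data.TFRecordDataset(path).map(parse_cube)
restored = [(c.numpy(), int(l.numpy())) for c, l in dataset]
```

Each element of the dataset is then one complete cube with its category, and the data stays uint8 on disk instead of being widened to float32.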
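The drawback of the accepted answer is that you have to store the array's dimensions (NUM_FRAMES, WIDTH, HEIGHT, CHANNEL) somewhere. A workaround is to serialize the whole 3D cube with tf.io.serialize_tensor(array.astype(...)), save it to the TFRecord as a byte-string feature, and then (after loading the TFRecord) restore it with tf.io.parse_tensor(bytestring_array_feature, out_type=...). There is a good explanation here: (scroll down to the paragraph about _bytes_feature)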
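
A minimal sketch of that round trip, assuming TensorFlow 2.x in eager mode. The feature names "data" and "category" follow the code above; the toy cube and its dimensions are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy cube: 4 frames of 8x6 pixels with 3 channels.
cube = (np.arange(4 * 8 * 6 * 3) % 256).astype(np.uint8).reshape(4, 8, 6, 3)

# serialize_tensor keeps dtype and shape inside the byte string.
serialized_cube = tf.io.serialize_tensor(tf.constant(cube))

example = tf.train.Example(features=tf.train.Features(feature={
    "data": tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[serialized_cube.numpy()])),
    "category": tf.train.Feature(
        int64_list=tf.train.Int64List(value=[3])),
}))
serialized_example = example.SerializeToString()

# Reading side: no dimensions are needed in the feature spec.
spec = {
    "data": tf.io.FixedLenFeature([], tf.string),
    "category": tf.io.FixedLenFeature([], tf.int64),
}
parsed = tf.io.parse_single_example(serialized_example, spec)
restored = tf.io.parse_tensor(parsed["data"], out_type=tf.uint8)
label = tf.cast(parsed["category"], tf.int32)
```

The out_type passed to tf.io.parse_tensor has to match the dtype that was serialized. When the parsing runs inside a tf.data map, the static shape of the result is typically unknown, so a tf.reshape or tf.ensure_shape may still be needed if later layers require it.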