Filling up shuffle buffer (this may take a while)
I have a dataset of video frames taken from 1000 real videos and 1000 deepfake videos. After the preprocessing stage, each video is converted to 300 frames; in other words, I have a dataset with 300,000 images labeled Real (0) and 300,000 images labeled Fake (1).
I want to train MesoNet on this data. I use a custom DataGenerator class to handle the training, validation, and test data at a 0.8/0.1/0.1 ratio, but when I run the project it shows this message:
Filling up shuffle buffer (this may take a while):
What can I do to solve this problem?
You can see the DataGenerator class below.
import os

import cv2
import numpy as np
from tensorflow import keras


class DataGenerator(keras.utils.Sequence):
    'Generates data for Keras'

    def __init__(self, df, labels, batch_size=32, img_size=(224, 224),
                 n_classes=2, shuffle=True):
        'Initialization'
        self.batch_size = batch_size
        self.labels = labels
        self.df = df
        self.img_size = img_size
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.batch_labels = []
        self.batch_names = []
        self.on_epoch_end()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.floor(len(self.df) / self.batch_size))

    def __getitem__(self, index):
        batch_index = self.indexes[index * self.batch_size : (index + 1) * self.batch_size]
        frame_paths = self.df.iloc[batch_index]["framePath"].values
        frame_label = self.df.iloc[batch_index]["label"].values
        imgs = [cv2.imread(frame) for frame in frame_paths]
        imgs = [cv2.cvtColor(img, cv2.COLOR_BGR2RGB) for img in imgs]
        # Resize only when needed; keep correctly sized images instead of dropping them
        imgs = [
            cv2.resize(img, self.img_size) if img.shape[:2] != self.img_size else img
            for img in imgs
        ]
        batch_imgs = np.asarray(imgs)
        labels = list(map(int, frame_label))
        y = np.array(labels)
        self.batch_labels.extend(labels)
        self.batch_names.extend([os.path.basename(str(frame)) for frame in frame_paths])
        return batch_imgs, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indexes = np.arange(len(self.df))
        if self.shuffle:
            np.random.shuffle(self.indexes)
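For reference, a minimal sketch of how this generator might be wired up, assuming the DataFrame holds one row per frame with framePath and label columns; the df variable, the split code, and the model are placeholders rather than anything from the original post:

from sklearn.model_selection import train_test_split

# Hypothetical 0.8/0.1/0.1 split of the frame-level DataFrame `df`;
# stratifying on the label keeps Real/Fake balanced in every split.
train_df, rest_df = train_test_split(df, train_size=0.8, stratify=df["label"], random_state=42)
val_df, test_df = train_test_split(rest_df, train_size=0.5, stratify=rest_df["label"], random_state=42)

train_gen = DataGenerator(train_df, train_df["label"].values)
val_gen = DataGenerator(val_df, val_df["label"].values)

# `model` is assumed to be a compiled MesoNet; Keras accepts a Sequence directly.
model.fit(train_gen, validation_data=val_gen, epochs=10)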
Note that this is not an error, just a log message: https://github.com/tensorflow/tensorflow/blob/42b5da6659a75bfac77fa81e7242ddb5be1a576a/tensorflow/core/kernels/data/shuffle_dataset_op.cc#L138
If it takes too long, you may have chosen too large a dataset (that is, too large a shuffle buffer): https://github.com/tensorflow/tensorflow/issues/30646
You can fix this by lowering the buffer size: https://support.huawei.com/enterprise/en/doc/EDOC1100164821/2610406b/what-do-i-do-if-training-times-out-due-to-too-many-dataset-shuffle-operations
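For example, if the message comes from a tf.data input pipeline, a smaller buffer_size in Dataset.shuffle fills much faster; the dataset below is only a stand-in, since the post does not show where shuffle is called:

import tensorflow as tf

# Stand-in for the real input pipeline (the actual dataset is not shown in the post).
dataset = tf.data.Dataset.range(300_000)

# A buffer as large as the dataset must be filled completely before the first
# batch is produced, which is what the "Filling up shuffle buffer" log reports:
# dataset = dataset.shuffle(buffer_size=300_000)

# A smaller buffer trades some shuffle quality for a much faster start-up:
dataset = dataset.shuffle(buffer_size=1_000).batch(32)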