How to do image augmentation on an existing dataset so that I don't need to add a label for every augmented image

I want to augment a dataset of images. The images are stored as NumPy arrays in X_train and their labels are stored in y_train. The shapes are as follows:

print(X_train.shape)
print(y_train.shape)

Output:

(1100, 22, 64, 64)
(1100,)

A single image looks like this:

plt.imshow(X_train[0][0])

How can I augment this dataset so that I don't have to add the label again for every augmented image?

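One option is to use a generator. It yields each augmented image together with the label of the sample it was derived from, so the labels never have to be assigned by hand: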

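import numpy as np

def get_augmented_sample(X_train, y_train):
  # Walk through images and labels in lockstep; the label y is passed through unchanged
  for x, y in zip(X_train, y_train):
    # Data augmentation applied to x, e.g. additive Gaussian noise (mean 0, std 20)
    x_augmented = x + np.random.normal(0, 20, x.shape)
    yield x_augmented, y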

data_generator = get_augmented_sample(X_train, y_train)

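# Get one augmented sample together with its original label
x, y = next(data_generator)

# Original: first 64x64 slice of sample 0
plt.imshow(X_train[0][0])
plt.show()

# Augmented version of the same slice
plt.imshow(x[0])
plt.show()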
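
If you would rather have the augmented data as plain arrays instead of a generator, the same idea can be sketched as below. This is only a minimal sketch under a few assumptions: `augment_dataset`, `n_copies`, `noise_std` and `seed` are names made up for this illustration, Gaussian noise stands in for whatever augmentation you actually want, and the small `X_demo` / `y_demo` arrays are placeholders with the same layout as X_train / y_train. The labels are repeated in the same order as the image copies, so every row of the result still lines up with its label.

```python
import numpy as np

def augment_dataset(X, y, n_copies=2, noise_std=20.0, seed=0):
    """Return X and y extended with n_copies noisy copies of every sample.

    Hypothetical helper for illustration; Gaussian noise is just an example
    augmentation and can be swapped for any label-preserving transform.
    """
    rng = np.random.default_rng(seed)
    # Each entry is the whole dataset plus independently drawn noise
    noisy = [X + rng.normal(0.0, noise_std, X.shape) for _ in range(n_copies)]
    # Stack the originals first, then the noisy copies, along the sample axis
    X_aug = np.concatenate([X] + noisy, axis=0)
    # Repeat the labels block-wise in the same order, so X_aug[i] matches y_aug[i]
    y_aug = np.tile(y, n_copies + 1)
    return X_aug, y_aug

# Small stand-in arrays with the same layout as X_train / y_train
X_demo = np.zeros((4, 22, 64, 64))
y_demo = np.array([0, 1, 0, 1])

X_aug, y_aug = augment_dataset(X_demo, y_demo)
print(X_aug.shape, y_aug.shape)  # (12, 22, 64, 64) (12,)
```

Keep in mind that this holds every copy in memory at once, while the generator above creates augmented samples one at a time, which is usually the better choice for large datasets.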