How to do image augmentation on the actual dataset so that I don't need to add a label for every augmented image

Question

I want to augment a dataset of images that is stored as a NumPy array in X_train, with the corresponding labels in y_train. The shapes are as follows:

print(X_train.shape)
print(y_train.shape)

Output:

(1100, 22, 64, 64)
(1100,)

A single image looks like this:

plt.imshow(X_train[0][0])

How can I augment this dataset so that I don't have to add the labels again each time?

Answer 1

One option is to use a generator that yields each augmented sample together with its original label:

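import numpy as np

def get_augmented_sample(X_train, y_train):
    """Yield (augmented sample, label) pairs so each label travels with its sample."""
    for x, y in zip(X_train, y_train):
        # Augment x, e.g. by adding Gaussian noise (mean 0, std 20)
        x_augmented = x + np.random.normal(0, 20, x.shape)
        yield x_augmented, y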

data_generator = get_augmented_sample(X_train, y_train)

# get an augmented sample 
x, y = next(data_generator)

# original
plt.imshow(X_train[0][0])

# augmented
plt.imshow(x[0])
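
If you would rather materialize the augmented data once instead of drawing samples one by one, the same idea can be applied to whole arrays. The following is only a minimal sketch, not part of the original answer: the function name augment_dataset, its parameters copies, noise_std and seed, and the small X_demo / y_demo arrays are all hypothetical names chosen for illustration. It reuses the additive Gaussian noise from the generator above and simply repeats the label array once per augmented copy, so row i of every copy keeps label y[i].

```python
import numpy as np

def augment_dataset(X, y, copies=2, noise_std=20.0, seed=0):
    """Return the originals plus `copies` noisy versions, with labels kept aligned.

    Hypothetical helper for illustration; the noise parameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(copies):
        # Same augmentation as the generator above: additive Gaussian noise.
        X_parts.append(X + rng.normal(0.0, noise_std, size=X.shape))
        # The augmented copy of sample i keeps the label y[i].
        y_parts.append(y)
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Small stand-in arrays with the same layout as X_train / y_train.
X_demo = np.zeros((4, 22, 64, 64))
y_demo = np.array([0, 1, 0, 1])

X_aug, y_aug = augment_dataset(X_demo, y_demo, copies=2)
print(X_aug.shape, y_aug.shape)  # (12, 22, 64, 64) (12,)
```

Calling augment_dataset(X_train, y_train) in the same way should give arrays that can be shuffled and fed to a model directly. Note that this holds every copy in memory at once, which is the trade-off against the generator approach.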

conv-neural-network

tensorflow

image-augmentation