How can I apply data augmentation using the Sequence API in Keras?
I found this Sequence API code online (I can't remember where, sorry):
import math
import os
import random

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.utils import to_categorical


class PlanetSequence(tf.keras.utils.Sequence):
    """
    Custom Sequence object to train a model on out-of-memory datasets.
    """
    def __init__(self, df_path, data_path, im_size, batch_size, mode='train'):
        """
        df_path: path to a .csv file that contains columns with image names and labels
        data_path: path that contains the training images
        im_size: image size
        mode: when in training mode, data will be shuffled between epochs
        """
        self.df = pd.read_csv(df_path)
        self.im_size = im_size
        self.batch_size = batch_size
        self.mode = mode

        # Take labels and a list of image locations in memory
        labelsEncoder = self.df['label'].values
        self.labels = to_categorical(labelsEncoder, num_classes=11)
        self.image_list = self.df['image'].apply(lambda x: os.path.join(data_path, x)).tolist()
        # Start in natural order; on_epoch_end reshuffles this in train mode
        self.indexes = list(range(len(self.image_list)))

    def __len__(self):
        return int(math.ceil(len(self.df) / float(self.batch_size)))

    def on_epoch_end(self):
        # Shuffle indexes after each epoch
        if self.mode == 'train':
            self.indexes = random.sample(self.indexes, k=len(self.indexes))

    def get_batch_labels(self, idx):
        # Fetch a batch of labels, following the (possibly shuffled) index order
        batch_indexes = self.indexes[idx * self.batch_size : (idx + 1) * self.batch_size]
        return self.labels[batch_indexes]

    def get_batch_features(self, idx):
        # Fetch a batch of images, following the (possibly shuffled) index order
        batch_indexes = self.indexes[idx * self.batch_size : (idx + 1) * self.batch_size]
        return np.array([load_image(self.image_list[i], self.im_size) for i in batch_indexes])

    def __getitem__(self, idx):
        batch_x = self.get_batch_features(idx)
        batch_y = self.get_batch_labels(idx)
        return batch_x, batch_y
And in the load_image function, we have this:
from tensorflow.keras.preprocessing.image import img_to_array, load_img

def load_image(image_path, size):
    # data augmentation logic such as random rotations can be added here
    return img_to_array(load_img(image_path, target_size=(size, size))) / 255.
It seems like that is the place to apply data augmentation, but I don't know how. I thought about using Keras' ImageDataGenerator and its flow methods to augment the images, but I couldn't get it to work. What is the best way to handle this?
I have edited this answer quite a bit. I will also show how to fit a data generator into your existing code (a sketch of that first, below), although I recommend the approach that follows it: using an image generator together with some basic data management.
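For the in-place option, a minimal sketch assuming the load_image helper from your question: ImageDataGenerator.random_transform applies one random transformation to a single image array, so it can be called per image inside load_image (the augmentation parameters here are examples only):

from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img

# Example augmenter shared by all load_image calls; tune the parameters to your data
augmenter = ImageDataGenerator(rotation_range=30,
                               zoom_range=0.2,
                               horizontal_flip=True)

def load_image(image_path, size, augment=True):
    img = img_to_array(load_img(image_path, target_size=(size, size)))
    if augment:
        # Apply one random rotation/zoom/flip to this single image
        img = augmenter.random_transform(img)
    return img / 255.

You would pass augment=(self.mode == 'train') from get_batch_features so that validation data stays untouched. Still, the directory-based approach below is what I recommend: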
First, read the train csv and import the shutil util to move your files and arrange your folders in the structure mentioned below:

import shutil

Read each row of the csv and copy the image into its respective class folder according to that structure: shutil.copy(path given in csv, <destination folder>)

Read both csvs this way and use shutil to move the images into the hierarchy mentioned below (a sketch of this copy loop follows the path setup); trust me, it will take much less time to organize the data. You can create multiple sub-folders inside the train and test folders, one per class.
|__ train
    |______ planet: [contains planet images]
    |______ star: [contains star images]
|__ test
    |______ planet: [contains planet images]
    |______ star: [contains star images]
test_dir = os.path.join(PATH, 'test')
train_dir = os.path.join(PATH, 'train')
train_planets_dir = os.path.join(train_dir, 'planet') # directory with our planets images
train_stars_dir = os.path.join(train_dir, 'star') # directory with our training star images
# and similarly for the test folders
test_planets_dir = os.path.join(test_dir, 'planet')
test_stars_dir = os.path.join(test_dir, 'star')
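Here is the sketch of the copy loop mentioned above, assuming the csv has the 'image' and 'label' columns from your question and data_path points at the raw images:

import os
import shutil
import pandas as pd

df = pd.read_csv(df_path)  # the train csv
for _, row in df.iterrows():
    src = os.path.join(data_path, row['image'])       # path given in csv
    dst = os.path.join(train_dir, str(row['label']))  # destination class folder
    os.makedirs(dst, exist_ok=True)
    shutil.copy(src, dst)

Repeat the same loop with the test csv and test_dir.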
Now call the image generator with all the augmentation types you need (look at the parameters for the different augmentations and enable every one you need):
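The train_image_generator used below is an ImageDataGenerator; a sketch with a few common augmentations enabled (pick and tune whichever your data needs):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_image_generator = ImageDataGenerator(rescale=1./255,
                                           rotation_range=45,
                                           width_shift_range=0.15,
                                           height_shift_range=0.15,
                                           horizontal_flip=True,
                                           zoom_range=0.2)
test_image_generator = ImageDataGenerator(rescale=1./255)  # no augmentation for test data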
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
                                                           directory=train_dir,
                                                           shuffle=True,
                                                           target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                           class_mode='binary')
Note that **train_dir** is the common path that contains all the class sub-folders within it; similarly for the test data:
test_data_gen = test_image_generator.flow_from_directory(batch_size=batch_size,
                                                         directory=test_dir,
                                                         target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                         class_mode='binary')
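Both generators can then be passed straight to training; a sketch assuming a compiled model named model (recent Keras versions accept generators in model.fit, older ones use model.fit_generator):

model.fit(train_data_gen,
          epochs=epochs,
          validation_data=test_data_gen)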
This way your data is stored properly, you can use the data generators effectively, and the labels are handled automatically. Hope this helps.