How should image preprocessing and data augmentation be done for semantic segmentation?

I have a small, imbalanced dataset of 4116 aerial images of size 224x224x3 (RGB). Since the dataset is not large enough, I will very likely run into overfitting. Image preprocessing and data augmentation help to tackle this problem, as described below.

"Overfitting is caused by having too few samples to learn from, rendering you unable to train a model that can generalize to new data. Given infinite data, your model would be exposed to every possible aspect of the data distribution at hand: you would never overfit. Data augmentation takes the approach of generating more training data from existing training samples, by augmenting the samples via a number of random transformations that yield believable-looking images."

Deep Learning with Python by François Chollet, pages 138-139, 5.2.5 "Using data augmentation".

I have read Medium - Image Data Preprocessing for Neural Networks and examined Stanford's CS230 - Data Preprocessing and CS231 - Data Preprocessing courses. These resources highlight once more that there is no "one fits all" solution. The following statement is what forced me to ask this question:

"No translation augmentation was used since we want to achieve high spatial resolution."

Reference: Researchgate - Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks

I know that I will use the Keras - ImageDataGenerator Class, but I don't know which techniques and which parameters to use for the semantic segmentation of small objects. Could someone enlighten me? Thanks in advance. :)

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,      # degrees (0-180) within which to randomly rotate pictures
    width_shift_range=0.2,  # fraction of total width within which to randomly translate pictures horizontally
    height_shift_range=0.2, # fraction of total height within which to randomly translate pictures vertically
    shear_range=0.2,        # randomly apply shearing transformations
    zoom_range=0.2,         # randomly zoom inside pictures
    horizontal_flip=True,   # randomly flip half of the images horizontally
    fill_mode='nearest',    # strategy for filling in pixels newly created by a rotation or a width/height shift
    featurewise_center=True,             # subtract the dataset mean from every sample
    featurewise_std_normalization=True)  # divide every sample by the dataset std

datagen.fit(X_train)  # required so the featurewise statistics can be computed from the training set
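One detail the snippet above does not cover is that, for semantic segmentation, the masks must receive exactly the same geometric transforms as the images, while normalization options must not touch the masks at all. A common way to do this with ImageDataGenerator is to build two generators with identical geometric parameters and synchronize them with a shared seed. The following is only a sketch: X_train / Y_train, the batch size and the parameter values are assumed placeholders, not part of the question.

from keras.preprocessing.image import ImageDataGenerator

seed = 42        # the shared seed keeps image and mask transforms in sync
batch_size = 16  # assumed value

geometric_args = dict(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')  # copies border pixels instead of interpolating new values

image_datagen = ImageDataGenerator(**geometric_args)
mask_datagen = ImageDataGenerator(**geometric_args)  # geometric transforms only, no normalization on masks

image_generator = image_datagen.flow(X_train, batch_size=batch_size, seed=seed)
mask_generator = mask_datagen.flow(Y_train, batch_size=batch_size, seed=seed)

# Each batch of images is paired with identically transformed masks; depending on
# the Keras version you may still want to round the masks back to integer labels.
train_generator = zip(image_generator, mask_generator)
# model.fit_generator(train_generator, steps_per_epoch=len(X_train) // batch_size, epochs=50)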

The augmentation and preprocessing phase always depends on the problem you have. You have to think of all the possible augmentations that can enlarge your dataset. Most importantly, though, you should not perform extreme augmentations that generate new training samples in a way that could never occur in real examples. If you do not expect real examples to be flipped horizontally, do not perform horizontal flips, since this gives your model wrong information. Think of all the possible changes that can occur in your input images, and try to artificially generate new images from the ones you have. You can use many of the built-in functions in Keras, but you should be aware of each one, so that it does not produce examples that are unlikely to appear at the input of your model.
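For aerial tiles in particular, flips in both axes and rotations are usually plausible because a scene viewed from above has no canonical orientation, while shear and strong intensity shifts can easily produce samples that never occur in reality. The configuration below is only a conservative sketch under that assumption, not a recommendation for your specific data:

from keras.preprocessing.image import ImageDataGenerator

# Conservative augmentation for nadir aerial imagery: orientation is arbitrary,
# so flips in both axes are realistic; shear and heavy photometric changes are omitted.
conservative_datagen = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,    # plausible for top-down views
    rotation_range=90,     # any orientation can occur when looking straight down
    zoom_range=0.1,        # mild scale jitter only
    fill_mode='reflect')   # mirrors border content rather than smearing edge pixels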

As you said, there is no "one fits all" solution, because everything depends on the data. Analyze the data and build everything based on it.

Regarding small objects: one direction you should look into is loss functions that emphasize the contribution of the target volume relative to the background. Check out the Dice loss or the Generalized Dice loss.
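As a concrete starting point, a soft Dice loss can be written with the Keras backend in a few lines. This is only a sketch: the smoothing constant and the reduction over the whole batch are assumptions, and for multi-class masks you would typically compute it per class (the Generalized Dice loss additionally weights each class by the inverse of its volume).

from keras import backend as K

def soft_dice_loss(y_true, y_pred, smooth=1.0):
    # Soft Dice loss = 1 - Dice coefficient; the foreground/background imbalance
    # affects it far less than plain pixel-wise cross-entropy.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice = (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return 1.0 - dice

# model.compile(optimizer='adam', loss=soft_dice_loss, metrics=['accuracy'])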