Too Much Memory Issue with Semantic Image Segmentation NN (DeepLabV3+)

Question

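Let me first explain my task: I have nearly 3000 images of two different ropes. Each image contains rope 1, rope 2, and the background. My labels/masks are images in which pixel value 0 represents the background, 1 the first rope, and 2 the second rope. You can see an input picture and its ground truth/label in pictures 1 and 2 below. Note that my ground truth/label has only 3 values: 0, 1, and 2. My input pictures are grayscale, but for DeepLab I converted them to RGB, because DeepLab was trained on RGB pictures. My converted pictures still contain no color, though.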
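As an illustration of this label format, here is a minimal sketch of how an integer mask with the values 0, 1, and 2 could be expanded into the one-hot layout that a loss such as categorical_crossentropy expects. The helper name mask_to_one_hot and the tiny example mask are made up for this sketch; it is not part of the training code shown later.

```python
import numpy as np

def mask_to_one_hot(mask, num_classes=3):
    """Turn an (H, W) integer mask into an (H, W, num_classes) one-hot array."""
    mask = np.asarray(mask, dtype=np.int64)
    # Indexing an identity matrix with the class ids selects one row per pixel.
    return np.eye(num_classes, dtype=np.float32)[mask]

# Made-up 2x3 mask: 0 = background, 1 = rope 1, 2 = rope 2.
example_mask = np.array([[0, 1, 1],
                         [2, 2, 0]])
one_hot = mask_to_one_hot(example_mask)

print(one_hot.shape)             # (2, 3, 3)
print(one_hot.argmax(axis=-1))   # recovers the original integer mask
```

Each pixel ends up with exactly one channel set to 1, so taking the argmax over the last axis returns the original labels.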

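The idea of this task is that the neural network should learn the structure of the ropes, so that it labels them correctly even when there are knots. The color information is therefore not important; my ropes have different colors, which made it easy to create the ground truth/labels with KMeans.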

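For this task I chose a semantic segmentation network called DeepLab V3+ in Keras, with TensorFlow as the backend. I want to train the NN with my nearly 3000 images. All images together take up less than 100MB, and each is 300x200 pixels. Maybe DeepLab is not the best choice for my task, because my pictures contain no color information and are very small (300x200), but so far I have not found a better semantic segmentation NN for my task.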

From the Keras website I know how to load the data with flow_from_directory and how to use the fit_generator method. I don't know whether my code logic is correct...

Here are the links:

https://keras.io/preprocessing/image/

https://keras.io/models/model/

https://github.com/bonlime/keras-deeplab-v3-plus

My first question is:

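In my implementation, my graphics card uses almost all of its memory (11GB). I don't know why. Is it possible that the DeepLab weights are that big? My batch size is the default of 32, and all of my nearly 3000 images together are under 100MB. I already use config.gpu_options.allow_growth = True; see my code below.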

A general question:

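Does anyone know a semantic segmentation NN that suits my task? I don't need a NN that was trained on color images. But I also don't need a NN that was trained on binary ground truth pictures... I tested DeepLab on my raw color image (picture 3), but the resulting labels were not good...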

Here is my code so far:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

import numpy as np
from model import Deeplabv3
import tensorflow as tf
import time
import tensorboard
import keras
from keras.preprocessing.image import img_to_array
from keras.applications import imagenet_utils
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import TensorBoard


config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

from keras import backend as K
K.set_session(session)

NAME = "DeepLab-{}".format(int(time.time()))

deeplab_model = Deeplabv3(input_shape=(300,200,3), classes=3)

tensorboard = TensorBoard(log_dir="logpath/{}".format(NAME))

deeplab_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])

# we create two instances with the same arguments
data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=90,
                     width_shift_range=0.1,
                     height_shift_range=0.1,
                     zoom_range=0.2)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
#image_datagen.fit(images, augment=True, seed=seed)
#mask_datagen.fit(masks, augment=True, seed=seed)

image_generator = image_datagen.flow_from_directory(
    '/path/Input/',
    target_size=(300,200),
    class_mode=None,
    seed=seed)

mask_generator = mask_datagen.flow_from_directory(
    '/path/Label/',
    target_size=(300,200),
    class_mode=None,
    seed=seed)

# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)

print("compiled")

#deeplab_model.fit(X, y, batch_size=32, epochs=10, validation_split=0.3, callbacks=[tensorboard])
deeplab_model.fit_generator(train_generator, steps_per_epoch= np.uint32(2935 / 32), epochs=10, callbacks=[tensorboard])

print("finish fit")
deeplab_model.save_weights('deeplab_1.h5')
deeplab_model.save('deeplab-1')

session.close()

Here is my code for testing DeepLab (from GitHub):

from matplotlib import pyplot as plt
import cv2  # used for resizing; if you don't have it, use any other image library
import numpy as np
from model import Deeplabv3
import tensorflow as tf
from PIL import Image, ImageEnhance

deeplab_model = Deeplabv3(input_shape=(512,512,3), classes=3)
#deeplab_model = Deeplabv3()
img = Image.open("Path/Input/0/0001.png")
imResize = img.resize((512,512), Image.ANTIALIAS)
imResize = np.array(imResize)
img2 = cv2.cvtColor(imResize, cv2.COLOR_GRAY2RGB)

w, h, _ = img2.shape
ratio = 512. / np.max([w,h])
resized = cv2.resize(img2,(int(ratio*h),int(ratio*w)))
resized = resized / 127.5 - 1.
pad_x = int(512 - resized.shape[0])
resized2 = np.pad(resized,((0,pad_x),(0,0),(0,0)),mode='constant')
res = deeplab_model.predict(np.expand_dims(resized2,0))
labels = np.argmax(res.squeeze(),-1)
plt.imshow(labels[:-pad_x])
plt.show()

Answer 1

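First question: DeepLabV3+ is a very big model (I assume you are using the Xception backbone?!), and needing 11 GB of GPU memory is totally normal for a batch size of 32 with 200x300 pixels :) (When training DeepLabV3+, I needed approx. 11 GB using a batch size of 5 with 500x500 pixels.) One note on the second sentence of your question: the GPU resources you need are influenced by many factors (model, optimizer, batch size, image crop, preprocessing, etc.), but the actual size of your dataset should not influence them. So it does not matter whether your dataset is 300MB or 300GB.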
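To make the comparison concrete, here is a rough back-of-the-envelope sketch. It assumes that, for a fixed model, activation memory grows roughly in proportion to batch size times image height times image width; weights and optimizer state are ignored, so treat the numbers only as a relative guide. The function names are made up for this sketch, and 2935 is the image count taken from the steps_per_epoch expression in the question's code.

```python
import math

def pixels_per_batch(batch_size, height, width):
    """Number of input pixels the network processes in one training step."""
    return batch_size * height * width

def steps_per_epoch(num_images, batch_size):
    """Steps needed to see every image once (the last batch may be smaller)."""
    return math.ceil(num_images / batch_size)

question_setup = pixels_per_batch(32, 200, 300)  # batch of 32 at 200x300
answer_setup = pixels_per_batch(5, 500, 500)     # batch of 5 at 500x500
print(question_setup, answer_setup)              # 1920000 1250000

# A smaller batch lowers the per-step footprint but needs more steps per epoch.
for batch_size in (32, 16, 8):
    print(batch_size,
          pixels_per_batch(batch_size, 200, 300),
          steps_per_epoch(2935, batch_size))
```

Under this assumption the setup in the question pushes about 1.5 times as many pixels through the network per step as my 500x500 run, so filling an 11 GB card is not surprising. Halving the batch size halves that count; note that the dataset size only shows up in the number of steps, not in the per-step footprint.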

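General question: You are using a small dataset. DeepLabV3+ with Xception might not be a good fit, because the model might be too large, which can lead to overfitting. If you have not obtained satisfying results yet, you might try a smaller network. If you want to stick to the DeepLab framework, you could switch the backbone from Xception to MobileNetV2 (it is already implemented in the official TensorFlow version). Alternatively, you could try a standalone network, such as an Inception network with an FCN head...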

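In every case it is essential to use a pre-trained encoder with a well-trained feature representation. If you cannot find a good initialization of your desired model based on grayscale input images, just take a model pre-trained on RGB images and extend the pre-training with a grayscale dataset (basically, you can convert any big RGB dataset to grayscale), then fine-tune the weights on the grayscale input before using your own data.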
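Converting an RGB dataset to grayscale while keeping the three-channel input shape can be sketched like this. This is only an illustration: the function name rgb_to_gray3 is made up, the random batch stands in for real images, and the ITU-R BT.601 luma weights are one common convention, not something the pre-trained model requires.

```python
import numpy as np

# ITU-R BT.601 luma weights, a common convention for RGB -> grayscale.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)

def rgb_to_gray3(images):
    """Convert (N, H, W, 3) RGB images to grayscale, kept as 3 equal channels."""
    images = np.asarray(images, dtype=np.float32)
    gray = images @ LUMA_WEIGHTS                          # shape (N, H, W)
    return np.repeat(gray[..., np.newaxis], 3, axis=-1)   # shape (N, H, W, 3)

# Random stand-in for a batch of two 4x4 RGB images.
rng = np.random.default_rng(0)
batch = rng.uniform(0, 255, size=(2, 4, 4, 3)).astype(np.float32)
gray_batch = rgb_to_gray3(batch)
print(gray_batch.shape)  # (2, 4, 4, 3)
```

Because all three output channels are identical, the result has the same form as the gray-to-RGB pictures described in the question, so a network pre-trained further on such data sees inputs of the kind it will get later.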

I hope this helps! Cheers, Frank

Answer 2

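IBM's Large Model Support (LMS) library allows training large deep neural networks that would normally exhaust GPU memory during training. LMS manages this over-subscription of GPU memory by temporarily swapping tensors to host memory when they are not needed.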

Description - https://developer.ibm.com/components/ibm-power/articles/deeplabv3-image-segmentation-with-pytorch-lms/

PyTorch - https://github.com/IBM/pytorch-large-model-support

TensorFlow - https://github.com/IBM/tensorflow-large-model-support

computer-vision

keras

tensorflow

semantic-segmentation

deeplab