What is the meaning of the result of the model.predict() function for semantic segmentation?

Question

I am using the Segmentation Models library for multi-class (4 classes in my case) semantic segmentation. The model (a UNet with a 'resnet34' backbone) was trained on 3000 RGB (224x224x3) images. The accuracy is around 92.80%.

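1) Why does the model.predict() function require a (1,224,224,3) shaped array as input? I could not find an answer even in the Keras documentation. The code below actually works and I have no problem with it, but I want to understand the reason.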

predictions = model.predict(test_image.reshape(-1, 224, 224, 3))

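2) predictions is a (1,224,224,3) shaped numpy array. Its data type is float32 and it contains floating-point numbers. What do the numbers in this array mean? How can I visualize them? I assumed that the resulting array would contain one of the 4 class labels (from 0 to 3) for every pixel, and that I would then apply a color map for each class. In other words, the result should have been a prediction map, but I did not get one. To better understand what I mean by a prediction map, see Jeremy Jordan's blog about semantic segmentation.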

result = predictions[0]
plt.imshow(result)  # import matplotlib.pyplot as plt

3) What I ultimately want to do is what Github: mrgloom - Semantic Segmentation Categorical Crossentropy Example does in its visualy_inspect_result function.

Answer 1

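1) The image input shape in your deep neural network architecture is (224,224,3), so width = height = 224 with 3 color channels. An extra leading dimension is needed so that you can feed several images to the model at once. That is why the input is (1,224,224,3) for a single image, or (something,224,224,3) for a batch.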
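A minimal sketch of the batch dimension, using only NumPy. The array `test_image` here is a hypothetical all-zeros stand-in for a real image, not data from the question.

```python
import numpy as np

# Hypothetical stand-in for one RGB test image of shape (224, 224, 3).
test_image = np.zeros((224, 224, 3), dtype=np.float32)

# Add a leading batch axis: (224, 224, 3) -> (1, 224, 224, 3).
batch_of_one = np.expand_dims(test_image, axis=0)

# reshape(-1, 224, 224, 3) gives the same shape; -1 lets NumPy infer the batch size.
same_batch = test_image.reshape(-1, 224, 224, 3)

# Several images are stacked along that same leading axis: (5, 224, 224, 3).
batch_of_five = np.stack([test_image] * 5, axis=0)

print(batch_of_one.shape, same_batch.shape, batch_of_five.shape)
```

Either `batch_of_one` or `batch_of_five` could then be passed to model.predict(), and the output keeps the same leading batch axis.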

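2) According to the documentation of the Segmentation Models repo, you can specify the number of classes you want as output: model = Unet('resnet34', classes=4, activation='softmax'). Thus, if you reshape your labelled image to shape (1,224,224,4), the last dimension is a mask channel that indicates with a 0 or 1 whether pixel i,j belongs to class k. With a softmax activation the prediction has the same layout, except that each channel holds a float score between 0 and 1 instead of a hard 0 or 1. You can then predict and access each output mask: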

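import numpy as np

# Add a batch axis, predict, then drop the batch axis again: (224, 224, 4)
masked = model.predict(np.array([im]))[0]
mask_class0 = masked[:, :, 0]  # per-pixel score for class 0
mask_class1 = masked[:, :, 1]  # per-pixel score for class 1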
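To make the mask-channel layout concrete, here is a rough sketch in plain NumPy. The label image and the scores are randomly generated placeholders (assumptions for illustration), and the softmax is computed by hand rather than by the model.

```python
import numpy as np

num_classes = 4
rng = np.random.default_rng(0)

# Hypothetical label image: one integer class id (0..3) per pixel.
label_image = rng.integers(0, num_classes, size=(224, 224))

# One-hot encode by indexing an identity matrix with the class ids.
one_hot = np.eye(num_classes, dtype=np.float32)[label_image]  # (224, 224, 4)
target_batch = one_hot[np.newaxis, ...]                        # (1, 224, 224, 4)

# Hypothetical raw scores standing in for the network's last layer.
logits = rng.normal(size=(224, 224, num_classes)).astype(np.float32)

# Softmax over the class axis: each pixel's 4 values become scores summing to 1.
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
fake_prediction = exp / exp.sum(axis=-1, keepdims=True)

print(target_batch.shape, fake_prediction.shape)
```

Each pixel of `one_hot` has exactly one channel set to 1, while each pixel of `fake_prediction` spreads a total of 1 across the four channels, which is why the predicted array contains floats rather than class labels.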

3) Then you can plot the semantic segmentation with matplotlib, or use the color.label2rgb function from scikit-image.
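One possible way to get from the float output to a colored prediction map is to take the argmax over the class axis and look each label up in a palette. This is only a sketch: `predictions` is a random placeholder for the model output, assumed to have shape (1,224,224,4), and the palette colors are arbitrary choices.

```python
import numpy as np

num_classes = 4
rng = np.random.default_rng(0)

# Hypothetical stand-in for model.predict(...) output, shape (1, 224, 224, 4).
predictions = rng.random((1, 224, 224, num_classes)).astype(np.float32)

# For each pixel, pick the class with the highest score: values 0..3.
label_map = np.argmax(predictions[0], axis=-1)  # (224, 224)

# Arbitrary palette (an assumption): one RGB color per class id.
palette = np.array(
    [[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8
)

# Index the palette with the label map to get an RGB image.
color_map = palette[label_map]  # (224, 224, 3)

print(label_map.shape, color_map.shape)
```

The resulting `color_map` can be shown with plt.imshow(color_map), and `label_map` is the kind of integer label image that color.label2rgb expects.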

python

predict

keras

semantic-segmentation