Pytorch VNet final softmax activation layer for segmentation. Different channel dimensions to labels. How do I get prediction output?

I am trying to build a V-Net. When I pass images through it for segmentation during training, the output has 2 channels after the softmax activation (as specified in the architecture in the attached image), but the labels and the input have 1. How do I convert this so that the output is a segmented image? When training, do I just take one of the channels as the final output (e.g. output = output[:, 0, :, :, :]) and treat the other channel as the background?

outputs = network(inputs)

batch_size = 32
outputs.shape: [32, 2, 64, 128, 128]
inputs.shape: [32, 1, 64, 128, 128]
labels.shape: [32, 1, 64, 128, 128]

Here is my V-Net forward:

def forward(self, x):
    # Initial input transition
    out = self.in_tr(x)

    # Downward transitions
    out, residual_0 = self.down_depth0(out)
    out, residual_1 = self.down_depth1(out)
    out, residual_2 = self.down_depth2(out)
    out, residual_3 = self.down_depth3(out)

    # Bottom layer
    out = self.up_depth4(out)

    # Upward transitions
    out = self.up_depth3(out, residual_3)        
    out = self.up_depth2(out, residual_2)
    out = self.up_depth1(out, residual_1)
    out = self.up_depth0(out, residual_0)

    # Pass to convert to 2 channels
    out = self.final_conv(out)
    
    # Return softmax over the channel dimension
    out = F.softmax(out, dim=1)

    return out  # [batch_size, 2, 64, 128, 128]

V Net architecture as described in (https://arxiv.org/pdf/1606.04797.pdf)

That paper has two output channels because they predict two classes:

The network predictions, which consist of two volumes having the same resolution as the original input data, are processed through a soft-max layer which outputs the probability of each voxel to belong to foreground and to background.

So this is not an autoencoder, where your input is passed back out of the model as the output. They use a set of labels that distinguishes the pixels they are interested in (foreground) from everything else (background). If you want to use the V-Net this way, you will need to change your data.
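For training with a 2-channel output against 1-channel integer labels, one common option is nn.CrossEntropyLoss, which expects raw logits (it applies log-softmax internally, so you would skip the final F.softmax during training) and class-index targets with the channel dimension squeezed out. A minimal sketch with toy tensor sizes (smaller than the [32, 2, 64, 128, 128] shapes in the question, purely for illustration):

```python
import torch
import torch.nn as nn

batch_size = 2  # toy size; the question uses 32

# Raw network output *before* softmax: [batch, 2 classes, D, H, W]
logits = torch.randn(batch_size, 2, 4, 8, 8)

# 1-channel labels as in the question: 0 = background, 1 = foreground
labels = torch.randint(0, 2, (batch_size, 1, 4, 8, 8))

# CrossEntropyLoss wants targets of shape [batch, D, H, W] as LongTensor,
# so drop the channel dimension and cast.
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, labels.squeeze(1).long())
```

If you keep the softmax inside forward, you would instead take its log and use nn.NLLLoss, but returning logits and letting the loss handle normalization is the more common pattern.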

It will not be as simple as designating one channel as the output, because this is a classification task rather than a regression task. You will need annotated labels to use this model architecture.