What exactly is the label for an image segmentation task in computer vision?

I have recently been working on image segmentation tasks and want to implement one from scratch.

As far as I know, segmentation means predicting, for every pixel, what it belongs to: an object instance ("things") or a background segment ("stuff").

According to the COCO dataset, on which the state-of-the-art Mask R-CNN algorithm is based:

things are countable objects such as people, animals, tools. Stuff classes are amorphous regions of similar texture or material such as grass, sky, road.

According to the Mask R-CNN paper, the final classification uses a binary cross-entropy loss with a per-pixel sigmoid (to avoid intra-class competition). The pipeline is built on top of the Faster R-CNN object detection pipeline, from which it takes the regions of interest (RoIs) and passes them through a RoIAlign layer to keep the spatial information intact.
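To make that concrete, here is a minimal sketch of the per-pixel sigmoid + binary cross-entropy idea (all names and sizes are illustrative, not the actual Mask R-CNN code): the mask head predicts one m x m mask per class for each RoI, and the loss is evaluated with a per-pixel sigmoid only on the mask of the RoI's ground-truth class:

import torch
import torch.nn.functional as F

num_classes, m = 3, 28                            # 28 x 28 mask resolution, as in the paper
mask_logits = torch.randn(1, num_classes, m, m)   # mask head output for one RoI (hypothetical)
gt_class = 2                                      # ground-truth class of this RoI
gt_mask = torch.randint(0, 2, (1, m, m)).float()  # binary ground-truth mask

# Sigmoid + BCE applied only to the ground-truth class' mask,
# so the classes do not compete with each other:
loss = F.binary_cross_entropy_with_logits(mask_logits[:, gt_class], gt_mask)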

What confuses me is the following. Below is a very simple snippet that applies a binary cross-entropy loss to three separate fully connected layers (some random experimentation with sizes):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelMain(nn.Module):
    def __init__(self, config=None, is_training=True):
        super(ModelMain, self).__init__()
        # incoming_size_* / outgoing_size_* are placeholders for the layer dimensions
        self.fc_1 = torch.nn.Linear(incoming_size_1, outgoing_size_1)
        self.fc_2 = torch.nn.Linear(incoming_size_2, outgoing_size_2)
        self.fc_3 = torch.nn.Linear(incoming_size_3, outgoing_size_3)

    def forward(self, x):
        y_1 = F.sigmoid(self.fc_1(x))
        y_2 = F.sigmoid(self.fc_2(x))
        y_3 = F.sigmoid(self.fc_3(x))
        return y_1, y_2, y_3


model = ModelMain()
criterion = torch.nn.BCELoss(size_average=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def run_epoch():
    batchsize = 10
    for epoch in range(batchsize):
        # Find image segment predicted by running forward pass: 
        y_predicted_1, y_predicted_2, y_predicted_3  = model(batch_data_x)

        # Compute and print loss : 
        loss_1 = criterion(y_predicted_1, batch_data_y)
        loss_2 = criterion(y_predicted_2, batch_data_y)
        loss_3 = criterion(y_predicted_3, batch_data_y)

        print( "Epoch ", epoch, "Loss : ", loss_1, loss_2, loss_3)

        # Perform Backward pass : 
        optimizer.zero_grad()
        loss_1.backward()
        loss_2.backward()
        loss_3.backward()
        optimizer.step()

...what do we provide here as the label?

From the dataset:

Formatted JSON Data

Image:

 {
       "license":2,
       "file_name":"000000000139.jpg",
       "coco_url":"http://images.cocodataset.org/val2017/000000000139.jpg",
       "height":426,
       "width":640,
       "date_captured":"2013-11-21 01:34:01",
       "flickr_url":"http://farm9.staticflickr.com/8035/8024364858_9c41dc1666_z.jpg",
       "id":139
    }

Segment info:

{
   "segments_info":[
      {
         "id":3226956,
         "category_id":1,
         "iscrowd":0,
         "bbox":[
            413,
            158,
            53,
            138
         ],
         "area":2840
      },
      {
         "id":6979964,
         "category_id":1,
         "iscrowd":0,
         "bbox":[
            384,
            172,
            16,
            36
         ],
         "area":439
      },
      {
         "id":3103374,
         "category_id":62,
         "iscrowd":0,
         "bbox":[
            413,
            223,
            30,
            81
         ],
         "area":1250
      },
      {
         "id":2831194,
         "category_id":62,
         "iscrowd":0,
         "bbox":[
            291,
            218,
            62,
            98
         ],
         "area":1848
      },
      {
         "id":3496593,
         "category_id":62,
         "iscrowd":0,
         "bbox":[
            412,
            219,
            10,
            13
         ],
         "area":90
      },
      {
         "id":2633066,
         "category_id":62,
         "iscrowd":0,
         "bbox":[
            317,
            219,
            22,
            12
         ],
         "area":212
      },
      {
         "id":3165572,
         "category_id":62,
         "iscrowd":0,
         "bbox":[
            359,
            218,
            56,
            103
         ],
         "area":2251
      },
      {
         "id":8824489,
         "category_id":64,
         "iscrowd":0,
         "bbox":[
            237,
            149,
            24,
            62
         ],
         "area":369
      },
      {
         "id":3032951,
         "category_id":67,
         "iscrowd":0,
         "bbox":[
            321,
            231,
            126,
            89
         ],
         "area":2134
      },
      {
         "id":2038814,
         "category_id":72,
         "iscrowd":0,
         "bbox":[
            7,
            168,
            149,
            95
         ],
         "area":13247
      },
      {
         "id":3289671,
         "category_id":72,
         "iscrowd":0,
         "bbox":[
            557,
            209,
            82,
            79
         ],
         "area":5846
      },
      {
         "id":2437710,
         "category_id":78,
         "iscrowd":0,
         "bbox":[
            512,
            206,
            15,
            16
         ],
         "area":224
      },
      {
         "id":4159376,
         "category_id":82,
         "iscrowd":0,
         "bbox":[
            493,
            174,
            20,
            108
         ],
         "area":2056
      },
      {
         "id":3423599,
         "category_id":84,
         "iscrowd":0,
         "bbox":[
            613,
            308,
            13,
            46
         ],
         "area":324
      },
      {
         "id":3094634,
         "category_id":84,
         "iscrowd":0,
         "bbox":[
            605,
            306,
            14,
            45
         ],
         "area":331
      },
      {
         "id":3296100,
         "category_id":85,
         "iscrowd":0,
         "bbox":[
            448,
            121,
            14,
            22
         ],
         "area":227
      },
      {
         "id":6054280,
         "category_id":86,
         "iscrowd":0,
         "bbox":[
            241,
            195,
            14,
            18
         ],
         "area":187
      },
      {
         "id":5942189,
         "category_id":86,
         "iscrowd":0,
         "bbox":[
            549,
            309,
            36,
            90
         ],
         "area":2171
      },
      {
         "id":4086154,
         "category_id":86,
         "iscrowd":0,
         "bbox":[
            351,
            209,
            11,
            22
         ],
         "area":178
      },
      {
         "id":7438777,
         "category_id":86,
         "iscrowd":0,
         "bbox":[
            337,
            200,
            10,
            16
         ],
         "area":120
      },
      {
         "id":3031159,
         "category_id":118,
         "iscrowd":0,
         "bbox":[
            0,
            269,
            564,
            157
         ],
         "area":49754
      },
      {
         "id":9284267,
         "category_id":119,
         "iscrowd":0,
         "bbox":[
            338,
            166,
            29,
            50
         ],
         "area":842
      },
      {
         "id":6068135,
         "category_id":130,
         "iscrowd":0,
         "bbox":[
            212,
            11,
            321,
            127
         ],
         "area":3391
      },
      {
         "id":2567230,
         "category_id":156,
         "iscrowd":0,
         "bbox":[
            129,
            168,
            351,
            162
         ],
         "area":5699
      },
      {
         "id":10334639,
         "category_id":181,
         "iscrowd":0,
         "bbox":[
            204,
            63,
            234,
            174
         ],
         "area":15587
      },
      {
         "id":6266027,
         "category_id":186,
         "iscrowd":0,
         "bbox":[
            136,
            0,
            473,
            116
         ],
         "area":20106
      },
      {
         "id":5274512,
         "category_id":188,
         "iscrowd":0,
         "bbox":[
            0,
            38,
            549,
            297
         ],
         "area":25483
      },
      {
         "id":7238567,
         "category_id":189,
         "iscrowd":0,
         "bbox":[
            457,
            350,
            183,
            76
         ],
         "area":9421
      },
      {
         "id":4224910,
         "category_id":199,
         "iscrowd":0,
         "bbox":[
            0,
            0,
            640,
            358
         ],
         "area":83201
      },
      {
         "id":6391959,
         "category_id":200,
         "iscrowd":0,
         "bbox":[
            135,
            359,
            336,
            67
         ],
         "area":12618
      }
   ],
   "file_name":"000000000139.png",
   "image_id":139
}

Mask image:

Original image:

For the object detection task we have bounding boxes, but for image segmentation I need to compute the loss using the provided masks. So what should the value of batch_data_y in the code above be? Would it be a vector of the mask image? But wouldn't that train my network on what color certain segments are? Or am I missing some other segmentation annotation?

As @hkchengrex mentioned in his comment, the fact that the colors in the mask image seem to be picked from the real image is either a coincidence or the result of some post-processing for visualization.

A semantic mask is usually represented/stored as an image in which the value of each pixel encodes the class(es) of the corresponding pixel in the actual picture. For instance, supposing you are considering C classes, the semantic mask M of a picture I can be represented as an image where M(i,j) = c means that pixel I(i,j) should be classified as belonging to semantic class c (with c in [0, C[, i in [0, H[, j in [0, W[, and (H, W) the dimensions of I).

Now, since the classes are independent of one another, the most straightforward way for the network to predict them is to output a probability map P of shape (H, W, C), where P(i,j,c) represents the estimated probability that I(i,j) belongs to class c (a value between 0 and 1, hence an activation function like sigmoid).
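As an illustration, a minimal sketch of such an output (the sizes are hypothetical, and the channel dimension comes first, as is usual in PyTorch):

import torch

C, H, W = 5, 4, 4
logits = torch.randn(C, H, W)   # raw network output for one image (hypothetical)
P = torch.sigmoid(logits)       # P[c, i, j]: probability that pixel (i, j) belongs to class c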

As you detailed, with such an output you can train your network using binary cross-entropy, provided you pre-process your ground-truth masks M, converting them from H x W images with values in [0, C[ (class indices) into H x W x C maps with values in [0, 1]. This pre-processing is called "one-hot conversion" and can be done in PyTorch using scatter_(), cf. this thread:

import torch

# M is assumed to be a torch.cuda.LongTensor of class indices, shape (1, H, W)
M_onehot = torch.cuda.FloatTensor(C, H, W)
M_onehot.zero_()
M_onehot.scatter_(0, M, 1)  # write a 1 along the class dimension for each pixel
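With such a one-hot target, the binary cross-entropy can then be applied directly; a minimal sketch, reusing P from the earlier snippet (moved to the same device as M_onehot):

criterion = torch.nn.BCELoss()
loss = criterion(P.cuda(), M_onehot)  # element-wise BCE, averaged over all pixels and classes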

However, another solution, which may be less suited to your problem if you want to avoid softmax (since this operation includes it), is to use the (non-binary) cross-entropy loss. In that case torch.nn.CrossEntropyLoss() directly takes the raw predictions (of shape (N, C, H, W), before any activation) and M (of shape (N, H, W)) as the target.
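For illustration, a minimal sketch of that alternative (with a batch dimension of 1 and hypothetical sizes):

import torch

C, H, W = 5, 4, 4
logits = torch.randn(1, C, H, W)      # raw scores; CrossEntropyLoss applies log-softmax itself
M = torch.randint(0, C, (1, H, W))    # integer class index per pixel
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(logits, M)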

@Aldream's intuition was correct, but for the COCO dataset specifically they explicitly provide binary masks; the documentation on their website is not very good:

Interface for manipulating masks stored in RLE format.

RLE is a simple yet efficient format for storing binary masks. RLE first divides a vector (or vectorized image) into a series of piecewise constant regions and then for each piece simply stores the length of that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1] (note that the odd counts are always the numbers of zeros). Instead of storing the counts directly, additional compression is achieved with a variable bitrate representation based on a common scheme called LEB128. source : link
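In practice you would not decode RLE by hand; the COCO API (pycocotools) provides helpers for it. A minimal sketch, assuming rle is one such annotation dict (this variable is hypothetical, not taken from the JSON above):

from pycocotools import mask as mask_utils

# rle: {"counts": <RLE-encoded counts>, "size": [height, width]}
binary_mask = mask_utils.decode(rle)  # numpy array of shape (height, width) with 0/1 values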

Although I did end up writing my own custom function for the averaged binary cross-entropy loss:

def l_cross_entropy2d(input, target, weight=None, size_average=True):
    # input: raw scores of shape <n x c x h x w>
    # target: integer class labels of shape <n x h x w>
    n, c, h, w = input.size()
    nt, ht, wt = target.size()

    # Handle inconsistent size between input and target
    if h > ht and w > wt:  # upsample labels
        target = target.unsqueeze(1).float()
        target = F.upsample(target, size=(h, w), mode='nearest')
        target = target.squeeze(1).long()
    elif h < ht and w < wt:  # upsample images
        input = F.upsample(input, size=(ht, wt), mode='bilinear')
    elif h != ht and w != wt:
        raise Exception("Only support upsampling")

    # Take the per-pixel sigmoid
    sigm = F.sigmoid(input)
    # Reshape into a 2D matrix where rows -> pixels and columns -> classes:
    # takes an input tensor <n x c x h x w> and outputs a tensor <n*h*w x c>
    sigm = sigm.transpose(1, 2).transpose(2, 3).contiguous().view(-1, c)

    # Keep only the pixels that carry a valid (non-negative) label
    sigm = sigm[target.view(-1, 1).repeat(1, c) >= 0]
    sigm = sigm.view(-1, c)

    mask = target >= 0
    target = target[mask]
    # nll_loss expects log-probabilities, hence the log of the sigmoid output
    loss = F.nll_loss(torch.log(sigm + 1e-8), target, ignore_index=250,
                      weight=weight, size_average=False)
    if size_average:
        loss /= mask.data.sum()
    return loss
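A hypothetical call, assuming raw scores of shape (n, c, h, w) and integer labels of shape (n, h, w):

scores = net(batch_data_x)       # hypothetical network producing (n, c, h, w) raw scores
loss = l_cross_entropy2d(scores, batch_data_y)
loss.backward()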