Multi-class and multi-label image classification using Caffe

I am trying to create a single multi-class and multi-label network configuration in Caffe.

Take the classification of dogs as an example: Is the dog small or large? (class) What color is it? (class) Does it have a collar? (label)

Is this possible with Caffe? What is the proper way to do it?

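I am just trying to understand the practical approach. After creating two .txt files (one for training, one for validation) that list every image with its labels, for example: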

/train/img/1.png 0 4 18
/train/img/2.png 1 7 17 33
/train/img/3.png 0 4 17

I run this Python script:

import h5py, os
import caffe
import numpy as np

SIZE = 227 # fixed size to all images
with open( 'train.txt', 'r' ) as T :
    lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' ) 
y = np.zeros( (len(lines),1), dtype='f4' )
for i,l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image( sp[0] )
    img = caffe.io.resize( img, (SIZE, SIZE, 3) ) # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from size-by-size-by-3 and transpose it to 3-by-size-by-size
    # for example
    transposed_img = img.transpose((2,0,1))[::-1,:,:] # RGB->BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5','w') as H:
    H.create_dataset( 'X', data=X ) # note the name X given to the dataset!
    H.create_dataset( 'y', data=y ) # note the name y given to the dataset!
with open('train_h5_list.txt','w') as L:
    L.write( 'train.h5' ) # list all h5 files you are going to use

which creates train.h5 and val.h5 (does the X dataset contain the images and the y dataset the labels?).

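Then I replace my network's input layers (the two LMDB DATA layers shown first) with the two "HDF5Data" layers shown after them: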

layers { 
 name: "data" 
 type: DATA 
 top:  "data" 
 top:  "label" 
 data_param { 
   source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/train_db" 
   backend: LMDB 
   batch_size: 64 
 } 
 transform_param { 
    crop_size: 227 
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto" 
    mirror: true 
  } 
  include: { phase: TRAIN } 
} 
layers { 
 name: "data" 
 type: DATA 
 top:  "data" 
 top:  "label" 
 data_param { 
   source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/val_db"  
   backend: LMDB 
   batch_size: 64
 } 
 transform_param { 
    crop_size: 227 
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto" 
    mirror: true 
  } 
  include: { phase: TEST } 
} 

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TRAIN }
}

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "val_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TEST }
}

I guess HDF5 does not need mean.binaryproto?

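Next, how should the output layers change so that the network outputs probabilities for multiple labels? I guess I need a cross-entropy layer instead of softmax? These are the current output layers: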

layers {
  bottom: "prob"
  bottom: "label"
  top: "loss"
  name: "loss"
  type: SOFTMAX_LOSS
  loss_weight: 1
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}

Mean subtraction

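While the lmdb input data layer can handle all sorts of input transformations for you, the "HDF5Data" layer does not support this functionality.
Therefore, you must take care of all input transformations (mean subtraction in particular) yourself when you create the hdf5 file(s).
See the place in your code that says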

# you may apply other input transformations here...
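As a rough illustration of that step, here is a minimal sketch that subtracts a per-channel mean from the image array before it is written to hdf5. The helper name subtract_channel_mean and the toy arrays are made up for this example; in your script you would apply it to X. It computes the mean from the training data itself rather than reading mean.binaryproto. Note that caffe.io.load_image returns floats in [0, 1], so if you would rather reuse a mean file computed on 0-255 pixel values, the scales have to be matched first.

```python
import numpy as np

def subtract_channel_mean(X, mean=None):
    """Subtract a per-channel mean from a batch shaped (N, C, H, W).

    If `mean` is None it is computed from X (do this for the training
    set only). The mean is returned so it can be reused for validation.
    """
    X = np.asarray(X, dtype='f4')
    if mean is None:
        mean = X.mean(axis=(0, 2, 3))  # one value per channel
    mean = np.asarray(mean, dtype='f4')
    return X - mean.reshape(1, -1, 1, 1), mean

# Toy data standing in for the X arrays built in the script above.
X_train = np.random.rand(4, 3, 8, 8).astype('f4')
X_val = np.random.rand(2, 3, 8, 8).astype('f4')

X_train_centered, channel_mean = subtract_channel_mean(X_train)
# Validation data must use the training mean, not its own.
X_val_centered, _ = subtract_channel_mean(X_val, channel_mean)
```

Whatever you subtract here also has to be subtracted at test/deploy time, since the net never sees the raw pixel values.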

Multiple labels

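Although your .txt lists several labels for each image, you only save the first one to the hdf5 file. If you want to use these labels, you have to feed them to the net.
One issue that stands out in your example is that you do not have a fixed number of labels per training image (the second line lists four labels, the others three). Why is that? What does it mean?
Assuming you have three labels for each image (in the .txt files):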

<filename> <dog size> <dog color> <has collar>

Then you can have y_size, y_color and y_collar (instead of a single y) in your hdf5.

y_size[i] = float(sp[1])
y_color[i] = float(sp[2])
y_collar[i] = float(sp[3])
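To make the fixed-number-of-labels assumption explicit, the line parsing could look roughly like the sketch below. The helper parse_label_line is hypothetical (it is not part of Caffe), and it assumes file names contain no spaces. It refuses any line that does not carry exactly three labels, so a line like the second one in your example is caught early instead of being silently truncated.

```python
def parse_label_line(line, num_labels=3):
    """Split '<filename> <size> <color> <collar>' into a path and float labels."""
    fields = line.split()  # split() also drops the trailing newline
    if len(fields) != num_labels + 1:
        raise ValueError('expected %d labels, got %d: %r'
                         % (num_labels, len(fields) - 1, line))
    return fields[0], [float(v) for v in fields[1:]]

# Two lines taken from the example label file in the question.
lines = ['/train/img/1.png 0 4 18\n', '/train/img/3.png 0 4 17\n']

filenames, y_size, y_color, y_collar = [], [], [], []
for line in lines:
    name, (size, color, collar) = parse_label_line(line)
    filenames.append(name)
    # One row per image, one column per label array, like y in the script.
    y_size.append([size])
    y_color.append([color])
    y_collar.append([collar])
```

Each list can then be turned into an 'f4' numpy array and stored with its own H.create_dataset call ('y_size', 'y_color', 'y_collar'), the same way 'y' is stored in your script.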

Your input data layer will have more "top"s accordingly:

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y_size"
  top: "y_color"
  top: "y_collar"
  hdf5_data_param {
    source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TRAIN }
}

Prediction

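At the moment your net predicts only a single label (the layer with top: "prob"). You need your net to predict all three labels, so you have to add layers that compute top: "prob_size", top: "prob_color" and top: "prob_collar" (a different layer for each "prob_*").
Once you have a prediction for each label, you need a loss (again, one loss per label).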
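One possible layout for the per-label heads, sketched as a small Python helper that only prints prototxt text. Everything named here is an assumption for illustration: the shared feature blob "fc7", the layer names "fc_*" and "loss_*", and the class counts (8 colors in particular is made up). It uses "SoftmaxWithLoss" for the two multi-class labels and "SigmoidCrossEntropyLoss" for the binary collar label. Both loss layers apply the softmax/sigmoid internally, so they take raw scores as input; the "prob_*" outputs would come from separate "Softmax"/"Sigmoid" layers on the same scores. Each label is assumed to be re-indexed so that it starts at 0 within its own head.

```python
def head_layers(label, num_output, loss_type, bottom='fc7'):
    """Return prototxt text for one prediction head and its loss.

    `bottom` (the shared feature blob) and the layer names are placeholders.
    """
    return '''layer {{
  name: "fc_{label}"
  type: "InnerProduct"
  bottom: "{bottom}"
  top: "fc_{label}"
  inner_product_param {{ num_output: {num_output} }}
}}
layer {{
  name: "loss_{label}"
  type: "{loss_type}"
  bottom: "fc_{label}"
  bottom: "y_{label}"
  top: "loss_{label}"
}}
'''.format(label=label, bottom=bottom,
           num_output=num_output, loss_type=loss_type)

heads = [
    ('size', 2, 'SoftmaxWithLoss'),    # small / large
    ('color', 8, 'SoftmaxWithLoss'),   # 8 colors is a made-up count
    ('collar', 1, 'SigmoidCrossEntropyLoss'),  # yes / no
]
prototxt = ''.join(head_layers(*h) for h in heads)
print(prototxt)
```

With several loss layers in one net, Caffe sums them into the training objective, each weighted by its loss_weight.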