Caffe：一次预测多个标签（Python）

Question

在caffe中，我希望能够一次预测多个标签，比如键盘方向键：可以同时按下两个键。我正在尝试使用卷积神经网络在 TM Nation Forever 游戏中驾驶虚拟 F1 汽车，我计划很快收集和塑造训练数据，我想知道我做的是否正确。

我认为这个 post 会给出一个很好的例子来说明如何在 Python 中进行这种 class 化，但我还没有找到任何令人满意的例子来说明如何做到这一点。

有人可以验证这种在神经网络中收集和表示数据的方式是否会像我预期的那样工作：

HDF5python代码

comp_kwargs = {'compression': 'gzip', 'compression_opts': 1}

with h5py.File(train_filename, 'w') as f:
    f.create_dataset('data_img', data=X, **comp_kwargs)
    f.create_dataset('data_speed', data=S.astype(np.float_), **comp_kwargs)

    f.create_dataset('label_forward', data=f.astype(np.int_), **comp_kwargs)
    f.create_dataset('label_backward', data=b.astype(np.int_), **comp_kwargs)
    f.create_dataset('label_left', data=l.astype(np.int_), **comp_kwargs)
    f.create_dataset('label_right', data=r.astype(np.int_), **comp_kwargs)

with open(train_filename_list_txt, 'w') as f:
    f.write(train_filename + '\n')

关于 HDF5 数据形状的信息

输入：

data_img: 
-> number N x channel K x height H x width W

data_speed:
-> number N  x  1 float number (from 0.0 to 1.0)

输出：

注意：我使用 numpy 的 "int_" 来获取标签 class 以 class 化。

label_forward:
-> number N  x  1 integer number (0 or 1)

label_backward:
-> number N  x  1 integer number (0 or 1)

label_left:
-> number N  x  1 integer number (0 or 1)

label_right:
-> number N  x  1 integer number (0 or 1)

卷积神经网络架构

我在这里发表了一些半相关的评论，如果您对网络架构有任何意见，如果它可以提高性能，我将不胜感激:)

import numpy as np

import caffe
from caffe import layers as L
from caffe import params as P

def cnn(hdf5, batch_size):
    n = caffe.NetSpec()
    n.data_img, n.data_speed, n.label_forward, n.label_backward, n.label_left, label_right = (
        L.HDF5Data(batch_size=batch_size, source=hdf5, ntop=6)
    )

    n.conv1 = L.Convolution(n.data, kernel_size=7, num_output=32, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop1 = L.Dropout(n.pool1, in_place=True)
    n.relu1 = L.ReLU(n.drop1, in_place=True)

    n.conv2 = L.Convolution(n.relu1, kernel_size=5, num_output=42, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop2 = L.Dropout(n.pool2, in_place=True)
    n.relu2 = L.ReLU(n.drop2, in_place=True)

    n.conv3 = L.Convolution(n.relu2, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool3 = L.Pooling(n.conv3, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop3 = L.Dropout(n.pool3, in_place=True)
    n.relu3 = L.ReLU(n.drop3, in_place=True)

    n.conv4 = L.Convolution(n.relu3, kernel_size=3, num_output=64, weight_filler=dict(type='xavier'))
    n.pool4 = L.Pooling(n.conv4, kernel_size=3, stride=2, pool=P.Pooling.AVE)
    # Data of shape `batch_size*64*3*3` out of this layer (if dropout ignored), 
    # for a total of `batch_size*576` neurons.
    # Would you recommend to downsize this `3*3` feature map to `2*2`
    # or even `1*1` and to remove dropout at this level?
    n.drop4 = L.Dropout(n.pool4, in_place=True)
    n.relu4 = L.ReLU(n.drop4, in_place=True)

    n.join_speed = L.Concat(n.relu4, n.data_speed, in_place=True)
    # Note that I might be wrong on how the parameters are passed to the concat layer 
    n.ip1 = L.InnerProduct(n.join_speed, num_output=512, weight_filler=dict(type='xavier'))
    n.sig1 = L.Sigmoid(n.ip1, in_place=True)

    n.ip_f = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_f = L.Accuracy(n.ip_f, n.label_forward)
    n.loss_f = L.SoftmaxWithLoss(n.ip_f, n.label_forward)

    n.ip_b = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_b = L.Accuracy(n.ip_b, n.label_backward)
    n.loss_b = L.SoftmaxWithLoss(n.ip_b, n.label_backward)

    n.ip_l = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_l = L.Accuracy(n.ip_l, n.label_left)
    n.loss_l = L.SoftmaxWithLoss(n.ip_l, n.label_left)

    n.ip_r = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_r = L.Accuracy(n.ip_r, n.label_right)
    n.loss_r = L.SoftmaxWithLoss(n.ip_r, n.label_right)

    return n.to_proto()

with open('cnn_train.prototxt', 'w') as f:
    f.write(str(
        cnn(train_filename_list_txt, 100)
    ))

此外，我想一次只按下左箭头键或右箭头键之一。考虑到我将使用一些 SoftmaxWithLossLayer:

，而不是之后以编程方式进行，像这样融合 label_right 和 label_left 是个好主意吗

label_right:
-> number N  x  1 integer number (0 for left or 1 for right)

Answer 1

最后，我所做的是正确的任务，除了连接层可能无法工作，因为连接层的形状不同。我使用 cifar-100 数据集对此进行了测试，其中既有粗标签也有细标签，效果很好。

Caffe：一次预测多个标签（Python）

Caffe: predict multiple labels at once (in Python)

python

classification

hdf5

neural-network

caffe

HDF5python代码

关于 HDF5 数据形状的信息

卷积神经网络架构