Why does my feature extraction script fail?

I used the script below to extract feature vectors for my dataset with a pretrained AlexNet model.

I extract the feature vectors from the fully connected layer fc7 and save them in hickle format.

Files used in the code below:

- train.txt contains the paths of 5000 images

- val.txt contains the paths of 1000 images

After a few iterations I get the error "error == cudaSuccess (2 vs. 0) out of memory". I understand that my GPU is running out of memory. Can you help me solve this problem?

import numpy as np
import hickle as hkl
import caffe


caffe.set_mode_gpu()

def feature_extract(img):
    model_file = '/home/jaba/caffe/data/diota_model/feature_extractor/bvlc_reference_caffenet.caffemodel'
    deploy_file = '/home/jaba/caffe/data/diota_model/feature_extractor/alex.deployprototxt'

    net = caffe.Net(deploy_file, model_file, caffe.TEST)

    # per-channel (BGR) mean values to subtract
    mean_values = np.array([103.939, 116.779, 123.68])

    # set up the preprocessing transformer
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_mean('data', mean_values)        # subtract the mean
    transformer.set_transpose('data', (2, 0, 1))     # (H,W,C) => (C,H,W)
    transformer.set_raw_scale('data', 255.0)         # [0.0, 1.0] => [0.0, 255.0]
    transformer.set_channel_swap('data', (2, 1, 0))  # RGB => BGR

    net.blobs['data'].data[...] = transformer.preprocess('data', img)

    net.forward()

    feat = net.blobs['fc7'].data.copy()

    return feat


def create_dataset(datalist, db_prefix):
    with open(datalist) as fr:
        lines = fr.readlines()
    lines = [line.rstrip() for line in lines]

    feats = []
    labels = []

    for line_i, line in enumerate(lines):
        # each line holds an image path followed by a one-character label
        a = len(line)
        label = line[a - 1]
        img_path = line[0:a - 2]

        img = caffe.io.load_image(img_path)
        feat = feature_extract(img)
        feats.append(feat)
        labels.append(int(label))
        if (line_i + 1) % 100 == 0:
            print "processed", line_i + 1

    feats = np.asarray(feats)
    labels = np.asarray(labels)

    hkl.dump(feats, db_prefix + "_features.hkl", mode="w")
    hkl.dump(labels, db_prefix + "_labels.hkl", mode="w")


create_dataset('train.txt', 'vgg_fc7_train')
create_dataset('val.txt', 'vgg_fc7_test')
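
For reference, the dumped files can be read back with hickle's load function (a minimal sketch; the file names match those produced by create_dataset above):

import hickle as hkl

# load the feature and label arrays written by create_dataset()
train_feats = hkl.load('vgg_fc7_train_features.hkl')
train_labels = hkl.load('vgg_fc7_train_labels.hkl')
print train_feats.shape, train_labels.shape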

And the deploy file:

name: "AlexNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}

Reduce your batch size. That is the simplest fix in this situation. You can start from the minimum batch size of 1 and keep increasing it to find your GPU's limit; you will have to settle for whatever number your current GPU can handle.
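
The batch size is the first dim of the input shape in your deploy file, so you can either edit it there or reshape the input blob at runtime. Below is a minimal sketch of batched extraction; BATCH = 32 is an arbitrary starting point, and it assumes net and transformer were built once as in feature_extract above and that images is a list of loaded images, so treat those names as placeholders and shrink BATCH until the out-of-memory error goes away:

# assumes `net` and `transformer` are built once, as in feature_extract(),
# and `images` is a list of arrays returned by caffe.io.load_image()
BATCH = 32  # assumed starting point; lower it if the GPU still runs out of memory

net.blobs['data'].reshape(BATCH, 3, 227, 227)  # resize the input blob to BATCH images
net.reshape()                                  # propagate the new shape through the net

feats = []
for i in range(0, len(images), BATCH):
    batch = images[i:i + BATCH]
    for j, img in enumerate(batch):
        net.blobs['data'].data[j] = transformer.preprocess('data', img)
    net.forward()
    # keep only the rows that correspond to real images in the final, short batch
    feats.append(net.blobs['fc7'].data[:len(batch)].copy())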