修改多个输入的 Caffe C++ 预测代码

Modifying the Caffe C++ prediction code for multiple inputs

我实施了 Caffe C++ example and while it works really well, it's incredibly slow because it only accepts images one by one. Ideally I'd like to pass Caffe a vector of 200 images and return the best prediction for each one. I received some great help from Fanglin Wang 的修改版本并实施了他的一些建议,但在弄清楚如何从每张图像中检索最佳结果时仍然遇到一些问题。

Classify 方法现在传递一个 cv::Mat 对象的向量(变量 input_channels),它是灰度浮点图像的向量。我已经删除了代码中的预处理方法,因为我不需要将这些图像转换为浮点数或减去均值图像。我也一直在尝试摆脱 N 变量,因为我只想 return 每个图像的最高预测和概率。

#include "Classifier.h"
using namespace caffe;
using std::string;

Classifier::Classifier(const string& model_file, const string& trained_file, const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}

static bool PairCompare(const std::pair<float, int>& lhs, const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

/* Return the top N predictions. */
std::vector<Prediction> Classifier::Classify(const std::vector<cv::Mat> &input_channels) {
  std::vector<float> output = Predict(input_channels);

    std::vector<int> maxN = Argmax(output, 1);
    int idx = maxN[0];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
    return predictions;
}

std::vector<float> Classifier::Predict(const std::vector<cv::Mat> &input_channels, int num_images) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(num_images, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  WrapInputLayer(&input_channels);

  net_->ForwardPrefilled();

  /* Copy the output layer to a std::vector */
  Blob<float>* output_layer = net_->output_blobs()[0];
  const float* begin = output_layer->cpu_data();
  const float* end = begin + num_images * output_layer->channels();
  return std::vector<float>(begin, end);
}

/* Wrap the input layer of the network in separate cv::Mat objects (one per channel). This way we save one memcpy operation and we don't need to rely on cudaMemcpy2D. The last preprocessing operation will write the separate channels directly to the input layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels() * num_images; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

更新

非常感谢你的帮助 Shai,我做了你推荐的更改,但似乎遇到了一些我无法解决的奇怪的编译问题(我设法解决了一些问题)。

这些是我所做的更改:

头文件:

#ifndef __CLASSIFIER_H__
#define __CLASSIFIER_H__

#include <caffe/caffe.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <algorithm>
#include <iosfwd>
#include <memory>
#include <string>
#include <utility>
#include <vector>


using namespace caffe;  // NOLINT(build/namespaces)
using std::string;

/* Pair (label, confidence) representing a prediction. */
typedef std::pair<string, float> Prediction;

class Classifier {
 public:
  Classifier(const string& model_file,
             const string& trained_file,
             const string& label_file);

  std::vector< std::pair<int,float> > Classify(const std::vector<cv::Mat>& img);

 private:

  std::vector< std::vector<float> > Predict(const std::vector<cv::Mat>& img, int nImages);

  void WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages);

  void Preprocess(const std::vector<cv::Mat>& img,
                  std::vector<cv::Mat>* input_channels, int nImages);

 private:
  shared_ptr<Net<float> > net_;
  cv::Size input_geometry_;
  int num_channels_;
  std::vector<string> labels_;
};

#endif /* __CLASSIFIER_H__ */

Class 文件:

#define CPU_ONLY
#include "Classifier.h"

using namespace caffe;  // NOLINT(build/namespaces)
using std::string;

Classifier::Classifier(const string& model_file,
                       const string& trained_file,
                       const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
  CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  CHECK(num_channels_ == 3 || num_channels_ == 1)
    << "Input layer should have 1 or 3 channels.";
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}

static bool PairCompare(const std::pair<float, int>& lhs,
                        const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

std::vector< std::pair<int,float> > Classifier::Classify(const std::vector<cv::Mat>& img) {
  std::vector< std::vector<float> > output = Predict(img, img.size());

  std::vector< std::pair<int,float> > predictions;
  for ( int i = 0 ; i < output.size(); i++ ) {
    std::vector<int> maxN = Argmax(output[i], 1);
    int idx = maxN[0];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
  }
  return predictions;
}

std::vector< std::vector<float> > Classifier::Predict(const std::vector<cv::Mat>& img, int nImages) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(nImages, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  std::vector<cv::Mat> input_channels;
  WrapInputLayer(&input_channels, nImages);

  Preprocess(img, &input_channels, nImages);

  net_->ForwardPrefilled();

  /* Copy the output layer to a std::vector */

  Blob<float>* output_layer = net_->output_blobs()[0];
  std::vector <std::vector<float> > ret;
  for (int i = 0; i < nImages; i++) {
    const float* begin = output_layer->cpu_data() + i*output_layer->channels();
    const float* end = begin + output_layer->channels();
    ret.push_back( std::vector<float>(begin, end) );
  }
  return ret;
}

/* Wrap the input layer of the network in separate cv::Mat objects
 * (one per channel). This way we save one memcpy operation and we
 * don't need to rely on cudaMemcpy2D. The last preprocessing
 * operation will write the separate channels directly to the input
 * layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels()* nImages; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

void Classifier::Preprocess(const std::vector<cv::Mat>& img,
                            std::vector<cv::Mat>* input_channels, int nImages) {
  for (int i = 0; i < nImages; i++) {
      vector<cv::Mat> channels;
      cv::split(img[i], channels);
      for (int j = 0; j < channels.size(); j++){
           channels[j].copyTo((*input_channels)[i*num_channels_[0]+j]);
      }
  }
}

不幸的是,我认为网络前向传递的并行化尚未实现。但是,如果您希望可以简单地实现自己的包装器以通过网络的副本并行地重复 运行 数据?

看看

在链接的 prototxt 中,您只需定义

input_shape {
  dim: 64 // num of images
  dim: 1
  dim: 28 // height
  dim: 28 // width
}

现有实施评估一批 64 张图像,但不一定并行。但是,如果在 GPU 上 运行ning,处理一批 64 个图像将比处理 64 个单图像批次更快。

如果我对你的问题的理解正确,你输入了 n 张图像,期望 n(label, prob),但只得到一对这样的图像。

我相信这些修改应该对您有用:

  1. Classifier::Predict 应该 return 一个 vector< vector<float> >,即每个输入图像的 向量 概率。这是大小为 nvector 大小的向量 output_layer->channels():

    std::vector< std::vecot<float> > 
    Classifier::Predict(const std::vector<cv::Mat> &input_channels, 
                        int num_images) {
      // same code here...
    
      /* changes here: Copy the output layer to a std::vector */
      Blob<float>* output_layer = net_->output_blobs()[0];
      std::vector< std::vector<float> > ret;
      for ( int i = 0 ; i < num_images ; i++ ) {
          const float* begin = output_layer->cpu_data() + i*output_layer->channels();
          const float* end = begin + output_layer->channels();
          ret.push_back( std::vector<float>(begin, end) );
      }
      return ret;
    }
    
  2. Classifier::Classify 中,您需要独立处理每个 vector<float>Argmax

     std::vector< std::pair<int,float> > 
     Classifier::Classify(const std::vector<cv::Mat> &input_channels) {
    
       std::vector< std::vector<float> > output = Predict(input_channels);
    
       std::vector< std::pair<int,float> > predictions;
       for ( int i = 0 ; i < output.size(); i++ ) {
           std::vector<int> maxN = Argmax(output[i], 1);
           int idx = maxN[0];
           predictions.push_back(std::make_pair(labels_[idx], output[idx]));
       }
       return predictions;
     }