How to efficiently (without looping) get data from tensor predicted by a torchscript in C++?

I am calling a torchscript (a neural network serialized from Python) from a C++ program:

  // define inputs
  const int batch = 3; // batch size
  const int n_inp = 2; // number of inputs (const, so the array bounds below are standard C++ rather than a VLA extension)
  double I[batch][n_inp] = {{1.0, 1.0}, {2.0, 3.0}, {4.0, 5.0}}; // some random input
  std::cout << "inputs" << "\n";  // print inputs
  for (int i = 0; i < batch; ++i)
  {    
    std::cout << "\n";
    for (int j = 0; j < n_inp; ++j)
    {
      std::cout << I[i][j] << "\n";
    }
  }
  
  // prepare inputs for feeding to neural network
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::from_blob(I, {batch, n_inp}, at::kDouble)); // from_blob does not copy: I must stay alive while the tensor is in use

  // deserialize and load scriptmodule
  torch::jit::script::Module module;
  module = torch::jit::load("Net-0.pt");

  // do forward pass
  auto outputs = module.forward(inputs).toTensor();

Usually, to get data from the outputs, one does the following (element by element):

  // get data from outputs
  std::cout << "outputs" << "\n";
  const int n_out = 1;
  double outputs_data[batch][n_out];
  for (int i = 0; i < batch; i++) 
  {
    for (int j = 0; j < n_out; j++)
    {
      outputs_data[i][j] = outputs[i][j].item<double>();
      std::cout << outputs_data[i][j] << "\n";
    }
  }

However, this kind of looping with .item is very inefficient (in the real code I will be predicting millions of points at every time step). I want to get the data directly from outputs (without looping over the elements). I tried:

  int n_out = 1;
  double outputs_data[batch][n_out];
  outputs_data = outputs.data_ptr<double>();

However, it gives the error:

error: incompatible types in assignment of ‘double*’ to ‘double [batch][n_out]’
   outputs_data = outputs.data_ptr<double>();
                                           ^

Note that the type of outputs_data is fixed as double and cannot be changed.

A deep copy is needed, as follows:

  double outputs_data[batch];
  std::memcpy(outputs_data, outputs.data_ptr<double>(), sizeof(double)*batch); // needs #include <cstring>