如何正确地将 cv::Mat 转换为具有完美匹配值的 torch::Tensor?

How to properly convert a cv::Mat into a torch::Tensor with perfect match of values?

我正在尝试 运行 推断 C++ 中的 jit 跟踪模型,目前我在 Python 中获得的输出与我在 C++ 中获得的输出不同。

最初我认为这是由 jit 模型本身引起的,但现在我不这么认为,因为我发现 C++ 代码中的输入张量存在一些小偏差。我相信我按照文档的指示做了所有事情,所以这也可能会在 torch::from_blob 中显示一个问题。我不确定!

因此,为了确定是哪种情况,这里是 Python 和 C++ 中的代码片段以及用于测试它的示例输入。

这是示例图片:

对于 Pytorch 运行 以下代码片段:

import cv2
import torch
from PIL import Image 
import math
import numpy as np

img = Image.open('D:/Codes/imgs/profile6.jpg')
width, height = img.size
scale = 0.6
sw, sh = math.ceil(width * scale), math.ceil(height * scale)
img = img.resize((sw, sh), Image.BILINEAR)
img = np.asarray(img, 'float32')

# preprocess it 
img = img.transpose((2, 0, 1))
img = np.expand_dims(img, 0)
img = (img - 127.5) * 0.0078125
img = torch.from_numpy(img)

对于 C++:

#include <iostream>
#include <torch/torch.h>
#include <torch/script.h>
using namespace torch::indexing;

#include <opencv2/core.hpp>
#include<opencv2/imgproc/imgproc.hpp>
#include<opencv2/highgui/highgui.hpp>

void test15()
{
    std::string pnet_path = "D:/Codes//MTCNN/pnet.jit"; 
    cv::Mat img = cv::imread("D:/Codes/imgs/profile6.jpg");
    int width = img.cols;
    int height = img.rows;
    float scale = 0.6f;
    int sw = int(std::ceil(width * scale));
    int sh = int(std::ceil(height * scale));

    //cv::Mat img;
    cv::resize(img, img, cv::Size(sw, sh), 0, 0, 1);

    auto tensor_image = torch::from_blob(img.data, { img.rows, img.cols, img.channels() }, at::kByte);
    tensor_image = tensor_image.permute({ 2,0,1 });
    tensor_image.unsqueeze_(0);
    tensor_image = tensor_image.toType(c10::kFloat).sub(127.5).mul(0.0078125);
    tensor_image.to(c10::DeviceType::CPU);
}

### Input comparison : 
and here are the tensor values both in Python and C++ 
Pytorch input (`img[:, :, :10, :10]`):

```python
img: tensor([[
    [[0.3555,  0.3555,  0.3477,  0.3555,  0.3711,  0.3945,  0.3945,  0.3867,  0.3789,  0.3789],
    [ 0.3477,  0.3555,  0.3555,  0.3555,  0.3555,  0.3555,  0.3555,  0.3477,  0.3398,  0.3398],
    [ 0.3320,  0.3242,  0.3320,  0.3242,  0.3320,  0.3398,  0.3398,  0.3242,  0.3164,  0.3242],
    [ 0.2852,  0.2930,  0.2852,  0.2852,  0.2930,  0.2930,  0.2930,  0.2852,  0.2773,  0.2773],
    [ 0.2539,  0.2617,  0.2539,  0.2617,  0.2539,  0.2148,  0.2148,  0.2148,  0.2070,  0.2070],
    [ 0.1914,  0.1914,  0.1836,  0.1836,  0.1758,  0.1523,  0.1367,  0.1211,  0.0977,  0.0898],
    [ 0.1367,  0.1211,  0.0977,  0.0820,  0.0742,  0.0586,  0.0273,  -0.0195, -0.0742, -0.0820],
    [-0.0039, -0.0273, -0.0508, -0.0664, -0.0898, -0.1211, -0.1367, -0.1523, -0.1758, -0.1758],
    [-0.2070, -0.2070, -0.2148, -0.2227, -0.2148, -0.1992, -0.1992, -0.1836, -0.1680, -0.1680],
    [-0.2539, -0.2461, -0.2383, -0.2305, -0.2227, -0.1914, -0.1836, -0.1758, -0.1680, -0.1602]],

    [[0.8398,  0.8398,  0.8320,  0.8242,  0.8320,  0.8477,  0.8398, 0.8320,  0.8164,  0.8164],
    [ 0.8320,  0.8242,  0.8164,  0.8164,  0.8086,  0.8008,  0.7930, 0.7852,  0.7695,  0.7695],
    [ 0.7852,  0.7852,  0.7773,  0.7695,  0.7695,  0.7617,  0.7539, 0.7383,  0.7305,  0.7148],
    [ 0.7227,  0.7070,  0.7070,  0.6992,  0.6914,  0.6836,  0.6836, 0.6680,  0.6523,  0.6367],
    [ 0.6289,  0.6211,  0.6211,  0.6211,  0.6055,  0.5586,  0.5508, 0.5352,  0.5273,  0.5039],
    [ 0.4805,  0.4727,  0.4648,  0.4648,  0.4570,  0.4180,  0.3945, 0.3633,  0.3477,  0.3164],
    [ 0.3555,  0.3398,  0.3086,  0.2930,  0.2695,  0.2461,  0.2070, 0.1523,  0.1055,  0.0820],
    [ 0.1367,  0.1133,  0.0820,  0.0508,  0.0273, -0.0117, -0.0352, -0.0508, -0.0820, -0.0898],
    [-0.1211, -0.1289, -0.1445, -0.1602, -0.1602, -0.1523, -0.1523, -0.1367, -0.1367, -0.1289],
    [-0.2070, -0.1992, -0.1992, -0.1992, -0.1992, -0.1680, -0.1680, -0.1602, -0.1523, -0.1445]],

    [[0.9492,  0.9414,  0.9336,  0.9180,  0.9180,  0.9336,  0.9258, 0.9023,  0.8867,  0.9023],
    [ 0.9258,  0.9258,  0.9102,  0.9023,  0.8945,  0.8789,  0.8633, 0.8477,  0.8320,  0.8398],
    [ 0.8711,  0.8633,  0.8555,  0.8477,  0.8320,  0.8242,  0.8086, 0.7930,  0.7852,  0.7773],
    [ 0.7852,  0.7773,  0.7617,  0.7539,  0.7461,  0.7305,  0.7148, 0.6992,  0.6914,  0.6836],
    [ 0.6758,  0.6680,  0.6602,  0.6602,  0.6367,  0.5820,  0.5742, 0.5508,  0.5430,  0.5273],
    [ 0.5117,  0.5117,  0.4961,  0.4883,  0.4727,  0.4336,  0.4102, 0.3711,  0.3477,  0.3242],
    [ 0.3867,  0.3711,  0.3398,  0.3164,  0.2930,  0.2539,  0.2148, 0.1523,  0.1055,  0.0820],
    [ 0.1680,  0.1445,  0.1055,  0.0742,  0.0352, -0.0039, -0.0273, -0.0586, -0.0820, -0.0898],
    [-0.0898, -0.0977, -0.1211, -0.1367, -0.1445, -0.1445, -0.1445, -0.1445, -0.1445, -0.1445],
    [-0.1758, -0.1680, -0.1680, -0.1680, -0.1680, -0.1523, -0.1523, -0.1602, -0.1602, -0.1523]]]])

C++/Libtorch 张量值 (img.index({Slice(), Slice(), Slice(None, 10), Slice(None, 10)});):

img: (1,1,.,.) =
  0.3555  0.3555  0.3555  0.3555  0.3555  0.4023  0.3945  0.3867  0.3789  0.3789
  0.3633  0.3633  0.3555  0.3555  0.3555  0.3555  0.3477  0.3555  0.3398  0.3398
  0.3398  0.3320  0.3320  0.3242  0.3398  0.3320  0.3398  0.3242  0.3242  0.3242
  0.2930  0.2930  0.2852  0.2773  0.2852  0.2930  0.2852  0.2852  0.2773  0.2852
  0.2695  0.2695  0.2617  0.2773  0.2695  0.2227  0.2227  0.2227  0.2148  0.2148
  0.1914  0.1914  0.1914  0.1914  0.1914  0.1602  0.1445  0.1289  0.1055  0.0977
  0.1289  0.1133  0.0820  0.0742  0.0586  0.0586  0.0195 -0.0273 -0.0820 -0.0898
  0.0039 -0.0195 -0.0508 -0.0664 -0.0820 -0.1289 -0.1445 -0.1602 -0.1836 -0.1836
 -0.2070 -0.2148 -0.2227 -0.2383 -0.2305 -0.2070 -0.2070 -0.1914 -0.1836 -0.1758
 -0.2539 -0.2461 -0.2461 -0.2383 -0.2305 -0.1914 -0.1914 -0.1758 -0.1680 -0.1602

(1,2,.,.) =
  0.8398  0.8398  0.8242  0.8164  0.8242  0.8555  0.8398  0.8320  0.8242  0.8242
  0.8320  0.8320  0.8242  0.8242  0.8086  0.8008  0.7930  0.7773  0.7695  0.7617
  0.7930  0.7852  0.7773  0.7695  0.7695  0.7695  0.7539  0.7461  0.7305  0.7227
  0.7070  0.7070  0.6992  0.6992  0.6914  0.6836  0.6758  0.6602  0.6523  0.6367
  0.6367  0.6367  0.6289  0.6289  0.6211  0.5664  0.5586  0.5430  0.5352  0.5117
  0.4805  0.4805  0.4805  0.4648  0.4727  0.4258  0.4023  0.3711  0.3555  0.3320
  0.3398  0.3320  0.3008  0.2773  0.2617  0.2461  0.1992  0.1445  0.0898  0.0586
  0.1367  0.1211  0.0898  0.0508  0.0273 -0.0195 -0.0352 -0.0664 -0.0898 -0.1055
 -0.1211 -0.1289 -0.1367 -0.1602 -0.1602 -0.1523 -0.1523 -0.1445 -0.1445 -0.1367
 -0.2148 -0.2070 -0.2070 -0.2070 -0.1992 -0.1680 -0.1680 -0.1602 -0.1523 -0.1445

(1,3,.,.) =
  0.9414  0.9414  0.9336  0.9180  0.9102  0.9336  0.9258  0.9023  0.8945  0.9023
  0.9180  0.9180  0.9102  0.9102  0.8945  0.8711  0.8633  0.8555  0.8242  0.8477
  0.8711  0.8711  0.8633  0.8477  0.8320  0.8164  0.8164  0.7930  0.7852  0.7852
  0.7773  0.7773  0.7539  0.7461  0.7305  0.7148  0.7070  0.6992  0.6836  0.6758
  0.6836  0.6836  0.6758  0.6680  0.6445  0.5898  0.5820  0.5586  0.5508  0.5352
  0.5273  0.5195  0.5117  0.4883  0.4883  0.4414  0.4102  0.3789  0.3633  0.3398
  0.3867  0.3633  0.3320  0.3008  0.2695  0.2539  0.2070  0.1445  0.0898  0.0664
  0.1836  0.1523  0.1133  0.0742  0.0352 -0.0117 -0.0352 -0.0664 -0.0898 -0.1055
 -0.0820 -0.0977 -0.1211 -0.1367 -0.1445 -0.1445 -0.1445 -0.1367 -0.1445 -0.1445
 -0.1758 -0.1758 -0.1758 -0.1758 -0.1758 -0.1602 -0.1523 -0.1680 -0.1602 -0.1602

[ CPUFloatType{1,3,10,10} ]

顺便说一下,这些是 normalized/preprocessed:

之前的张量值

Python:

img.shape: (3, 101, 180)
img: [
 [[173. 173. 172. 173. 175.]
  [172. 173. 173. 173. 173.]
  [170. 169. 170. 169. 170.]
  [164. 165. 164. 164. 165.]
  [160. 161. 160. 161. 160.]]

 [[235. 235. 234. 233. 234.]
  [234. 233. 232. 232. 231.]
  [228. 228. 227. 226. 226.]
  [220. 218. 218. 217. 216.]
  [208. 207. 207. 207. 205.]]

 [[249. 248. 247. 245. 245.]
  [246. 246. 244. 243. 242.]
  [239. 238. 237. 236. 234.]
  [228. 227. 225. 224. 223.]
  [214. 213. 212. 212. 209.]]]

CPP:

img.shape: [1, 3, 101, 180]
img: (1,1,.,.) =
  173  173  173  173  173
  174  174  173  173  173
  171  170  170  169  171
  165  165  164  163  164
  162  162  161  163  162

(1,2,.,.) =
  235  235  233  232  233
  234  234  233  233  231
  229  228  227  226  226
  218  218  217  217  216
  209  209  208  208  207

(1,3,.,.) =
  248  248  247  245  244
  245  245  244  244  242
  239  239  238  236  234
  227  227  224  223  221
  215  215  214  213  210
[ CPUByteType{1,3,5,5} ]

如您所见,乍一看,它们可能看起来相同,但仔细观察,您会发现输入中有许多小偏差!我怎样才能避免这些更改,并获得 C++ 中的准确值?

我想知道是什么导致了这种奇怪的现象发生!

明确表示这确实是一个输入问题,更具体地说,这是因为图像首先由 Python 中的 PIL.Image.open 读取,然后更改为 numpy 数组.如果使用 OpenCV 读取图像,那么 input-wise 的所有内容在 Python 和 C++ 中都是相同的。

更多解释

但是,在我的特定情况下,使用 OpenCV 图像会导致最终结果发生微小变化。这个 change/difference 最小化的唯一方法是当我制作 Opencv 图像灰度并将其提供给网络时,在这种情况下,PIL 输入和 opencv 输入都具有几乎相同的输出。

下面是两个例子,pil图像是bgr,opencv是灰度模式:你需要将它们保存到磁盘上,看看它们几乎完全相同(左边是cv_image,右边是pil_image):

但是,如果我只是不将 opencv 图像转换为灰度模式(并返回 bgr 以获得 3 个通道),这就是它的样子(左边是 cv_image,右边是 pil_image):

更新

结果又是输入相关。我们存在细微差异的原因是模型是在 RGB 图像上训练的,因此通道顺序很重要。使用 PIL 图像时,对于不同的方法会来回发生一些转换,因此导致整个事情变得一团糟,您之前在上面读过。

长话短说,从 cv::Mattorch::Tensor 的转换没有任何问题,反之亦然,问题在于图像的创建和馈送方式Python 和 C++ 中的网络不同。 Python 和 C++ 后端都使用 OpenCV 处理图像时,它们的输出和结果匹配 100%。