Different results after converting PyTorch to TorchScript? Does converting NSNumber to Float cause any loss?

Question

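I converted a pretrained PyTorch model (.pt) to a TorchScript model (.pt) so that I can use it in Swift 5 (iOS, iPhone 6s, Xcode 11). In Swift, the model's "predict" function gives me its embedding values (a tensor). Since the prediction result is returned as an array of NSNumber, I cast [NSNumber] to [Double] or [Float] to compute the distance between two embeddings (L2 normalization, dot product, etc.).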
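For reference, the distance computation described above can be sketched in plain Python. This is only a minimal sketch, not the Swift implementation from the question: the helper names l2_normalize, dot and euclidean_distance are illustrative, and the inputs are assumed to be plain lists of floats (for example, values copied out of a [NSNumber] array).

```python
import math


def l2_normalize(vec):
    """Scale vec to unit Euclidean length (illustrative helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 2-element "embeddings"; real ones would have 512 elements.
emb_a = l2_normalize([3.0, 4.0])
emb_b = l2_normalize([4.0, 3.0])

print(dot(emb_a, emb_b))
print(euclidean_distance(emb_a, emb_b))
```

For unit-length vectors the squared Euclidean distance equals 2 - 2 * dot, so either quantity can be used to sanity-check the other.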

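However, while the PyTorch version gets the correct answers, the TorchScript model gets many wrong answers. It is not only the answers that differ: the distances computed between two embedding pairs in TorchScript also differ from the results of the PyTorch model on the PC (CPU, PyCharm). In fact, even before the type cast used for the distance calculation, the embedding values in NSNumber (Swift) are very different from the values in float32 (PyTorch). I used the same input images.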

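I tried to find the cause. At one point I copied the embedding values ([NSNumber]) from swift-torchscript and computed the distance between the two embeddings in PyTorch, to check whether my distance calculation in Swift was at fault. I used torch.FloatTensor to mimic the [NSNumber] -> [Float] cast. I also tried [Double]. As a result, I found many infinite values. Are these infinite values related to the wrong answers?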

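What does this "inf" mean? Is it a calculation error or a type casting error? Do I lose information when casting from NSNumber to Float or Double? How can I get correct values from the TorchScript model in Swift? What should I check?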
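On the casting part of the question, one thing that can be checked independently of the model: rounding a 64-bit value to a 32-bit float keeps roughly 7 significant digits, and widening a 32-bit float to 64 bits is exact, so a cast of a finite, in-range value does not by itself produce "inf". The sketch below illustrates this with Python's struct module as a stand-in for the Swift cast (an assumption; it runs no Swift code), and the helper names to_float32 and count_non_finite are illustrative.

```python
import math
import struct


def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def count_non_finite(values):
    """Count entries that are inf, -inf or NaN."""
    return sum(1 for v in values if not math.isfinite(v))


# Values of the magnitude seen in typical embeddings survive the cast
# almost unchanged; the rounding error is far below 1e-8 here.
sample = [0.0347, -0.0124, 0.0004]
rounded = [to_float32(v) for v in sample]
max_cast_error = max(abs(a - b) for a, b in zip(sample, rounded))
print(max_cast_error)

# Scanning a pasted list for non-finite entries.
print(count_non_finite([0.0347, float("inf"), float("nan")]))
```

If a scan like count_non_finite already reports non-finite entries in the raw values, they were present before any cast to Float or Double.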

I used the following code for the conversion (PyTorch -> TorchScript).

import torch

from models.inception_resnet_v1 import InceptionResnetV1

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device)

example = torch.rand(1, 3, 160, 160)
traced_script_module = torch.jit.trace(resnet, example)
traced_script_module.save("mobile_model.pt")

Answer 1

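Are you using InceptionResnetV1 from https://github.com/timesler/facenet-pytorch ? When you refer to the PyTorch model in your output comparison, do you mean the TorchScript model run in PyTorch, or the resnet as is?

If it is the latter, have you already checked something like the following?

What do you get when you run the following: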

print('Original:')
orig_res = resnet(example)
print(orig_res.shape)
print(orig_res[0, 0:10])
print('min abs value:{}'.format(torch.min(torch.abs(orig_res))))
print('Torchscript:')
ts_res = traced_script_module(example)
print(ts_res.shape)
print(ts_res[0, 0:10])
print('min abs value:{}'.format(torch.min(torch.abs(ts_res))))
print('Dif sum:')
abs_diff = torch.abs(orig_res-ts_res)
print(torch.sum(abs_diff))
print('max dif:{}'.format(torch.max(abs_diff)))

(Run this after 'traced_script_module' has been defined.) I get the following:

Original:
torch.Size([1, 512])
tensor([ 0.0347,  0.0145, -0.0124,  0.0723, -0.0102,  0.0653, -0.0574,  0.0004,
        -0.0686,  0.0695], device='cuda:0', grad_fn=<SliceBackward>)
min abs value:0.00034740756382234395
Torchscript:
torch.Size([1, 512])
tensor([ 0.0347,  0.0145, -0.0124,  0.0723, -0.0102,  0.0653, -0.0574,  0.0004,
        -0.0686,  0.0695], device='cuda:0', grad_fn=<SliceBackward>)
min abs value:0.0003474018594715744
Dif sum:
tensor(8.1539e-06, device='cuda:0', grad_fn=<SumBackward0>)
max dif:5.960464477539063e-08

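This is not perfect, but considering that the outputs are as small as 10^-4 in magnitude, and that the former number is the sum of the absolute differences over 512 elements rather than the mean (about 1.6e-08 per element), it does not seem too far off to me. The max difference is about 6e-08.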
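The arithmetic behind that estimate can be reproduced with a few lines of Python. This is just a sketch using the figures printed in the output above; the variable names are illustrative.

```python
# Figures copied from the printed output above.
sum_abs_diff = 8.1539e-06
max_abs_diff = 5.960464477539063e-08
min_abs_value = 0.00034740756382234395
num_elements = 512

# Average absolute difference per embedding element.
mean_abs_diff = sum_abs_diff / num_elements
print(mean_abs_diff)

# The largest difference is exactly 2**-24, half of the float32
# machine epsilon (2**-23), i.e. rounding-level noise.
print(max_abs_diff == 2 ** -24)

# Largest difference relative to the smallest output magnitude.
print(max_abs_diff / min_abs_value)
```

Even relative to the smallest output magnitude, the largest difference stays well below 0.1%.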

By the way, you may want to change the example input so that it is on the same device as the model:

example = torch.rand(1, 3, 160, 160).to(device)

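If you get similar results in the above test, what values do you get for the first 10 outputs from swift-torchscript, first as NSNumber and then once cast to float, and how do they compare against the same slice of the PyTorch and torchscript-pytorch model outputs?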
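One way to make that comparison concrete is sketched below, assuming the first 10 values have been copied out of the Swift app by hand. The helper compare_slices is illustrative, and swift_slice is a placeholder that simply copies the reference values; it must be replaced with the real Swift output. The reference values are the ones printed above for a random example input, so for a real check both slices should come from the same input image.

```python
import math


def compare_slices(reference, candidate):
    """Return (max abs difference over finite values, count of non-finite values)."""
    if len(reference) != len(candidate):
        raise ValueError("slices must have the same length")
    non_finite = sum(1 for v in candidate if not math.isfinite(v))
    finite_diffs = [
        abs(r - c) for r, c in zip(reference, candidate) if math.isfinite(c)
    ]
    max_diff = max(finite_diffs) if finite_diffs else float("nan")
    return max_diff, non_finite


# First 10 values from the PyTorch output printed above.
pytorch_slice = [0.0347, 0.0145, -0.0124, 0.0723, -0.0102,
                 0.0653, -0.0574, 0.0004, -0.0686, 0.0695]

# Placeholder only: replace with the first 10 values copied from Swift.
swift_slice = list(pytorch_slice)

max_diff, non_finite = compare_slices(pytorch_slice, swift_slice)
print(max_diff, non_finite)
```

A large max_diff or a non-zero non_finite count on the raw values would point at the model input or inference on the device rather than at the later cast.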

nsnumber

deep-learning

swift

pytorch

torchscript