PyTorch to ONNX: Could not find an implementation for RandomNormalLike

Question

I'm trying to convert a fairly complex model from PyTorch to ONNX. The conversion succeeds without errors, but I get this error when loading the model:

Traceback (most recent call last):
  File "/home/***/***/***.py", line 50, in <module>
    main()
  File "/home/***/***/***.py", line 38, in main
    ort_session = ort.InferenceSession(onnx_path, providers=[
  File "/home/***/miniconda3/envs/***/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 324, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/***/miniconda3/envs/***/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 369, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for RandomNormalLike(1) node with name 'RandomNormalLike_598'

I think the RandomNormalLike node the error complains about probably corresponds to this module of mine:

from typing import Optional

import torch
from torch import nn


class NoiseInjection(nn.Module):
    def __init__(self):
        super().__init__()

        self.weight = nn.Parameter(torch.zeros(1), requires_grad=True)

    def forward(
        self,
        feat: torch.Tensor,
        noise: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        if noise is None:
            batch, _, height, width = feat.shape
            noise = torch.randn(
                batch, 1, height, width,
                dtype=feat.dtype,
                device=feat.device,
            )

        return feat + self.weight * noise

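I also wrote a different implementation, but it led to the same error. (Edit: this version does actually work. I had made an unrelated mistake elsewhere that misled me into thinking it didn't.)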

    def forward(
        self,
        feat: torch.Tensor,
        noise: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        if noise is None:
            noise = torch.randn_like(feat[:, 0:1])

        return feat + self.weight * noise

My PyTorch and ONNX versions are as follows:

$ conda list torch
# Name                    Version                   Build  Channel
torch                     1.10.0+cu113             pypi_0    pypi
torchaudio                0.10.0+cu113             pypi_0    pypi
torchvision               0.11.1+cu113             pypi_0    pypi

$ conda list onnx
# Name                    Version                   Build  Channel
onnx                      1.10.2                   pypi_0    pypi
onnxruntime-gpu           1.9.0                    pypi_0    pypi

How can I export a module like this to ONNX and run it successfully?

Answer 1

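Searching online, I found a similar issue on GitHub about conv (https://github.com/microsoft/onnxruntime/issues/3130). It may be that the parameter types torch uses are incompatible with the RandomNormalLike implementations available in ONNX Runtime.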

Could you open the model in Netron, check what the RandomNormalLike node(s) contain, and see whether they conform to the spec: https://github.com/onnx/onnx/blob/main/docs/Operators.md#RandomNormal or https://github.com/onnx/onnx/blob/main/docs/Operators.md#RandomNormalLike

Cheers

Edit: It turns out the RandomNormal node has dtype 10, which corresponds to fp16.

However, the ONNX Runtime implementation only supports float and double; see the source code here: https://github.com/microsoft/onnxruntime/blob/24e35fba3217bf33b0e4064bc71d271a61938ba0/onnxruntime/core/providers/cpu/generator/random.cc#L354

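So the fix is either to run the whole model in fp32, or to explicitly ask RandomNormalLike for float or double values and, I guess, hope that torch allows mixing fp16 with fp32/fp64 in the computation.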
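
A minimal sketch of the second option, applied to the question's NoiseInjection: sample the noise in float32 and cast it back to the feature dtype. The class name NoiseInjectionFP32Noise is hypothetical. I would expect the exporter to emit a float32 random node followed by a Cast, so ONNX Runtime never sees an fp16 RandomNormalLike, but this is an assumption that has not been checked against torch 1.10's exporter; inspect the exported graph to confirm.

```python
from typing import Optional

import torch
from torch import nn


class NoiseInjectionFP32Noise(nn.Module):
    """Hypothetical variant of NoiseInjection that samples noise in float32."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(1), requires_grad=True)

    def forward(
        self,
        feat: torch.Tensor,
        noise: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        if noise is None:
            batch, _, height, width = feat.shape
            # Always sample in float32, regardless of feat.dtype, so the
            # random op itself does not request an fp16 output.
            noise = torch.randn(
                batch, 1, height, width,
                dtype=torch.float32,
                device=feat.device,
            )
            # Cast back so the rest of the computation keeps feat's dtype.
            noise = noise.to(feat.dtype)

        return feat + self.weight * noise
```

After model.half(), the weight and feat are fp16 while only the sampling step runs in float32.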

Answer 2

For anyone trying to reproduce this issue, I put together a minimal example. In the code below, RandLike works while RandReferenced does not:

import torch
from torch import nn
import onnxruntime as ort


class RandLike(nn.Module):
    def forward(self, x):
        return torch.randn_like(x[:, 0:1])


class RandReferenced(nn.Module):
    def forward(self, x):
        b, _, h, w = x.shape
        return torch.randn(
            b, 1, h, w,
            device=x.device,
            dtype=x.dtype,
        )


module = RandLike().cuda().half()
dummy_input = torch.randn(2, 3, 4, 4, device='cuda').half()
torch.onnx.export(module, dummy_input, "randlike_2.onnx", input_names=["rand_input"], output_names=["rand_output"])
module = RandReferenced().cuda().half()
torch.onnx.export(module, dummy_input, "randReferenced_2.onnx", input_names=["rand_input"], output_names=["rand_output"])

ort_session = ort.InferenceSession("randlike_2.onnx", providers=[
    "CUDAExecutionProvider",
])
ort_session.run(["rand_output"], {"rand_input": dummy_input.cpu().numpy()})

ort_session = ort.InferenceSession("randReferenced_2.onnx", providers=[
    "CUDAExecutionProvider",
])
ort_session.run(["rand_output"], {"rand_input": dummy_input.cpu().numpy()})

Running the code above results in the following error:

$ CUDA_VISIBLE_DEVICES=0 python random_like_onnx.py
Traceback (most recent call last):
  File "/home/***/***/random_like_onnx.py", line 32, in <module>
    ort_session = ort.InferenceSession("randReferenced_2.onnx", providers=[
  File "/home/***/miniconda3/envs/***/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 324, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/***/miniconda3/envs/***/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 369, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for RandomNormalLike(1) node with name 'RandomNormalLike_1'
