ONNX model checker fails while ONNX Runtime works fine when `tf.function` is used to decorate a member function with a loop

When a TensorFlow model contains a `tf.function`-decorated function with a `for` loop, the tf->onnx conversion produces the following warnings:

WARNING:tensorflow:From /Users/amit/Programs/lammps/kim/kliff/venv/lib/python3.7/site-packages/tf2onnx/tf_loader.py:706: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
Cannot infer shape for model/ex_layer/PartitionedCall/while: model/ex_layer/PartitionedCall/while:3
Cannot infer shape for model/ex_layer/PartitionedCall/Identity: model/ex_layer/PartitionedCall/Identity:0
Cannot infer shape for Func/model/ex_layer/PartitionedCall/output/_3: Func/model/ex_layer/PartitionedCall/output/_3:0
Cannot infer shape for Identity: Identity:0
missing output shape for while/Identity_3:0
missing output shape for while/Identity_3:0
missing output shape for while/Identity_3:0
missing output shape for while/Identity_3:0
...

The resulting model runs fine through ONNX Runtime, but the model checker reports the following error:

Traceback (most recent call last):
  File "failed_example.py", line 85, in <module>
    onnx.checker.check_model(onnx.load("tmp.onnx"))
  File "venv/lib/python3.7/site-packages/onnx/checker.py", line 106, in check_model
    C.check_model(protobuf_string)
onnx.onnx_cpp2py_export.checker.ValidationError: Field 'shape' of type is required but missing.

Netron shows no obvious difference between the models with and without the decorated function. My guess is that the error comes from the `for` loop being converted into a separate while-loop subgraph whose input shapes are undefined. Yet the same model works perfectly without the `tf.function` decorator. A minimal reproduction is included below.
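To see exactly which tensor the checker is complaining about, one can walk the main graph and any Loop-body subgraphs and print every input whose tensor type has no `shape` field set. This is my own diagnostic sketch (not part of the original report); `tmp.onnx` is the file produced by the repro script below:

import onnx

def report_missing_shapes(graph, prefix=""):
    # An input whose tensor type has no `shape` field is exactly what
    # triggers "Field 'shape' of type is required but missing".
    for vi in graph.input:
        if not vi.type.tensor_type.HasField("shape"):
            print(prefix + vi.name + ": missing shape")
    # Recurse into subgraph attributes, e.g. the body graph of a Loop node.
    for node in graph.node:
        for attr in node.attribute:
            if attr.type == onnx.AttributeProto.GRAPH:
                report_missing_shapes(attr.g, prefix + node.name + "/")

report_missing_shapes(onnx.load("tmp.onnx").graph)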

I think it is related to the following issues:

Code to reproduce:

import tensorflow as tf
import numpy as np
import sys
import onnx
import onnxruntime
import tf2onnx

# =============================================================================
# Layer and its helper functions
# COMMENT IT OUT TO PASS ONNX CHECK
@tf.function(
    input_signature=[
        tf.TensorSpec(shape=[None, None], dtype=tf.int32),
        tf.TensorSpec(shape=[None, None], dtype=tf.float32),
        tf.TensorSpec(shape=None, dtype=tf.float32),
    ])
def extra_function(list1, list2, accum_var):
    some_num = 4
    num_iter = tf.size(list1)//some_num
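    # Note: num_iter is a Tensor, so AutoGraph converts the Python `for`
    # loop below into a tf.while_loop, which tf2onnx then exports as an
    # ONNX Loop node with its own subgraph.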
    for i in range(num_iter):
        xyz_i = list2[0, i * 3 : (i + 1) * 3]
        accum_var += tf.reduce_sum(xyz_i)
    return accum_var

class ExLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()

    # Doesn't tf.function also create graphs out of called functions?
    # However, it does not seem to do that if the `call` function itself
    # is decorated.
    # @tf.function(
    #     input_signature=[
    #         tf.TensorSpec(shape=[None, None], dtype=tf.float32),
    #         tf.TensorSpec(shape=[None, None], dtype=tf.int32),
    #     ])
    def call(self, list2, list1):
        accum_var = tf.constant(0.0)
        accum_var = extra_function(list1, list2, accum_var)
        return accum_var
# =============================================================================


# =============================================================================
# Example implementation

layer1 = tf.keras.layers.Input(shape=(1,))
layer2 = tf.keras.layers.Input(shape=(1,), dtype=tf.int32)
EL = ExLayer()(layer1, layer2)
model = tf.keras.models.Model(inputs=[layer1, layer2], outputs=EL)

# Define input data
list2_tf = tf.constant([[0.,0.,0.,1.,1.,1.,2.,2.,2.,3.,3.,3.]],dtype=tf.float32)
list1_tf = tf.constant([[0,1,2,-1,1,0,2,-1,2,0,1,-1]],dtype=tf.int32)
list2_np = np.array([[0.,0.,0.,1.,1.,1.,2.,2.,2.,3.,3.,3.]],dtype=np.float32)
list1_np = np.array([[0,1,2,-1,1,0,2,-1,2,0,1,-1]],dtype=np.int32)

# Save to onnx
model_proto, external_tensor_storage = tf2onnx.convert.from_keras(model,
            input_signature=[
                tf.TensorSpec(shape=[None,None], dtype=tf.float32, name="list2"),
                tf.TensorSpec(shape=[None,None], dtype=tf.int32, name="list1")
                ],
            opset=11,
            output_path="tmp.onnx")


# Load onnx runtime session
ort_session = onnxruntime.InferenceSession("tmp.onnx")
inputs = {"list2":list2_np, "list1":list1_np}

print("===================================================")
print("Original model evaluation:")
print(model([list2_tf,list1_tf]))
print("ORT session evaluation")
print(ort_session.run(None, inputs))
print("===================================================")

# Check with model checker
onnx.checker.check_model(onnx.load("tmp.onnx"))

Related GitHub issues I have filed:

The problem is in the way you are specifying the shape of `accum_var`.

In the input signature you have `tf.TensorSpec(shape=None, dtype=tf.float32)`. Reading the code, I see that you are passing a scalar tensor. A scalar is a zero-dimensional tensor, so you should use `shape=[]` instead of `shape=None`.
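For illustration (a minimal sketch of my own, not from the original answer), the two specs differ in whether the rank is known at all:

import tensorflow as tf

# shape=[] declares a concrete rank-0 (scalar) tensor;
# shape=None declares a tensor of completely unknown rank.
scalar_spec = tf.TensorSpec(shape=[], dtype=tf.float32)
unknown_spec = tf.TensorSpec(shape=None, dtype=tf.float32)

print(scalar_spec.shape)   # ()        -> fully known scalar shape
print(unknown_spec.shape)  # <unknown> -> no shape information to export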

It runs here with no warnings after annotating `extra_function` with:

@tf.function(
    input_signature=[
        tf.TensorSpec(shape=[None, None], dtype=tf.int32),
        tf.TensorSpec(shape=[None, None], dtype=tf.float32),
        tf.TensorSpec(shape=[], dtype=tf.float32),
    ])