Unable to load TensorRT SavedModel after conversion in TensorFlow 2.1
I have been trying to convert a YOLOv3 model implemented in TensorFlow 2 to TensorRT by following the tutorial on the NVIDIA website (https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#worflow-with-savedmodel).
I used the SavedModel approach for the conversion: I successfully converted the original model to FP16 and saved the result as a new SavedModel. When the new SavedModel is loaded in the same process that performed the conversion, it loads correctly and I am able to run inference on images. The problem arises when I then try to load the FP16 SavedModel in a new process. When I attempt this, I get the following error:
2020-04-01 10:39:42.428094: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-04-01 10:39:42.447415: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
Coco names not found, class labels will be empty
2020-04-01 10:39:53.892453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-01 10:39:53.920870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: TITAN Xp computeCapability: 6.1
coreClock: 1.582GHz coreCount: 30 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 510.07GiB/s
2020-04-01 10:39:53.920915: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-01 10:39:53.920950: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-01 10:39:53.937043: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-01 10:39:53.941012: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-01 10:39:53.972250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-01 10:39:53.976883: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-01 10:39:53.976919: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-01 10:39:53.978525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-01 10:39:53.978833: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-01 10:39:54.112532: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2999115000 Hz
2020-04-01 10:39:54.114178: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f3a70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-01 10:39:54.114208: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-04-01 10:39:54.219842: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555e230 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-01 10:39:54.219872: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): TITAN Xp, Compute Capability 6.1
2020-04-01 10:39:54.220896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: TITAN Xp computeCapability: 6.1
coreClock: 1.582GHz coreCount: 30 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 510.07GiB/s
2020-04-01 10:39:54.220936: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-01 10:39:54.220948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-01 10:39:54.220981: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-01 10:39:54.220998: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-01 10:39:54.221013: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-01 10:39:54.221029: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-01 10:39:54.221039: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-01 10:39:54.222281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-01 10:39:54.232890: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-01 10:39:54.636732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-01 10:39:54.636779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-04-01 10:39:54.636786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-04-01 10:39:54.638840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11240 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-04-01 10:40:26.366595: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-04-01 10:40:31.509694: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMS_TRT version 1
2020-04-01 10:40:31.509767: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeDeserializationUtils.cpp (259) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
2020-04-01 10:40:31.513205: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger INVALID_STATE: std::exception
2020-04-01 10:40:31.513262: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger INVALID_CONFIG: Deserialize the cuda engine failed.
Segmentation fault (core dumped)
I am not sure what is causing this issue. The only post I could find raising the same problem is on the NVIDIA developer forums, and no answer was provided. (https://forums.developer.nvidia.com/t/getplugincreator-could-not-find-plugin-batchednms-trt-version-1/84205/3)
So my questions are: why does the SavedModel fail to load when the loading code is executed in a different process from the conversion code? And how can I load my TensorRT model without having to convert it from the non-TensorRT model every time?
Here is the code used to convert the model, along with the inference output when the converted model is loaded in the same process.
Code
import os
from os.path import join as pjoin
import tensorflow as tf
import numpy as np
from tensorflow.python.framework import graph_io
from tensorflow.keras.models import load_model
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.framework import convert_to_constants
from caipy_services_backend.models import Yolov3
from caipy_services_backend.models.yolov3.utils import freeze_all

# Clear any previous session.
tf.keras.backend.clear_session()

def my_input_fn():
    # Yield a single batch of random data matching the model's input shape.
    for _ in range(1):
        inp1 = np.random.normal(size=(1, 416, 416, 3)).astype(np.float32)
        # inp2 = np.random.normal(size=(8, 16, 16, 3)).astype(np.float32)
        yield [inp1]

def convert_saved_model_and_reload(input_saved_model_dir, output_saved_model_dir):
    # FP16 conversion with a 4 GiB workspace and up to 100 cached engines.
    conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
    conversion_params = conversion_params._replace(
        max_workspace_size_bytes=(1 << 32))
    conversion_params = conversion_params._replace(precision_mode="FP16")
    conversion_params = conversion_params._replace(
        maximum_cached_engines=100)
    converter = tf.experimental.tensorrt.Converter(
        input_saved_model_dir=input_saved_model_dir,
        conversion_params=conversion_params)
    converter.convert()
    converter.build(input_fn=my_input_fn)
    converter.save(output_saved_model_dir)
    # Reload the converted SavedModel in the same process and run inference.
    saved_model_loaded = tf.saved_model.load(
        output_saved_model_dir, tags=["serve"])
    graph_func = saved_model_loaded.signatures["serving_default"]
    frozen_func = convert_to_constants.convert_variables_to_constants_v2(
        graph_func)
    input_data = tf.convert_to_tensor(
        np.random.normal(size=(1, 416, 416, 3)).astype(np.float32))
    output = frozen_func(input_data)[0].numpy()
    print(output)
Output
[[[0. 0. 1. 1.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]]
WARNING:tensorflow:Unresolved object in checkpoint: (root).trt_engine_resources.TRTEngineOp_3._serialized_trt_resource_filename
WARNING:tensorflow:Unresolved object in checkpoint: (root).trt_engine_resources.TRTEngineOp_4._serialized_trt_resource_filename
WARNING:tensorflow:Unresolved object in checkpoint: (root).trt_engine_resources.TRTEngineOp_5._serialized_trt_resource_filename
WARNING:tensorflow:Unresolved object in checkpoint: (root).trt_engine_resources.TRTEngineOp_0._serialized_trt_resource_filename
WARNING:tensorflow:Unresolved object in checkpoint: (root).trt_engine_resources.TRTEngineOp_7._serialized_trt_resource_filename
WARNING:tensorflow:Unresolved object in checkpoint: (root).trt_engine_resources.TRTEngineOp_1._serialized_trt_resource_filename
WARNING:tensorflow:Unresolved object in checkpoint: (root).trt_engine_resources.TRTEngineOp_2._serialized_trt_resource_filename
WARNING:tensorflow:Unresolved object in checkpoint: (root).trt_engine_resources.TRTEngineOp_6._serialized_trt_resource_filename
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
Here is the code that causes the error:
def load_tensor_rt_model(saved_model_dir):
    saved_model_loaded = tf.saved_model.load(
        saved_model_dir, tags=["serve"])
    graph_func = saved_model_loaded.signatures["serving_default"]
    frozen_func = convert_to_constants.convert_variables_to_constants_v2(
        graph_func)
    input_data = tf.convert_to_tensor(
        np.random.normal(size=(1, 416, 416, 3)).astype(np.float32))
    output = frozen_func(input_data)[0].numpy()
    print(output)
Any help with this issue is greatly appreciated.
Update: the issue described in this question is caused by the use of converter.build(). When the converted model is saved without building, it can be loaded without any problems. However, I still don't know why building causes this issue. A sketch of that workaround is shown below.
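For reference, a minimal sketch of the workaround: it reuses the conversion setup from the code above but skips the build step (my assumption is that the engines are then built at first inference in whatever process loads the model, rather than being serialized into the SavedModel):

converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=conversion_params)
converter.convert()
# converter.build(input_fn=my_input_fn)  # skipped: saving a built model triggers the load error
converter.save(output_saved_model_dir)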
Computer specs:
- GPU: NVIDIA TITAN Xp
- OS: Ubuntu 18.04
Package versions:
- NVIDIA graphics driver: 440.59
- CUDA: 10.1.243-1 amd64
- cuDNN: 7.6.5.32-1+cuda10.1
- libnvinfer-dev: 6.0.1-1+cuda10.1
- libnvinfer-plugin-dev: 6.0.1-1+cuda10.1
- Python: 3.6.9
- TensorFlow: 2.1.0
I found that this happens because libnvinfer_plugin.so.* is not loaded when running inference with the saved engines (I guess it gets loaded and used when converter.build() is called). I forced the plugins to be initialized at the start of my inference function with trt.init_libnvinfer_plugins(None, '') (with import tensorrt as trt), which happened to fix this particular error.
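Applied to the loading function above, the fix looks roughly like this (a sketch of my working version; the rest of the inference code is unchanged):

import tensorrt as trt
import tensorflow as tf

def load_tensor_rt_model(saved_model_dir):
    # Register the TensorRT plugins (including BatchedNMS_TRT) in the
    # plugin registry before the serialized engines are deserialized.
    trt.init_libnvinfer_plugins(None, '')
    saved_model_loaded = tf.saved_model.load(
        saved_model_dir, tags=["serve"])
    return saved_model_loaded.signatures["serving_default"]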