Use .h5 model with nvidia tensorrt on cpu instead of gpu

I have a .h5 model (built for GPU?) that I want to run on my CPU. I converted the model with Python and it looks like the conversion succeeds, but when I use it in the TensorRT Docker container I get this error:

     [[TRTEngineOp_8]]
E0106 21:02:54.141211 1 model_repository_manager.cc:810] failed to load 'retinanet_TRT' version 1: Internal: No OpKernel was registered to support Op 'TRTEngineOp' used by {{node TRTEngineOp_16}}with these attrs: [use_calibration=false, fixed_input_size=true, input_shapes=[[?,?,?,3]], OutT=[DT_FLOAT], precision_mode="FP16", static_engine=false, serialized_segment="\ne\n1T...2[=13=]5VALID", cached_engine_batches=[], InT=[DT_FLOAT], calibration_data="", output_shapes=[[?,?,?,64]], workspace_size_bytes=2127659, max_cached_engines_count=1, segment_funcdef_name="TRTEngineOp_16_native_segment"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  device='GPU'
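
The last two lines of the log are the actual problem: the server registers only CPU devices, while the TRTEngineOp kernel that the TensorRT conversion inserts is registered for GPU only. To see which devices TensorFlow registers in a given environment, a quick TF 1.x check:

from tensorflow.python.client import device_lib

# On a CPU-only host this prints CPU and XLA_CPU entries only,
# matching the "Registered devices" line in the error above.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)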

What do I need to change in the conversion so that the model can be used on CPU only?

The model was converted like this:

import tensorflow as tf
from tensorflow.keras import backend as K
# Location of create_inference_graph in the TF 1.x shipped with the 19.10
# container; older TF versions exposed it as tf.contrib.tensorrt instead.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# create_model, load_model, get_nodes_from_model and generate_config are
# project-specific helpers; num_classes, precision and model_save_path
# are defined elsewhere.
with tf.Graph().as_default():
    with tf.Session() as sess:
        graph = sess.graph
        K.set_session(sess)
        K.set_learning_phase(0)
        inference_model = create_model(num_classes=num_classes)
        load_model()

        # Find output nodes
        outputs, output_node_list = get_nodes_from_model(inference_model.outputs)
        # Find input nodes
        inputs, input_node_list = get_nodes_from_model(inference_model.inputs)

        generate_config()

        with sess.as_default():
            freeze_var_names = [v.op.name for v in tf.global_variables()]
            output_names = output_node_list or []
            output_names += [v.op.name for v in tf.global_variables()]
            input_graph_def = graph.as_graph_def()
            for node in input_graph_def.node:
                # Clear pinned device placements so the graph is not tied to GPU
                node.device = ""
            frozen_graph = tf.compat.v1.graph_util.convert_variables_to_constants(
                sess, input_graph_def, output_names, freeze_var_names)
            trt_graph = trt.create_inference_graph(
                # frozen model
                input_graph_def=frozen_graph,
                outputs=output_node_list,
                # specify the max workspace
                max_workspace_size_bytes=500000000,
                # precision mode, "FP32" (32-bit floating point) or "FP16"
                precision_mode=precision,
                is_dynamic_op=True)
            # Finally, serialize and dump the output graph to the filesystem
            with tf.gfile.GFile(model_save_path, 'wb') as f:
                f.write(trt_graph.SerializeToString())

            print("TensorRT model is successfully stored!\n")

is_dynamic_op=True already helped the conversion go through (it now reports that the model was stored successfully), but I still cannot load the model in the TensorRT server Docker container.

I am using the nvcr.io/nvidia/tensorflow:19.10-py3 container to convert the model and the nvcr.io/nvidia/tensorrtserver:19.10-py3 container for the TensorRT server.
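
For reference, both containers can be started roughly like this; the mounted paths are placeholders and the trtserver flags follow the 19.10-era documentation, so check them against your version (on a CPU-only host, drop --runtime=nvidia for the server container):

# Conversion container
docker run --runtime=nvidia --rm -it \
    -v /path/to/conversion/code:/workspace \
    nvcr.io/nvidia/tensorflow:19.10-py3

# Inference server container, pointing at the model repository
docker run --runtime=nvidia --rm \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /path/to/model_repository:/models \
    nvcr.io/nvidia/tensorrtserver:19.10-py3 \
    trtserver --model-repository=/models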

Just don't convert your model to TensorRT. The error log already shows why it can never load: the TRTEngineOp nodes that the conversion inserts only have a GPU kernel (Registered kernels: device='GPU'), so a TRT-converted graph cannot run on a CPU-only server. Freeze the graph and serve it as plain TensorFlow instead:

import tensorflow as tf
from tensorflow.keras import backend as K

# Same project-specific helpers as above; the only difference is that the
# frozen graph is written out directly, with no TensorRT conversion step.
with tf.Graph().as_default():
    with tf.Session() as sess:
        graph = sess.graph
        K.set_session(sess)
        K.set_learning_phase(0)
        inference_model = create_model(num_classes=num_classes)
        load_model()

        # Find output nodes
        outputs, output_node_list = get_nodes_from_model(inference_model.outputs)
        # Find input nodes
        inputs, input_node_list = get_nodes_from_model(inference_model.inputs)

        generate_config()

        with sess.as_default():
            freeze_var_names = [v.op.name for v in tf.global_variables()]
            output_names = output_node_list or []
            output_names += [v.op.name for v in tf.global_variables()]
            input_graph_def = graph.as_graph_def()
            for node in input_graph_def.node:
                # Clear pinned device placements so the graph is not tied to GPU
                node.device = ""
            frozen_graph = tf.compat.v1.graph_util.convert_variables_to_constants(
                sess, input_graph_def, output_names, freeze_var_names)

            # Finally, serialize and dump the frozen graph to the filesystem
            with tf.gfile.GFile(model_save_path, 'wb') as f:
                f.write(frozen_graph.SerializeToString())
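
With a plain frozen graph, the server can load the model as a TensorFlow GraphDef. A minimal config.pbtxt sketch for the model repository (the model name, tensor names and shapes below are placeholders, not taken from the question; instance_group with KIND_CPU pins execution to the CPU):

name: "retinanet"
platform: "tensorflow_graphdef"
max_batch_size: 1
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    dims: [ -1, -1, 3 ]
  }
]
output [
  {
    name: "output_1"
    data_type: TYPE_FP32
    dims: [ -1, -1, 64 ]
  }
]
instance_group [
  {
    kind: KIND_CPU
  }
]

Before handing the file to the server, a quick TF 1.x sanity check can confirm the frozen graph deserializes cleanly; this is a sketch that reuses model_save_path from the script above:

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile(model_save_path, 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    # Mask all GPUs; running inference in this session would fail on any
    # op (such as a leftover TRTEngineOp) that has no CPU kernel.
    config = tf.ConfigProto(device_count={'GPU': 0})
    with tf.Session(graph=graph, config=config) as sess:
        print('Frozen graph loaded; ops:', len(graph.get_operations()))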