Aborted (core dumped) on tf_trt create_inference_graph

I am trying to convert ssdLite_mobilenet_V2 from TensorFlow to TensorRT with tf_trt, following the instructions in [link][1], and I get an Aborted (core dumped) error. The really strange part is that I ran exactly the same conversion (with the same script) on the same graph architecture trained on a different dataset, and it completed without errors.

OS: Ubuntu 18.04.2, GPU: Tesla M60, TensorFlow 1.13.1

I tried modifying max_batch_size and max_workspace_size_bytes, but the problem does not seem to be GPU memory exhaustion: the process never appears to use more than 1.5 GB of GPU memory.

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# build_detection_graph comes from the tf_trt_models repo linked at [1]
from tf_trt_models.detection import build_detection_graph

# Build a frozen inference graph from the trained checkpoint.
frozen_graph, input_names, output_names = build_detection_graph(
    config="pipeline.config",
    checkpoint="model.ckpt-75000"
)

# Note: this GraphDef is loaded but never used below; the conversion
# operates on frozen_graph returned by build_detection_graph.
with tf.gfile.FastGFile('graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Replace TensorRT-compatible subgraphs with TRTEngineOp nodes.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)

# The result is a frozen TensorFlow GraphDef, not a UFF model,
# so serialize trt_graph itself.
with open("trt_graph.pb", "wb") as f:
    f.write(trt_graph.SerializeToString())
```

2019-04-18 12:45:50.313642: I tensorflow/contrib/tensorrt/segment/segment.cc:443] There are 169 ops of 35 different types in the graph that are not converted to TensorRT: Range, GreaterEqual, Greater, Split, TopKV2, Select, Less, Slice, Identity, BiasAdd, Reshape, Mul, Fill, Squeeze, Const, Unpack, ResizeBilinear, GatherV2, NonMaxSuppressionV3, Where, ExpandDims, Cast, Minimum, Sum, Sub, Pack, Transpose, Pad, ConcatV2, Exp, Placeholder, Add, Shape, NoOp, StridedSlice, (For more information see https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).
2019-04-18 12:45:51.094322: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:913] Number of TensorRT candidate segments: 2
2019-04-18 12:45:51.146102: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
2019-04-18 12:46:15.758417: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 275 nodes succeeded.
2019-04-18 12:46:15.801363: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
2019-04-18 12:47:02.994309: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 684 nodes succeeded.
2019-04-18 12:47:03.494635: F tensorflow/core/graph/graph.cc:659] Check failed: inputs[edge->dst_input()] == nullptr Edge {name:'TRTEngineOp_1' id:1323 op device:{} def:{{{node TRTEngineOp_1}} = TRTEngineOp[InT=[DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,300,300,3]], max_cached_engines_count=10, output_shapes=[[1,576,19,19], [1,1280,10,10], [1,512,5,5], [1,256,3,3], [1,24,3,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_1_native_segment", serialized_segment="05...00[=12=]0[=12=]0", static_engine=true, use_calibration=false, workspace_size_bytes=11966231, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Preprocessor/stack, ^const6)}}:{name:'TRTEngineOp_0' id:1322 op device:{} def:{{{node TRTEngineOp_0}} = TRTEngineOp[InT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,256,3,3], [1,512,5,5], [1,1280,10,10], [1,576,19,19], [1,24,3,3]], max_cached_engines_count=10, output_shapes=[[1,1917,4], [1,1917,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_0_native_segment", serialized_segment="0o1\...00[=12=]0[=12=]0", static_engine=true, use_calibration=false, workspace_size_bytes=4810985, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/Relu6, FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/Relu6, FeatureExtractor/MobilenetV2/Conv_1/Relu6, FeatureExtractor/MobilenetV2/expanded_conv_13/expansion_output, BoxPredictor_3/BoxEncodingPredictor/BiasAdd, ^Postprocessor/scale_logits/y, ^BoxPredictor_4/BoxEncodingPredictor/biases/read, ^BoxPredictor_5/BoxEncodingPredictor/biases/read, ^const6)}} with dst_input 0 and had pre-existing input edge {name:'TRTEngineOp_1' id:1323 op device:{} def:{{{node TRTEngineOp_1}} = TRTEngineOp[InT=[DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,300,300,3]], max_cached_engines_count=10, output_shapes=[[1,576,19,19], [1,1280,10,10], [1,512,5,5], [1,256,3,3], [1,24,3,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_1_native_segment", serialized_segment="05...00[=12=]0[=12=]0", static_engine=true, use_calibration=false, workspace_size_bytes=11966231, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Preprocessor/stack, ^const6)}}:{name:'TRTEngineOp_0' id:1322 op device:{} def:{{{node TRTEngineOp_0}} = TRTEngineOp[InT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,256,3,3], [1,512,5,5], [1,1280,10,10], [1,576,19,19], [1,24,3,3]], max_cached_engines_count=10, output_shapes=[[1,1917,4], [1,1917,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_0_native_segment", serialized_segment="0o1\...00[=12=]0[=12=]0", static_engine=true, use_calibration=false, workspace_size_bytes=4810985, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/Relu6, FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/Relu6, FeatureExtractor/MobilenetV2/Conv_1/Relu6, FeatureExtractor/MobilenetV2/expanded_conv_13/expansion_output, BoxPredictor_3/BoxEncodingPredictor/BiasAdd, ^Postprocessor/scale_logits/y, 
^BoxPredictor_4/BoxEncodingPredictor/biases/read, ^BoxPredictor_5/BoxEncodingPredictor/biases/read, ^const6)}}
Aborted (core dumped)

  [1]: https://github.com/NVIDIA-AI-IOT/tf_trt_models

Could you retry the create_inference_graph call with the parameter is_dynamic_op=True?
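
A minimal sketch of that retry, reusing the argument values from the question (frozen_graph and output_names are assumed to come from the build_detection_graph call above):

```python
# Same conversion, but defer TensorRT engine construction to runtime,
# when the actual input shapes are known.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50,
    is_dynamic_op=True  # build engines lazily at runtime
)
```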

Also, it would be good to increase the verbosity of the TensorFlow logs by running the conversion as TF_CPP_VMODULE=convert_graph=2,convert_nodes=2,segment=2,trt_engine=2 python ...
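
If it is easier to keep everything inside the script, the same verbosity can be set from Python before TensorFlow is first imported (a minimal sketch; the module list is copied verbatim from the command above):

```python
import os

# The native TensorFlow library reads the VLOG configuration when it is
# loaded, so this must run before the first `import tensorflow`.
os.environ["TF_CPP_VMODULE"] = "convert_graph=2,convert_nodes=2,segment=2,trt_engine=2"

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
```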

Also check against the latest TensorFlow; you can try the nightly containers from Docker Hub (e.g. the tensorflow/tensorflow:nightly-gpu image).