Debugging TensorFlow Serving on a BERT model

Question

I was able to deploy an NLP model with BERT embeddings by following this example (using TF 1.14.0 on CPU with tensorflow-model-server): https://mc.ai/how-to-ship-machine-learning-models-into-production-with-tensorflow-serving-and-kubernetes/

The model description is clean:

!saved_model_cli show --dir {'tf_bert_model/1'} --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['Input-Segment:0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 64)
        name: Input-Segment:0
    inputs['Input-Token:0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 64)
        name: Input-Token:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense/Softmax:0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: dense/Softmax:0
  Method name is: tensorflow/serving/predict

The input data for the served model is formatted as a list of dictionaries:

data
'{"instances": [{"Input-Token:0": [101, 101, 1962, 7770, 1069, 102, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "Input-Segment:0": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}]}'

r = requests.post("http://127.0.0.1:8501/v1/models/tf_bert_model:predict",
 json=data)

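I am now trying to deploy a BERT model using TF 2.1, the HuggingFace transformers library, and a GPU, but the deployed model returns either a 400 error or a 200 error, and I don't know how to debug it. I suspect it may be a data input format problem.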

My model description is messier:

2020-03-20 14:47:03.465762: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2020-03-20 14:47:03.465883: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2020-03-20 14:47:03.465900: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['attention_mask'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 128)
        name: serving_default_attention_mask:0
    inputs['input_ids'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 128)
        name: serving_default_input_ids:0
    inputs['labels'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 1)
        name: serving_default_labels:0
    inputs['token_type_ids'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 128)
        name: serving_default_token_type_ids:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

Defined Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          DType: dict
          Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #2
      Callable with:
        Argument #1
          DType: dict
          Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #3
      Callable with:
        Argument #1
          DType: dict
          Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #4
      Callable with:
        Argument #1
          DType: dict
          Value: {'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']

  Function Name: '_default_save_signature'
    Option #1
      Callable with:
        Argument #1
          DType: dict
          Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}

  Function Name: 'call_and_return_all_conditional_losses'
    Option #1
      Callable with:
        Argument #1
          DType: dict
          Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #2
      Callable with:
        Argument #1
          DType: dict
          Value: {'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #3
      Callable with:
        Argument #1
          DType: dict
          Value: {'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #4
      Callable with:
        Argument #1
          DType: dict
          Value: {'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']

I also formatted my input data as a list of dictionaries:

data = {"instances": test_deploy_inputs2}
data
{'instances': [{'attention_mask': [1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0],
   'input_ids': [101,
    1999,
    5688,
    1010,
    12328,
    5845,
    2007,
    5423,
    3593,
    28991,
    19362,
    4588,
    4244,
    4820,
    12553,
    12987,
    10737,
    2008,
    23150,
    14719,
    1011,
    20802,
    3662,
    2896,
    3798,
    1997,
    17953,
    14536,
    2509,
    1998,
    6335,
    1011,
    1015,
    29720,
    1998,
    2020,
    11914,
    5123,
    2013,
    6388,
    2135,
    10572,
    27441,
    7315,
    1012,
    102,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0],
   'labels': 0,
   'token_type_ids': [0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0]}]}

And I get a 200 error when testing the deployed model:

r = requests.post("http://127.0.0.1:8501/v1/models/fashion_model:predict",
 json=data)
r
<Response [200]>

Any idea how to debug this? Thanks.
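Since the suspicion above is an input format problem, one way to narrow it down is to compare the request payload with the `serving_default` signature before sending it. The sketch below is only a local sanity check under stated assumptions: `EXPECTED_INPUTS` and `check_instances` are hypothetical names, the per-example lengths are copied from the `saved_model_cli` output above, and a bare scalar such as `'labels': 0` is assumed to count as a single value. It does not reproduce the server's own validation.

```python
import json

# Input names and per-example lengths taken from the 'serving_default'
# signature printed by saved_model_cli above (all inputs are DT_INT32).
EXPECTED_INPUTS = {
    "attention_mask": 128,
    "input_ids": 128,
    "token_type_ids": 128,
    "labels": 1,
}


def check_instances(payload, expected=EXPECTED_INPUTS):
    """Return a list of readable problems; an empty list means none were found."""
    # Accept either a dict or an already-serialized JSON string.
    if isinstance(payload, str):
        payload = json.loads(payload)
    instances = payload.get("instances")
    if not isinstance(instances, list) or not instances:
        return ["payload needs a non-empty 'instances' list"]

    problems = []
    for i, inst in enumerate(instances):
        if not isinstance(inst, dict):
            problems.append(f"instance {i}: expected a dict of named inputs")
            continue
        missing = sorted(set(expected) - set(inst))
        extra = sorted(set(inst) - set(expected))
        if missing:
            problems.append(f"instance {i}: missing inputs {missing}")
        if extra:
            problems.append(f"instance {i}: unexpected inputs {extra}")
        for name, length in expected.items():
            if name not in inst:
                continue
            value = inst[name]
            # Assumption: a bare scalar is treated as a single value.
            values = value if isinstance(value, list) else [value]
            if len(values) != length:
                problems.append(
                    f"instance {i}: '{name}' has {len(values)} values, expected {length}"
                )
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
                problems.append(
                    f"instance {i}: '{name}' must contain only integers (DT_INT32)"
                )
    return problems
```

Calling `check_instances(data)` on the payload shown above before `requests.post` would list any missing names, wrong lengths, or non-integer values.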

Answer 1

My bad! A [200] response does not mean the request failed: HTTP 200 is the success status. You can view the results with:

import json
predictions = json.loads(r.text)['predictions']  # r is the response returned by requests.post in the question
predictions
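To make the difference between a 200 and a 400 easier to see, here is a minimal sketch that interprets both cases. It assumes the TensorFlow Serving REST predict API in row format, where a successful body carries a `predictions` list and a failed one carries an `error` string. The function name `summarize_predict_response` is hypothetical, and taking the index of the largest score as the class is an assumption about how the two values in `output_1` are meant to be read.

```python
import json


def summarize_predict_response(status_code, body_text):
    """Describe a TensorFlow Serving REST predict response."""
    try:
        body = json.loads(body_text)
    except ValueError:
        # The body was not JSON at all; return it verbatim for inspection.
        return {"ok": False, "error": body_text}

    if status_code != 200:
        # On failure the server explains the problem in an "error" field.
        return {"ok": False, "error": body.get("error", body_text)}

    predictions = body["predictions"]
    # One list of scores per instance; report the index of the largest score.
    classes = [max(range(len(row)), key=row.__getitem__) for row in predictions]
    return {"ok": True, "predictions": predictions, "classes": classes}
```

With the response object from the question, this could be called as `summarize_predict_response(r.status_code, r.text)`. On a 400, the `error` text usually names the input that was rejected.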

tensorflow-serving

tensorflow2.0