How to get Tensorflow Served model to pull from passed in input and not local batch file?
I am currently trying to get a seq2seq model working with TF Serving. I thought I had this right, but it seems I was mistaken. I originally trained the model from a local text file, read in batches. Now I want to send it a single sentence and have it return the summary to me.
I have successfully saved and served the model, and I can now see predictions on my front-end page, but the results are still coming from my local text file rather than from the sentence I pass in as a query parameter.
My input is currently a single sentence sent as a query parameter, but the result that actually comes back is still pulled from my text file, even though I map batch_x to the value of my arg[1], which I have verified is the correct expected input.
Does anyone see what I am doing wrong? Clearly I am misunderstanding the process I should be following here.
The important thing to note is that if I change the value of the passed-in argument and call the Python file directly, I get the correct result. However, when I make the same call against the frozen model that is being served, I always get the same prediction response back no matter what I send.
This is how I froze/exported the model (note the mapping of inputs_dict.X to batch_x ... I believe the mistake is something I am doing here):
pickle_fn = 'args.pickle'
folder = os.path.dirname(os.path.abspath(__file__)) + '/pickle'
pickle_filepath = os.path.join(folder, pickle_fn)
with open(pickle_filepath, "rb") as f:
    args = pickle.load(f)

print("Loading dictionary...")
word_dict, reversed_dict, article_max_len, summary_max_len = build_dict("valid", args.toy)
print("Loading validation dataset...")
# The below call will pull from the arg passed when "serve" is used
valid_x, valid_y = build_dataset("serve", word_dict, article_max_len, summary_max_len, args.toy)
valid_x_len = list(map(lambda x: len([y for y in x if y != 0]), valid_x))

with tf.Session() as sess:
    print("Loading saved model...")
    model = Model(reversed_dict, article_max_len, summary_max_len, args, forward_only=True)
    saver = tf.train.Saver(tf.global_variables())
    ckpt = tf.train.get_checkpoint_state("./saved_model/")
    saver.restore(sess, ckpt.model_checkpoint_path)

    batches = batch_iter(valid_x, valid_y, args.batch_size, 1)
    # print(valid_x, file=open("art_working_inp.txt", "a"))
    print("Writing summaries to 'result.txt'...")
    for batch_x, batch_y in batches:
        batch_x_len = list(map(lambda x: len([y for y in x if y != 0]), batch_x))
        valid_feed_dict = {
            model.batch_size: len(batch_x),
            model.X: batch_x,
            model.X_len: batch_x_len,
        }

        prediction = sess.run(model.prediction, feed_dict=valid_feed_dict)
        prediction_output = list(map(lambda x: [reversed_dict[y] for y in x], prediction[:, 0, :]))

        # Save out our model
        cwd = os.getcwd()
        path = os.path.join(cwd, 'simple')

        inputs_dict = {
            "X": tf.convert_to_tensor(batch_x)
        }
        outputs_dict = {
            "prediction": tf.convert_to_tensor(prediction_output)
        }

        tf.saved_model.simple_save(
            sess, path, inputs_dict, outputs_dict
        )
        print('Model Saved')
        # End save model code

        # Save results to file
        with open("result.txt", "a") as f:
            for line in prediction_output:
                summary = list()
                for word in line:
                    if word == "</s>":
                        break
                    if word not in summary:
                        summary.append(word)
                print(" ".join(summary), file=f)

    print('Summaries are saved to "result.txt"...')
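A quick probe, separate from the project code, that shows what this inputs_dict mapping actually captures (the toy values and names below are purely illustrative): tf.convert_to_tensor on a plain Python list produces a Const node holding that exact data, while a tf.placeholder remains a feedable graph input.

import tensorflow as tf  # TF 1.x, as used above

batch_x = [[4, 8, 15, 16, 23, 42]]  # concrete values from one batch (made up for this example)

as_const = tf.convert_to_tensor(batch_x)                        # the data is baked into the graph
as_input = tf.placeholder(tf.int64, [None, 6], name="X_demo")   # a graph input Serving can feed

print(as_const.op.type)  # 'Const'
print(as_input.op.type)  # 'Placeholder'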
Then this is where I call the server for inference. No matter what data I send in, it always spits out the same prediction that I originally passed in when I exported the model.
def do_inference(hostport):
    """Tests PredictionService with concurrent requests.

    Args:
        hostport: Host:port address of the PredictionService.

    Returns:
        pred values, ground truth labels, processing time
    """
    # connect to server
    host, port = hostport.split(':')
    channel = grpc.insecure_channel(hostport)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # prepare request object
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'saved_model'

    # Get the input data from our arg
    jsn_inp = sys.argv[1]
    data = json.loads(jsn_inp)['tokenized']
    data = np.array(data)
    request.inputs['X'].CopyFrom(
        tf.contrib.util.make_tensor_proto(data, shape=data.shape, dtype=tf.int64))
    # print(request)

    result = stub.Predict(request, 10.0)  # 10 seconds
    return result
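For completeness, a minimal way to drive this client might look like the following; the script name, entry point, and the default TF Serving gRPC port 8500 are my assumptions, not part of the original project:

if __name__ == "__main__":
    # Hypothetical invocation:
    #   python client.py '{"tokenized": [[4, 8, 15, 16, 23, 42, 0, 0]]}'
    # do_inference() reads the JSON payload from sys.argv[1] itself.
    result = do_inference("localhost:8500")
    print(result.outputs['prediction'])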
In case it is useful, this is how the dataset gets built. I modified the build_dataset function so that it only uses the passed-in arg, but that did not fix the problem either. I thought maybe something akin to a JavaScript closure was going on, so I figured I would pull the data in this way.
def build_dataset(step, word_dict, article_max_len, summary_max_len, toy=False):
    if step == "train":
        article_list = get_text_list(train_article_path, toy)
        title_list = get_text_list(train_title_path, toy)
    elif step == "valid":
        article_list = get_text_list(valid_article_path, toy)
        title_list = get_text_list(valid_title_path, toy)
    elif step == "serve":
        arg_to_use = sys.argv[1] if ("tokenized" in sys.argv[1]) else sys.argv[2]
        article_list = [json.loads(arg_to_use)["tokenized"]]
    else:
        raise NotImplementedError

    if step != "serve":
        x = list(map(lambda d: word_tokenize(d), article_list))
        x = list(map(lambda d: list(map(lambda w: word_dict.get(w, word_dict["<unk>"]), d)), x))
        x = list(map(lambda d: d[:article_max_len], x))
        x = list(map(lambda d: d + (article_max_len - len(d)) * [word_dict["<padding>"]], x))
        print(x, file=open("input_values.txt", "a"))
        y = list(map(lambda d: word_tokenize(d), title_list))
        y = list(map(lambda d: list(map(lambda w: word_dict.get(w, word_dict["<unk>"]), d)), y))
        y = list(map(lambda d: d[:(summary_max_len - 1)], y))
    else:
        x = article_list
        # x = list(map(lambda d: word_tokenize(d), article_list))
        # x = list(map(lambda d: list(map(lambda w: word_dict.get(w, word_dict["<unk>"]), d)), x))
        x = list(map(lambda d: d[:article_max_len], x))
        x = list(map(lambda d: d + (article_max_len - len(d)) * [word_dict["<padding>"]], x))
        y = list()
    return x, y
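As an illustration of the "serve" branch above, here is a small self-contained usage sketch; the dictionary, lengths, and token ids are stand-ins I made up, not values from the real project:

import sys
import json

# Stand-in inputs so the snippet runs on its own (illustrative values only).
word_dict = {"<padding>": 0, "<unk>": 1}
article_max_len = 8
summary_max_len = 4

# The script receives the tokenized sentence as a JSON command-line arg.
sys.argv = ["export.py", json.dumps({"tokenized": [5, 9, 13]})]

x, y = build_dataset("serve", word_dict, article_max_len, summary_max_len)
print(x)  # [[5, 9, 13, 0, 0, 0, 0, 0]] -> truncated/padded to article_max_len
print(y)  # [] -> no titles are built in serve mode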
SignatureDef info (the thing that worries me a bit is the Const below ... not sure what that means ... looking into it now):
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['X'] tensor_info:
        dtype: DT_INT64
        shape: (1, 50)
        name: Const:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['prediction'] tensor_info:
        dtype: DT_STRING
        shape: (1, 11)
        name: Const_1:0
  Method name is: tensorflow/serving/predict
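As a side note for anyone reproducing this: the dump above is the kind of output you get from TensorFlow's saved_model_cli, e.g. saved_model_cli show --dir ./simple --all (assuming the 'simple' export directory from the script above). An input whose name is Const:0 means that tensor is a constant baked into the exported graph, so there is no placeholder for Serving to feed the request data into.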
OK ... so it looks like the Const issue was my problem, or rather it pointed me to what the real problem was. The actual root of my issue was that I was passing my evaluated values to tf.convert_to_tensor instead of the tf.placeholders themselves. So by changing the entries below when saving the model, I was able to get the correct response when sending inputs in. As you can see, I also had to feed the other original placeholders, batch_size and X_len, as well. Hopefully someone else finds this helpful.
inputs_dict = {
    "batch_size": tf.convert_to_tensor(model.batch_size),
    "X": tf.convert_to_tensor(model.X),
    "X_len": tf.convert_to_tensor(model.X_len),
}
outputs_dict = {
    "prediction": tf.convert_to_tensor(model.prediction)
}
This produced a much better-looking SignatureDef:
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['X'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 50)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['prediction'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 10, -1)
        name: decoder/decoder/transpose_1:0
  Method name is: tensorflow/serving/predict
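To tie the fix together, here is a condensed sketch of what the corrected export ends up looking like; the session/restore boilerplate is copied from the question, and since tf.convert_to_tensor is a no-op on tensors that already live in the graph, passing the placeholders directly should be equivalent to the wrapped version above:

import os
import tensorflow as tf

with tf.Session() as sess:
    model = Model(reversed_dict, article_max_len, summary_max_len, args, forward_only=True)
    saver = tf.train.Saver(tf.global_variables())
    ckpt = tf.train.get_checkpoint_state("./saved_model/")
    saver.restore(sess, ckpt.model_checkpoint_path)

    # Map the graph's input placeholders and the prediction op itself,
    # not the concrete values produced by one sess.run() call.
    inputs_dict = {
        "batch_size": model.batch_size,
        "X": model.X,
        "X_len": model.X_len,
    }
    outputs_dict = {
        "prediction": model.prediction,
    }
    tf.saved_model.simple_save(
        sess, os.path.join(os.getcwd(), 'simple'), inputs_dict, outputs_dict
    )

Note that the re-exported X placeholder is now DT_INT32, while the client's make_tensor_proto call above builds an int64 tensor, so the request dtype (and the extra batch_size/X_len inputs) would presumably need to be adjusted on the client side to match the new signature.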