How to get Tensorflow Served model to pull from passed in input and not local batch file?
I am currently trying to get a seq2seq model working with TF Serving. I thought I had this right, but it seems I was mistaken. I originally trained the model from a local text file, read in batches. Now I want to send it a single sentence and have it return the summary to me.
I have successfully saved and served the model, and I can now see predictions on my front-end page, but the results are still coming from my local text file rather than from the sentence I pass in as a query parameter.
My input is currently a single sentence sent as a query parameter, but the result that actually comes back is still pulled from my text file, even though I map batch_x to the value of my arg[1], which I have verified is the correct expected input.
Does anyone see what I am doing wrong? Clearly I am misunderstanding the process I should be following here.
The important thing to note is that if I change the value of the passed-in argument and call the Python file directly, I get the correct result. However, when I make the same call against the frozen model that is being served, I always get the same prediction response back no matter what I send.
This is how I froze/exported the model (note the mapping of inputs_dict.X to batch_x ... I believe the mistake is something I am doing here):
pickle_fn = 'args.pickle'
folder = os.path.dirname(os.path.abspath(__file__)) + '/pickle'
pickle_filepath = os.path.join(folder, pickle_fn)
with open(pickle_filepath, "rb") as f:
    args = pickle.load(f)

print("Loading dictionary...")
word_dict, reversed_dict, article_max_len, summary_max_len = build_dict("valid", args.toy)
print("Loading validation dataset...")
# The below call will pull from the arg passed when "serve" is used
valid_x, valid_y = build_dataset("serve", word_dict, article_max_len, summary_max_len, args.toy)
valid_x_len = list(map(lambda x: len([y for y in x if y != 0]), valid_x))

with tf.Session() as sess:
    print("Loading saved model...")
    model = Model(reversed_dict, article_max_len, summary_max_len, args, forward_only=True)
    saver = tf.train.Saver(tf.global_variables())
    ckpt = tf.train.get_checkpoint_state("./saved_model/")
    saver.restore(sess, ckpt.model_checkpoint_path)

    batches = batch_iter(valid_x, valid_y, args.batch_size, 1)
    # print(valid_x, file=open("art_working_inp.txt", "a"))
    print("Writing summaries to 'result.txt'...")
    for batch_x, batch_y in batches:
        batch_x_len = list(map(lambda x: len([y for y in x if y != 0]), batch_x))
        valid_feed_dict = {
            model.batch_size: len(batch_x),
            model.X: batch_x,
            model.X_len: batch_x_len,
        }

        prediction = sess.run(model.prediction, feed_dict=valid_feed_dict)
        prediction_output = list(map(lambda x: [reversed_dict[y] for y in x], prediction[:, 0, :]))

        # Save out our model
        cwd = os.getcwd()
        path = os.path.join(cwd, 'simple')

        inputs_dict = {
            "X": tf.convert_to_tensor(batch_x)
        }
        outputs_dict = {
            "prediction": tf.convert_to_tensor(prediction_output)
        }

        tf.saved_model.simple_save(
            sess, path, inputs_dict, outputs_dict
        )
        print('Model Saved')
        # End save model code

        # Save results to file
        with open("result.txt", "a") as f:
            for line in prediction_output:
                summary = list()
                for word in line:
                    if word == "</s>":
                        break
                    if word not in summary:
                        summary.append(word)
                print(" ".join(summary), file=f)

    print('Summaries are saved to "result.txt"...')
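A quick probe, separate from the project code, that shows what this inputs_dict mapping actually captures (the toy values and names below are purely illustrative): tf.convert_to_tensor on a plain Python list produces a Const node holding that exact data, while a tf.placeholder remains a feedable graph input.

import tensorflow as tf  # TF 1.x, as used above

batch_x = [[4, 8, 15, 16, 23, 42]]  # concrete values from one batch (made up for this example)

as_const = tf.convert_to_tensor(batch_x)                        # the data is baked into the graph
as_input = tf.placeholder(tf.int64, [None, 6], name="X_demo")   # a graph input Serving can feed

print(as_const.op.type)  # 'Const'
print(as_input.op.type)  # 'Placeholder'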
Then this is where I call the server for inference. No matter what data I send in, it always spits out the same prediction that I originally passed in when I exported the model.
def do_inference(hostport):
    """Tests PredictionService with concurrent requests.

    Args:
        hostport: Host:port address of the PredictionService.

    Returns:
        pred values, ground truth labels, processing time
    """
    # connect to server
    host, port = hostport.split(':')
    channel = grpc.insecure_channel(hostport)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # prepare request object
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'saved_model'

    # Get the input data from our arg
    jsn_inp = sys.argv[1]
    data = json.loads(jsn_inp)['tokenized']
    data = np.array(data)
    request.inputs['X'].CopyFrom(
        tf.contrib.util.make_tensor_proto(data, shape=data.shape, dtype=tf.int64))
    # print(request)

    result = stub.Predict(request, 10.0)  # 10 seconds
    return result
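For completeness, a minimal way to drive this client might look like the following; the script name, entry point, and the default TF Serving gRPC port 8500 are my assumptions, not part of the original project:

if __name__ == "__main__":
    # Hypothetical invocation:
    #   python client.py '{"tokenized": [[4, 8, 15, 16, 23, 42, 0, 0]]}'
    # do_inference() reads the JSON payload from sys.argv[1] itself.
    result = do_inference("localhost:8500")
    print(result.outputs['prediction'])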
In case it is useful, this is how the dataset gets built. I modified the build_dataset function so that it only uses the passed-in arg, but that did not fix the problem either. I thought maybe something akin to a JavaScript closure was going on, so I figured I would pull the data in this way.
def build_dataset(step, word_dict, article_max_len, summary_max_len, toy=False):
    if step == "train":
        article_list = get_text_list(train_article_path, toy)
        title_list = get_text_list(train_title_path, toy)
    elif step == "valid":
        article_list = get_text_list(valid_article_path, toy)
        title_list = get_text_list(valid_title_path, toy)
    elif step == "serve":
        arg_to_use = sys.argv[1] if ("tokenized" in sys.argv[1]) else sys.argv[2]
        article_list = [json.loads(arg_to_use)["tokenized"]]
    else:
        raise NotImplementedError

    if step != "serve":
        x = list(map(lambda d: word_tokenize(d), article_list))
        x = list(map(lambda d: list(map(lambda w: word_dict.get(w, word_dict["<unk>"]), d)), x))
        x = list(map(lambda d: d[:article_max_len], x))
        x = list(map(lambda d: d + (article_max_len - len(d)) * [word_dict["<padding>"]], x))
        print(x, file=open("input_values.txt", "a"))
        y = list(map(lambda d: word_tokenize(d), title_list))
        y = list(map(lambda d: list(map(lambda w: word_dict.get(w, word_dict["<unk>"]), d)), y))
        y = list(map(lambda d: d[:(summary_max_len - 1)], y))
    else:
        x = article_list
        # x = list(map(lambda d: word_tokenize(d), article_list))
        # x = list(map(lambda d: list(map(lambda w: word_dict.get(w, word_dict["<unk>"]), d)), x))
        x = list(map(lambda d: d[:article_max_len], x))
        x = list(map(lambda d: d + (article_max_len - len(d)) * [word_dict["<padding>"]], x))
        y = list()
    return x, y
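As an illustration of the "serve" branch above, here is a small self-contained usage sketch; the dictionary, lengths, and token ids are stand-ins I made up, not values from the real project:

import sys
import json

# Stand-in inputs so the snippet runs on its own (illustrative values only).
word_dict = {"<padding>": 0, "<unk>": 1}
article_max_len = 8
summary_max_len = 4

# The script receives the tokenized sentence as a JSON command-line arg.
sys.argv = ["export.py", json.dumps({"tokenized": [5, 9, 13]})]

x, y = build_dataset("serve", word_dict, article_max_len, summary_max_len)
print(x)  # [[5, 9, 13, 0, 0, 0, 0, 0]] -> truncated/padded to article_max_len
print(y)  # [] -> no titles are built in serve mode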
SignatureDef info (the thing that worries me a bit is the Const below ... not sure what that means ... looking into it now):
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['X'] tensor_info:
        dtype: DT_INT64
        shape: (1, 50)
        name: Const:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['prediction'] tensor_info:
        dtype: DT_STRING
        shape: (1, 11)
        name: Const_1:0
  Method name is: tensorflow/serving/predict
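As a side note for anyone reproducing this: the dump above is the kind of output you get from TensorFlow's saved_model_cli, e.g. saved_model_cli show --dir ./simple --all (assuming the 'simple' export directory from the script above). An input whose name is Const:0 means that tensor is a constant baked into the exported graph, so there is no placeholder for Serving to feed the request data into.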
OK ... so it looks like the Const issue was my problem, or rather it pointed me to what the real problem was. The actual root of my issue was that I was passing my evaluated values to tf.convert_to_tensor instead of the tf.placeholders themselves. So by changing the entries below when saving the model, I was able to get the correct response when sending inputs in. As you can see, I also had to feed the other original placeholders, batch_size and X_len, as well. Hopefully someone else finds this helpful.
inputs_dict = {
    "batch_size": tf.convert_to_tensor(model.batch_size),
    "X": tf.convert_to_tensor(model.X),
    "X_len": tf.convert_to_tensor(model.X_len),
}
outputs_dict = {
    "prediction": tf.convert_to_tensor(model.prediction)
}
This produced a much better-looking SignatureDef:
signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['X'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 50)
        name: Placeholder:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['prediction'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 10, -1)
        name: decoder/decoder/transpose_1:0
  Method name is: tensorflow/serving/predict
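To tie the fix together, here is a condensed sketch of what the corrected export ends up looking like; the session/restore boilerplate is copied from the question, and since tf.convert_to_tensor is a no-op on tensors that already live in the graph, passing the placeholders directly should be equivalent to the wrapped version above:

import os
import tensorflow as tf

with tf.Session() as sess:
    model = Model(reversed_dict, article_max_len, summary_max_len, args, forward_only=True)
    saver = tf.train.Saver(tf.global_variables())
    ckpt = tf.train.get_checkpoint_state("./saved_model/")
    saver.restore(sess, ckpt.model_checkpoint_path)

    # Map the graph's input placeholders and the prediction op itself,
    # not the concrete values produced by one sess.run() call.
    inputs_dict = {
        "batch_size": model.batch_size,
        "X": model.X,
        "X_len": model.X_len,
    }
    outputs_dict = {
        "prediction": model.prediction,
    }
    tf.saved_model.simple_save(
        sess, os.path.join(os.getcwd(), 'simple'), inputs_dict, outputs_dict
    )

Note that the re-exported X placeholder is now DT_INT32, while the client's make_tensor_proto call above builds an int64 tensor, so the request dtype (and the extra batch_size/X_len inputs) would presumably need to be adjusted on the client side to match the new signature.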