Export a model with tf.estimator and the export_savedmodel function
I'm building a deep neural network regressor with TensorFlow, based on this tutorial. When I try to save the model with tf.estimator's export_savedmodel, I get the following error:
raise ValueError('Feature {} is not in features dictionary.'.format(key))
ValueError: Feature ad_provider is not in features dictionary.
I need to export it so that I can deploy the model on Google Cloud Platform to serve predictions.
This is where I define my columns:
CSV_COLUMNS = [
    "ad_provider", "device", "split_group", "gold", "secret_areas",
    "scored_enemies", "tutorial_sec", "video_success"
]
FEATURES = ["ad_provider", "device", "split_group", "gold", "secret_areas",
            "scored_enemies", "tutorial_sec"]
LABEL = "video_success"
ad_provider = tf.feature_column.categorical_column_with_vocabulary_list(
    "ad_provider", ["Organic", "Apple Search Ads", "googleadwords_int",
                    "Facebook Ads", "website"])
split_group = tf.feature_column.categorical_column_with_vocabulary_list(
    "split_group", [1, 2, 3, 4])
device = tf.feature_column.categorical_column_with_hash_bucket(
    "device", hash_bucket_size=100)
secret_areas = tf.feature_column.numeric_column("secret_areas")
gold = tf.feature_column.numeric_column("gold")
scored_enemies = tf.feature_column.numeric_column("scored_enemies")
finish_tutorial_sec = tf.feature_column.numeric_column("tutorial_sec")
video_success = tf.feature_column.numeric_column("video_success")
feature_columns = [
    tf.feature_column.indicator_column(ad_provider),
    tf.feature_column.embedding_column(device, dimension=8),
    tf.feature_column.indicator_column(split_group),
    tf.feature_column.numeric_column(key="gold"),
    tf.feature_column.numeric_column(key="secret_areas"),
    tf.feature_column.numeric_column(key="scored_enemies"),
    tf.feature_column.numeric_column(key="tutorial_sec"),
]
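After that, I wrote a function so that the exported model accepts a JSON dictionary as input. I'm not sure whether I wrote the serving function correctly.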
def json_serving_input_fn():
    """Build the serving inputs."""
    inputs = {}
    for feat in feature_columns:
        inputs[feat.name] = tf.placeholder(
            shape=[None],
            dtype=feat.dtype if hasattr(feat, 'dtype') else tf.string)
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in inputs.items()
    }
    return tf.contrib.learn.InputFnOps(features, None, inputs)
Here is the rest of my code:
def main(unused_argv):
    # Normalize columns 'gold', 'tutorial_sec' and 'scored_enemies' for the training set
    train_n = training_set
    train_n['gold'] = (train_n['gold'] - train_n['gold'].mean()) / (train_n['gold'].max() - train_n['gold'].min())
    train_n['tutorial_sec'] = (train_n['tutorial_sec'] - train_n['tutorial_sec'].mean()) / (train_n['tutorial_sec'].max() - train_n['tutorial_sec'].min())
    train_n['scored_enemies'] = (train_n['scored_enemies'] - train_n['scored_enemies'].mean()) / (train_n['scored_enemies'].max() - train_n['scored_enemies'].min())
    test_n = test_set
    test_n['gold'] = (test_n['gold'] - test_n['gold'].mean()) / (test_n['gold'].max() - test_n['gold'].min())
    test_n['tutorial_sec'] = (test_n['tutorial_sec'] - test_n['tutorial_sec'].mean()) / (test_n['tutorial_sec'].max() - test_n['tutorial_sec'].min())
    test_n['scored_enemies'] = (test_n['scored_enemies'] - test_n['scored_enemies'].mean()) / (test_n['scored_enemies'].max() - test_n['scored_enemies'].min())
    train_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=train_n,
        y=pd.Series(train_n[LABEL].values),
        batch_size=100,
        num_epochs=None,
        shuffle=True)
    test_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=test_n,
        y=pd.Series(test_n[LABEL].values),
        batch_size=100,
        num_epochs=1,
        shuffle=False)
    regressor = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[40, 30, 20],
                                          model_dir="model1",
                                          optimizer='RMSProp'
                                          )
    # Train
    regressor.train(input_fn=train_input_fn, steps=5)
    regressor.export_savedmodel("test", json_serving_input_fn)
    # Evaluate loss over one epoch of test_set.
    # For each step, calls `input_fn`, which returns one batch of data.
    ev = regressor.evaluate(
        input_fn=test_input_fn)
    loss_score = ev["loss"]
    print("Loss: {0:f}".format(loss_score))
    for key in sorted(ev):
        print("%s: %s" % (key, ev[key]))
    # Print out predictions over a slice of prediction_set.
    y = regressor.predict(
        input_fn=test_input_fn)
    # Array with prediction list!
    predictions = list(p["predictions"] for p in y)
    # real = list(p["real"] for p in pd.Series(training_set[LABEL].values))
    real = test_set[LABEL].values
    diff = np.subtract(real, predictions)
    diff = np.absolute(diff)
    diff = np.mean(diff)
    print("Mean Square Error of Test Set = ", diff * diff)
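Beyond the problem you mentioned, I foresee several other issues you are going to run into:
- You are using tf.estimator.DNNRegressor, which was introduced in TensorFlow 1.3. CloudML Engine only officially supports TF 1.2.
- You are normalizing the features in the pandas DataFrame, and that will not happen at serving time (unless you do it client-side). This introduces skew, and you will get poor prediction results.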
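To make the skew in the second point concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The helper names and the sample numbers are made up for illustration; only the formula, (x - mean) / (max - min), is taken from the question's main function.

```python
def mean_range_stats(values):
    """Return (mean, max - min) for a list of numbers."""
    mean = sum(values) / len(values)
    return mean, max(values) - min(values)


def normalize(value, mean, value_range):
    """Apply the (x - mean) / (max - min) scaling used in the question."""
    return (value - mean) / value_range


# Hypothetical 'gold' values; not taken from the question's data.
train_gold = [0.0, 50.0, 100.0]
test_gold = [100.0, 200.0, 300.0]

train_mean, train_range = mean_range_stats(train_gold)
test_mean, test_range = mean_range_stats(test_gold)

# The same raw value is scaled differently depending on whose statistics are used.
raw = 100.0
as_seen_in_training = normalize(raw, train_mean, train_range)
as_seen_in_test = normalize(raw, test_mean, test_range)
print(as_seen_in_training, as_seen_in_test)
```

With these numbers the same raw value of 100.0 becomes 0.5 under the training statistics and -0.5 under the test statistics. At serving time the exported model receives whatever the client sends, so unless the client applies the training statistics, the inputs will not match what the model was trained on.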
So let's start by using tf.contrib.learn.DNNRegressor, which requires only minor changes:
regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[40, 30, 20],
    model_dir="model1",
    optimizer='RMSProp'
)
regressor.fit(input_fn=train_input_fn, steps=5)
regressor.export_savedmodel("test", json_serving_input_fn)
Note fit instead of train.
(Note: your json_serving_input_fn was actually written for TF 1.2 and is incompatible with TF 1.3. That is fine for now.)
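Now, the root cause of the error you are seeing is that the column/feature ad_provider is not in the inputs and features dictionaries (but you do have ad_provider_indicator). This is because you are iterating over feature_columns rather than over the original list of input columns. The fix is to iterate over the actual inputs instead of the feature columns; however, we then also need to know their types (simplified here to just a few columns):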
CSV_COLUMNS = ["ad_provider", "gold", "video_success"]
FEATURES = ["ad_provider", "gold"]
TYPES = [tf.string, tf.float32]
LABEL = "video_success"

def json_serving_input_fn():
    """Build the serving inputs."""
    inputs = {}
    for feat, dtype in zip(FEATURES, TYPES):
        inputs[feat] = tf.placeholder(shape=[None], dtype=dtype)
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in inputs.items()
    }
    return tf.contrib.learn.InputFnOps(features, None, inputs)
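With a serving function like this, each prediction instance is keyed by the raw column names in FEATURES, not by derived names such as ad_provider_indicator. The sketch below builds and checks such a request body in plain Python. build_request is a hypothetical helper, and the {"instances": [...]} envelope is assumed here to be the JSON shape the online prediction service expects, so check it against the service documentation.

```python
import json

FEATURES = ["ad_provider", "gold"]


def build_request(rows):
    """Wrap rows in an 'instances' envelope after checking their keys."""
    for row in rows:
        missing = [name for name in FEATURES if name not in row]
        extra = [name for name in row if name not in FEATURES]
        if missing or extra:
            raise ValueError(
                "missing keys: {}, unexpected keys: {}".format(missing, extra))
    return json.dumps({"instances": rows})


# Keys match the placeholders created in json_serving_input_fn.
body = build_request([{"ad_provider": "Organic", "gold": 0.25}])
print(body)
```

Passing a row keyed by ad_provider_indicator raises ValueError in this sketch, which mirrors the mismatch behind the original error, only in the opposite direction.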
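Finally, to normalize your data, you will probably want to do it in the graph. You can try tf.transform, or write a custom estimator that performs the transformation and delegates the actual model implementation to DNNRegressor.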
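As a lighter-weight illustration of the same idea, the scaling can be frozen into a function built from training-set statistics and then applied inside the model rather than in pandas. The sketch below is plain Python; make_normalizer and the constants are hypothetical. Because the returned function uses only - and /, it should work on tensors as well as floats. Whether it can be passed as normalizer_fn to tf.feature_column.numeric_column depends on your TensorFlow version, so treat that as an assumption to verify.

```python
def make_normalizer(mean, value_range):
    """Return a function that applies (x - mean) / value_range."""
    def normalizer(x):
        return (x - mean) / value_range
    return normalizer


# Hypothetical statistics, computed once from the training set and then frozen.
GOLD_MEAN = 50.0
GOLD_RANGE = 100.0

normalize_gold = make_normalizer(GOLD_MEAN, GOLD_RANGE)
print(normalize_gold(100.0))
```

The point of the design is that the statistics travel with the exported model, so training and serving apply exactly the same transformation.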