Training and Predicting with instance keys

Question

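I am able to train my model and use ML Engine for prediction, but my results don't include any identifying information. This works fine when I submit one row at a time for prediction, but when I submit multiple rows I have no way of connecting the predictions back to the original input data. The GCP documentation discusses using instance keys, but I can't find any example code that trains and predicts using an instance key. Taking the GCP Census example, how would I update the input functions to pass a unique ID through the graph, ignore it during training, and still return the unique ID with the predictions? Alternatively, if anyone knows of a different example that already uses keys, that would help as well.

From the Census Estimator Sample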

def serving_input_fn():
    feature_placeholders = {
      column.name: tf.placeholder(column.dtype, [None])
      for column in INPUT_COLUMNS
    }

    features = {
      key: tf.expand_dims(tensor, -1)
      for key, tensor in feature_placeholders.items()
    }

    return input_fn_utils.InputFnOps(
      features,
      None,
      feature_placeholders
    )


def generate_input_fn(filenames,
                  num_epochs=None,
                  shuffle=True,
                  skip_header_lines=0,
                  batch_size=40):

    def _input_fn():
        files = tf.concat([
          tf.train.match_filenames_once(filename)
          for filename in filenames
        ], axis=0)

        filename_queue = tf.train.string_input_producer(
          files, num_epochs=num_epochs, shuffle=shuffle)
        reader = tf.TextLineReader(skip_header_lines=skip_header_lines)

        _, rows = reader.read_up_to(filename_queue, num_records=batch_size)

        row_columns = tf.expand_dims(rows, -1)
        columns = tf.decode_csv(row_columns, record_defaults=CSV_COLUMN_DEFAULTS)
        features = dict(zip(CSV_COLUMNS, columns))

        # Remove unused columns
        for col in UNUSED_COLUMNS:
          features.pop(col)

        if shuffle:
           features = tf.train.shuffle_batch(
             features,
             batch_size,
             capacity=batch_size * 10,
             min_after_dequeue=batch_size*2 + 1,
             num_threads=multiprocessing.cpu_count(),
             enqueue_many=True,
             allow_smaller_final_batch=True
           )
        label_tensor = parse_label_column(features.pop(LABEL_COLUMN))
        return features, label_tensor

    return _input_fn

Update: I was able to use the suggested code; I just needed to alter it slightly to update the output alternatives in the model_fn_ops instead of just the prediction dict. However, this only works if my serving input function is coded for JSON inputs. My serving input function was previously modeled after the CSV serving input function in the Census Core Sample.

I think my problem comes from the build_standardized_signature_def function, and even more so from the is_classification_problem function that it calls. With the CSV serving function the input dict has length 1, so this logic ends up using classification_signature_def, which only displays the scores (which turn out to actually be the probabilities). With the JSON serving input function the input dict has length greater than 1, so predict_signature_def is used instead, and it includes all of the outputs.
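To make the behavior described above concrete, here is a toy model of the selection rule as I understand it from my own debugging. It is not the library's code: choose_signature_kind and the placeholder dict contents are hypothetical names made up for illustration, and the real check may look at more than the input count.

```python
def choose_signature_kind(input_tensors):
    """Toy model (NOT library code) of the rule described in the post.

    Assumption: for a classifier, a single serving input leads to the
    classification signature, and anything else falls back to the generic
    predict signature, which exposes every output (including a key).
    """
    if len(input_tensors) == 1:
        return "classification"
    return "predict"


# CSV serving function: one string placeholder that holds the whole row.
csv_inputs = {"csv_row": "string placeholder"}

# JSON serving function: one placeholder per feature, plus the key.
json_inputs = {
    "age": "float placeholder",
    "workclass": "string placeholder",
    "key": "int64 placeholder",
}

csv_kind = choose_signature_kind(csv_inputs)
json_kind = choose_signature_kind(json_inputs)
print(csv_kind, json_kind)
```

Under this reading, the key only shows up in the response when the exported model has more than one serving input, which matches what I saw with the JSON serving input function.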

Answer 1

Great question. The Cloud ML Engine flowers sample does this by using the tf.identity operation to pass a string straight through from input to output. Here are the relevant lines during graph construction.

keys_placeholder = tf.placeholder(tf.string, shape=[None])
inputs = {
    'key': keys_placeholder,
    'image_bytes': tensors.input_jpeg
}

# To extract the id, we need to add the identity function.
keys = tf.identity(keys_placeholder)
outputs = {
   'key': keys,
   'prediction': tensors.predictions[0],
   'scores': tensors.predictions[1]
}

For batch prediction you need to insert "key": "some_key_value" into your instance records. For online prediction, you can query the above graph with a JSON request like:

{"instances": [
    {"key": "first_key", "image_bytes": {"b64": ...}},
    {"key": "second_key", "image_bytes": {"b64": ...}}
    ]
}
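As a rough sketch of how such records could be produced, the snippet below builds both an online request body and a newline-delimited batch input from the same keyed instances. The helper make_instance and the placeholder image bytes are made up for illustration; only the "key", "image_bytes" and "b64" field names come from the request shown above.

```python
import base64
import json


def make_instance(key, image_bytes):
    """Build one keyed instance record; binary data goes under 'b64'."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"key": key, "image_bytes": {"b64": encoded}}


# Placeholder bytes standing in for real JPEG file contents.
images = [
    ("first_key", b"fake jpeg bytes 1"),
    ("second_key", b"fake jpeg bytes 2"),
]
instances = [make_instance(key, data) for key, data in images]

# Online prediction: one JSON object with an "instances" list.
online_request_body = json.dumps({"instances": instances})

# Batch prediction: one JSON instance per line.
batch_file_contents = "\n".join(json.dumps(inst) for inst in instances)

print(online_request_body)
print(batch_file_contents)
```

Because each record carries its own key, the order in which predictions come back no longer matters.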

Answer 2

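Update: In version 1.3, the contrib estimators (e.g. tf.contrib.learn.DNNClassifier) were changed to inherit from the core estimator class tf.estimator.Estimator, which, unlike its predecessor, hides the model function as a private class member, so you will need to replace estimator.model_fn in the solution below with estimator._model_fn.

Josh's answer points you to the Flowers example, which is a good solution if you want to use a custom estimator. If you want to stick with a canned estimator (e.g. tf.contrib.learn.DNNClassifiers), you can wrap it in a custom estimator that adds support for keys. (Note: I think it's likely that canned estimators will gain key support when they move into core.)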

KEY = 'key'
def key_model_fn_gen(estimator):
    def _model_fn(features, labels, mode, params):
        key = features.pop(KEY, None)
        model_fn_ops = estimator.model_fn(
           features=features, labels=labels, mode=mode, params=params)
        if key is not None:  # a tf.Tensor cannot be used as a Python bool
            model_fn_ops.predictions[KEY] = key
            # This line makes it so the exported SavedModel will also require a key
            model_fn_ops.output_alternatives[None][1][KEY] = key
        return model_fn_ops
    return _model_fn

my_key_estimator = tf.contrib.learn.Estimator(
    model_fn=key_model_fn_gen(
        tf.contrib.learn.DNNClassifier(model_dir=model_dir...)
    ),
    model_dir=model_dir
)

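my_key_estimator can then be used just like your DNNClassifier would be used, except that it expects a feature named 'key' from the input_fns (prediction, evaluation and training).

EDIT2: You will also need to add the corresponding input tensor to the prediction input function of your choice. For example, a new JSON serving input fn would look like: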

def json_serving_input_fn():
  inputs = # ... input_dict as before
  inputs[KEY] = tf.placeholder(tf.int64, shape=[None])
  features = # .. feature dict made from input_dict as before
  return tf.contrib.learn.InputFnOps(features, None, inputs)

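(This differs slightly between 1.2 and 1.3, as tf.contrib.learn.InputFnOps is replaced with tf.estimator.export.ServingInputReceiver, and padding tensors to rank 2 is no longer necessary in 1.3.)

ML Engine will then send a tensor named "key" with your prediction request, which will be passed to your model and come back out with your predictions.

EDIT3: Modified key_model_fn_gen to support ignoring missing key values. EDIT4: Added key for prediction.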
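Once the key comes back with each prediction, joining results to the original rows is a plain dictionary lookup on the client side. A minimal sketch, assuming each prediction is a dict that contains the key; join_by_key and the "age" and "probabilities" field names are hypothetical and will depend on your model's outputs:

```python
def join_by_key(input_rows, predictions, key_field="key"):
    """Attach each prediction to the input row that has the same key."""
    rows_by_key = {row[key_field]: row for row in input_rows}
    joined = []
    for prediction in predictions:
        merged = dict(rows_by_key[prediction[key_field]])
        # Copy every prediction output except the key itself.
        merged.update(
            {name: value for name, value in prediction.items() if name != key_field})
        joined.append(merged)
    return joined


input_rows = [{"key": 1, "age": 25}, {"key": 2, "age": 52}]

# Predictions may come back in a different order than the inputs.
predictions = [
    {"key": 2, "probabilities": [0.3, 0.7]},
    {"key": 1, "probabilities": [0.9, 0.1]},
]

joined = join_by_key(input_rows, predictions)
print(joined)
```

This only works if the keys are unique within a request or batch job.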

tensorflow

google-cloud-ml-engine