Convert a pipeline_pb2.TrainEvalPipelineConfig to a JSON or YAML file for the TensorFlow Object Detection API

Question

I want to convert a pipeline_pb2.TrainEvalPipelineConfig to JSON or YAML format for the TensorFlow Object Detection API. I tried converting the protobuf file using:

import tensorflow as tf
from google.protobuf import text_format
import yaml

from object_detection.protos import pipeline_pb2

def get_configs_from_pipeline_file(pipeline_config_path, config_override=None):

  '''
  read .config and convert it to proto_buffer_object
  '''

  pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
  with tf.io.gfile.GFile(pipeline_config_path, "r") as f:  # tf.gfile was removed in TF 2.x
    proto_str = f.read()
    text_format.Merge(proto_str, pipeline_config)
  if config_override:
    text_format.Merge(config_override, pipeline_config)
  #print(pipeline_config)
  return pipeline_config


def create_configs_from_pipeline_proto(pipeline_config):
  '''
  Returns the configurations as dictionary
  '''

  configs = {}
  configs["model"] = pipeline_config.model
  configs["train_config"] = pipeline_config.train_config
  configs["train_input_config"] = pipeline_config.train_input_reader
  configs["eval_config"] = pipeline_config.eval_config
  configs["eval_input_configs"] = pipeline_config.eval_input_reader
  # Keeps eval_input_config only for backwards compatibility. All clients should
  # read eval_input_configs instead.
  if configs["eval_input_configs"]:
    configs["eval_input_config"] = configs["eval_input_configs"][0]
  if pipeline_config.HasField("graph_rewriter"):
    configs["graph_rewriter_config"] = pipeline_config.graph_rewriter

  return configs


configs = get_configs_from_pipeline_file('pipeline.config')
config_as_dict = create_configs_from_pipeline_proto(configs)

But when I try to convert the returned dictionary to YAML with yaml.dump(config_as_dict), it says:

TypeError: can't pickle google.protobuf.pyext._message.RepeatedCompositeContainer objects

And for json.dumps(config_as_dict) it says:

Traceback (most recent call last):
  File "config_file_parsing.py", line 48, in <module>
    config_as_json = json.dumps(config_as_dict)
  File "/usr/lib/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: label_map_path: "label_map.pbtxt"
shuffle: true
tf_record_input_reader {
  input_path: "dataset.record"
}
 is not JSON serializable

Any help here would be appreciated.

Answer 1

JSON can only dump a subset of the Python primitives plus the dict and list collections (with restrictions on self-reference).
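A quick stdlib-only illustration of those limits. Point is a hypothetical stand-in for any non-primitive object; default=vars is one generic workaround, but it only helps for objects that have an instance __dict__.

```python
import json


class Point:
    """Hypothetical plain Python class: not a type json knows how to encode."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def try_dumps(obj):
    """Return the JSON text, or the name of the exception json.dumps raised."""
    try:
        return json.dumps(obj)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


# dicts with str keys, lists, str, int, float, bool and None are fine.
print(try_dumps({"shuffle": True, "input_path": ["dataset.record"]}))

# An arbitrary object is rejected with TypeError ("... is not JSON serializable").
print(try_dumps({"p": Point(1, 2)}))

# A self-referencing list is rejected with ValueError ("Circular reference detected").
loop = []
loop.append(loop)
print(try_dumps(loop))

# default= is called for every object json cannot encode; vars() returns its __dict__.
print(json.dumps({"p": Point(1, 2)}, default=vars))  # {"p": {"x": 1, "y": 2}}
```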

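YAML is more powerful and can be used to dump arbitrary Python objects, but only if those objects can be "investigated" during the representation phase of the dump, which in practice limits this to instances of pure Python classes. For objects created at the C level you can register explicit dumpers; if none are available, PyYAML falls back to the pickle protocol to dump the data to YAML, which is why your error message says "can't pickle".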
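To make the pickle-protocol fallback concrete, here is a small sketch of the step PyYAML's generic object representer performs, calling obj.__reduce_ex__(2), without needing PyYAML installed. reduce_for_yaml is a hypothetical helper, and a generator is used only as a stand-in for a C-level type without pickle support, such as protobuf's RepeatedCompositeContainer.

```python
class Point:
    """A pure Python class: its state lives in an inspectable __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def reduce_for_yaml(obj):
    """Mimic the fallback of PyYAML's generic object representer: call
    obj.__reduce_ex__(2) and return the result, or None if the object
    refuses to be reduced (as C-level types without pickle support do)."""
    try:
        return obj.__reduce_ex__(2)
    except TypeError as exc:
        print("not representable:", exc)
        return None


# Pure Python instance: the reduce tuple carries the class and the attribute
# dict, which is what ends up under a !!python/object tag.
func, args, state = reduce_for_yaml(Point(1, 2))[:3]
print(args[0].__name__, state)  # Point {'x': 1, 'y': 2}

# A C-level object without pickle support fails the same way the dump did.
print(reduce_for_yaml(x for x in range(3)))
```

Note that yaml.safe_dump never takes this fallback; it raises a RepresenterError for unknown types instead. For C-level types, yaml.add_representer is the hook for registering an explicit dumper.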

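Checking protobuf on PyPI shows that non-generic wheels are available, which is always an indication of some C-code optimization. Inspecting one of those files does indeed show a precompiled shared object.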
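"Non-generic" refers to the compatibility tags in the wheel filename. A minimal sketch for telling them apart; the two protobuf filenames below are illustrative examples, not a listing of what PyPI currently hosts.

```python
def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag).

    Wheel names follow {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the last three dash-separated fields are always the compatibility tags.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    return tuple(parts[-3:])


def is_generic_wheel(filename):
    """A generic (pure Python) wheel targets no particular ABI or platform."""
    _, abi, platform = wheel_tags(filename)
    return abi == "none" and platform == "any"


for name in [
    "protobuf-3.6.1-py2.py3-none-any.whl",               # pure Python
    "protobuf-3.6.1-cp35-cp35m-manylinux1_x86_64.whl",   # compiled for CPython 3.5 on Linux
]:
    print(name, is_generic_wheel(name))
```

On an installed protobuf, the internal (and therefore subject to change) function google.protobuf.internal.api_implementation.Type() reports which backend is in use, for example "cpp", "upb" or "python".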

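Although you made a dict from the config, that dict can of course only be dumped if all of its keys and all of its values can be dumped. Your keys are strings (which JSON requires), so you need to look at each value, find the ones that don't dump (here the protobuf objects such as pipeline_config.model and the repeated eval_input_reader container), and convert them into a dumpable object structure: dicts and lists for JSON, instances of pure Python classes for YAML.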
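The two steps described above, finding the values that don't dump and rebuilding them from plain structures, can be sketched with the standard library alone. find_unserializable, to_plain and Opaque are hypothetical names used for illustration.

```python
import json
from collections.abc import Iterable, Mapping


def find_unserializable(config):
    """Return the keys of `config` whose values json.dumps rejects."""
    bad = []
    for key, value in config.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            bad.append(key)
    return bad


def to_plain(value, convert_leaf):
    """Rebuild `value` out of dicts, lists and JSON primitives.

    Mappings become dicts (keys stringified), other iterables become lists,
    primitives pass through unchanged, and every other object is handed to
    `convert_leaf`, a caller-supplied function that must already return a
    plain structure (its result is not converted further).
    """
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, Mapping):
        return {str(k): to_plain(v, convert_leaf) for k, v in value.items()}
    if isinstance(value, Iterable) and not isinstance(value, (bytes, bytearray)):
        return [to_plain(v, convert_leaf) for v in value]
    return convert_leaf(value)


class Opaque:
    """Hypothetical stand-in for an object json cannot dump."""

    def __init__(self, n):
        self.n = n


config = {"name": "ssd", "layers": (1, 2), "extra": Opaque(3), "readers": [Opaque(4)]}
print(find_unserializable(config))  # ['extra', 'readers']
plain = to_plain(config, convert_leaf=vars)
print(json.dumps(plain))
```

For the protobuf case, an untested assumption is that passing json_format.MessageToDict as convert_leaf would work: repeated containers are iterable and would become lists, provided the message objects themselves are not iterable.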

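You might want to take a look at the json_format module (google.protobuf.json_format), which converts protobuf messages to and from JSON text and plain Python dicts.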
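A minimal sketch using json_format, assuming the protobuf package is installed. message_to_json and json_to_message are hypothetical wrapper names; MessageToDict and Parse are the actual json_format functions. The import sits inside the functions so the module loads even without protobuf.

```python
import json


def message_to_json(message, indent=2):
    """Serialize a protobuf message (e.g. a TrainEvalPipelineConfig) to JSON text.

    MessageToDict returns only dicts, lists, str, int, float and bool, so the
    result can be passed straight to json.dumps (or to yaml.safe_dump).
    """
    from google.protobuf import json_format  # requires the protobuf package

    # Keep the snake_case names used in pipeline.config instead of camelCase.
    as_dict = json_format.MessageToDict(message, preserving_proto_field_name=True)
    return json.dumps(as_dict, indent=indent)


def json_to_message(text, message):
    """Parse JSON text into `message`, an empty instance of the right type."""
    from google.protobuf import json_format

    return json_format.Parse(text, message)


# Hypothetical usage with the question's helper (needs object_detection installed):
# pipeline_config = get_configs_from_pipeline_file('pipeline.config')
# with open('pipeline.json', 'w') as f:
#     f.write(message_to_json(pipeline_config))
```

Per the proto3 JSON mapping, 64-bit integer fields come out as strings and enums as their names. To get back to a .config file, json_to_message(text, pipeline_pb2.TrainEvalPipelineConfig()) followed by text_format.MessageToString should round-trip.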
