Parsing devicetree with pyparsing into structured dictionary

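For my C++ RTOS I'm writing a parser for devicetree "source" files (.dts) in Python, using the pyparsing module. I'm able to parse the structure of the devicetree into a (nested) dictionary, where a property name or node name is the dictionary key (a string), and a property value or node is the dictionary value (either a string or a nested dictionary).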

Let's assume I have the following example devicetree structure:

/ {
    property1 = "string1";
    property2 = "string2";
    node1 {
        property11 = "string11";
        property12 = "string12";
        node11 {
            property111 = "string111";
            property112 = "string112";
        };
    };
    node2 {
        property21 = "string21";
        property22 = "string22";
    };
};

I'm able to parse this into something like:

{'/': {'node1': {'node11': {'property111': ['string111'], 'property112': ['string112']},
                 'property11': ['string11'],
                 'property12': ['string12']},
       'node2': {'property21': ['string21'], 'property22': ['string22']},
       'property1': ['string1'],
       'property2': ['string2']}}

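However, for my needs I would prefer to structure this data differently. I'd like to have all the properties as a nested dictionary under the key "properties", and all the child nodes as a nested dictionary under the key "children". The reason is that the devicetree (especially the nodes) has some "metadata" that I would like to have as key-value pairs, which requires me to move the actual "contents" of a node one level "lower" to avoid any name conflicts between keys. So I'd like the example above to look like this: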

{'/': {
  'properties': {
    'property1': ['string1'],
    'property2': ['string2']
  },
  'children': {
    'node1': {
      'properties': {
        'property11': ['string11'],
        'property12': ['string12']
      },
      'children': {
        'node11': {
          'properties': {
            'property111': ['string111'],
            'property112': ['string112']
          },
          'children': {
          }
        }
      }
    },
    'node2': {
      'properties': {
        'property21': ['string21'],
        'property22': ['string22']
      },
      'children': {
      }
    }
  }
}
}
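A minimal sketch of why the extra level helps, in plain Python with no pyparsing: it walks the desired layout (using "children" for child nodes, as described above) and stores each node's full path as a metadata key. The key name "path" and the helper annotate_paths are hypothetical, chosen only for illustration. Because real properties sit under "properties", a property that happened to be called "path" could not be overwritten by this metadata.

```python
# Desired layout from above, with 'children' holding the child nodes
tree = {'/': {
    'properties': {'property1': ['string1'], 'property2': ['string2']},
    'children': {
        'node1': {
            'properties': {'property11': ['string11'], 'property12': ['string12']},
            'children': {
                'node11': {
                    'properties': {'property111': ['string111'], 'property112': ['string112']},
                    'children': {},
                },
            },
        },
        'node2': {
            'properties': {'property21': ['string21'], 'property22': ['string22']},
            'children': {},
        },
    },
}}


def annotate_paths(node, path):
    """Hypothetical helper: store each node's full path as metadata.

    The metadata key sits next to 'properties' and 'children', so it can
    never clash with a property or child node of the same name.
    """
    node['path'] = path
    for name, child in node['children'].items():
        annotate_paths(child, path.rstrip('/') + '/' + name)


annotate_paths(tree['/'], '/')
print(tree['/']['children']['node1']['children']['node11']['path'])  # /node1/node11
```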

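I tried adding results names to the parse tokens, but this leads to "doubled" dictionary elements (which is expected, as this behaviour is described in the pyparsing documentation). That might not be a problem, but technically a node or property can be named "properties" or "children" (or whatever name I choose), so I don't think such a solution is robust.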

I also tried using setParseAction() to convert tokens into dictionary fragments (hoping to turn {'key': 'value'} into {'properties': {'key': 'value'}}), but that didn't quite work...
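For comparison, here is one way parse actions can build the nested layout in a single pyparsing pass. Instead of emitting dictionary fragments, each node's parse action returns a single (name, node) pair, and the parent node assembles its "children" dict from those pairs. This is only a sketch under the same simplifications as the question's code (string properties, properties before child nodes); the helper name make_node is hypothetical.

```python
import pprint

import pyparsing as pp

LBRACE, RBRACE, SEMI, EQ = map(pp.Suppress, "{};=")
nodeName = pp.Word(pp.alphas, pp.alphanums + ",._+-", max=31)
propertyName = pp.Word(pp.alphanums + ",._+?#", max=31)
# copy() so the shared pp.dblQuotedString object is not modified globally
propertyValue = pp.dblQuotedString.copy().setParseAction(pp.removeQuotes)
prop = pp.Group(propertyName + EQ + pp.Group(propertyValue) + SEMI)

childNode = pp.Forward()
# Node body: properties first, then child nodes, each collected in a Group
body = (LBRACE + pp.Group(pp.ZeroOrMore(prop))
        + pp.Group(pp.ZeroOrMore(childNode)) + RBRACE + SEMI)


def make_node(tokens):
    """Parse action: turn [name, props, children] into one (name, dict) token."""
    name, props, children = tokens
    node = {
        "properties": {p[0]: list(p[1]) for p in props},
        # Each child token is already a (name, dict) tuple. list() is needed
        # because dict(ParseResults) would only use its named-results keys.
        "children": dict(list(children)),
    }
    return [(name, node)]


childNode <<= (nodeName + body).setParseAction(make_node)
rootNode = (pp.Literal("/") + body).setParseAction(make_node)

source = """
/ {
    property1 = "string1";
    node1 {
        property11 = "string11";
        node11 { property111 = "string111"; };
    };
    node2 { };
};
"""
name, tree = rootNode.parseString(source, parseAll=True)[0]
structured = {name: tree}
pprint.pprint(structured)
```

Since node names only ever become keys inside "children", a node called "properties" cannot overwrite a node's own "properties" entry in this layout.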

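Is this possible directly with pyparsing? I'm ready to do a second stage that converts the original dictionary into whatever structure I need, but as a perfectionist I would prefer a single-pass, pyparsing-only solution, if that's possible.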
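If the two-stage route is acceptable, the second stage can be a short recursive function over the "unstructured" dictionary shown earlier. This sketch assumes, as in that output, that property values are lists and child nodes are dicts; the function name restructure is hypothetical.

```python
import pprint


def restructure(flat):
    """Split one flat node dict into 'properties' and 'children'.

    Assumes property values are lists (as produced by asDict() above)
    and child nodes are nested dicts.
    """
    node = {"properties": {}, "children": {}}
    for key, value in flat.items():
        if isinstance(value, dict):
            node["children"][key] = restructure(value)
        else:
            node["properties"][key] = value
    return node


# The "unstructured" result from the first stage
flat = {'/': {'node1': {'node11': {'property111': ['string111'], 'property112': ['string112']},
                        'property11': ['string11'],
                        'property12': ['string12']},
              'node2': {'property21': ['string21'], 'property22': ['string22']},
              'property1': ['string1'],
              'property2': ['string2']}}

structured = {name: restructure(root) for name, root in flat.items()}
pprint.pprint(structured)
```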

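For reference, here is sample code (Python 3) that converts the devicetree source into an "unstructured" dictionary. Note that this code is only a simplification and does not support all the features of .dts (any data type other than strings, value lists, unit addresses, labels and so on); it only supports string properties and node nesting.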

#!/usr/bin/env python

import pyparsing
import pprint

nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max = 31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max = 31)
propertyValue = pyparsing.dblQuotedString.setParseAction(pyparsing.removeQuotes)
property = pyparsing.Dict(pyparsing.Group(propertyName + pyparsing.Group(pyparsing.Literal('=').suppress() +
        propertyValue) + pyparsing.Literal(';').suppress()))
childNode = pyparsing.Forward()
rootNode = pyparsing.Dict(pyparsing.Group(pyparsing.Literal('/') + pyparsing.Literal('{').suppress() +
        pyparsing.ZeroOrMore(property) + pyparsing.ZeroOrMore(childNode) +
        pyparsing.Literal('};').suppress()))
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + pyparsing.Literal('{').suppress() +
        pyparsing.ZeroOrMore(property) + pyparsing.ZeroOrMore(childNode) +
        pyparsing.Literal('};').suppress()))

dictionary = rootNode.parseString("""
/ {
    property1 = "string1";
    property2 = "string2";
    node1 {
        property11 = "string11";
        property12 = "string12";
        node11 {
            property111 = "string111";
            property112 = "string112";
        };
    };
    node2 {
        property21 = "string21";
        property22 = "string22";
    };
};
""").asDict()
pprint.pprint(dictionary, width = 120)

You were really close. I just did the following:

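  • Added Group and results names for your "properties" and "children" subsections
  • Changed some of the punctuation literals to constants (Literal("};") will fail to match if there is a space between the closing brace and the semicolon, but RBRACE + SEMI will accommodate whitespace)
  • Removed the outermost Dict on rootNode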
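The whitespace point about the closing brace is easy to check in isolation. A minimal sketch, using a made-up closing text "} ;" with a space between the two characters:

```python
import pyparsing

RBRACE, SEMI = map(pyparsing.Suppress, "};")

closing = "} ;"  # a space between the brace and the semicolon

# Two separate tokens: pyparsing skips whitespace before each one
ok = (RBRACE + SEMI).parseString(closing, parseAll=True)
print(ok)  # [] because both tokens are suppressed

# One literal token: "};" must appear exactly as written, so this fails
try:
    pyparsing.Literal("};").parseString(closing)
    literal_matched = True
except pyparsing.ParseException:
    literal_matched = False
print("Literal('};') matched:", literal_matched)  # False
```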

Code:

import pprint

import pyparsing

# Suppressed punctuation: matched but dropped from the results, and
# pyparsing skips whitespace before each one
LBRACE,RBRACE,SLASH,SEMI,EQ = map(pyparsing.Suppress, "{}/;=")
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max = 31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max = 31)
propertyValue = pyparsing.dblQuotedString.setParseAction(pyparsing.removeQuotes)
property = pyparsing.Dict(pyparsing.Group(propertyName + EQ
                                          + pyparsing.Group(propertyValue)
                                          + SEMI))
childNode = pyparsing.Forward()
# Each Group gets a results name, so a node's properties and children
# sit one level below it under "properties" and "children"
rootNode = pyparsing.Group(SLASH + LBRACE
                           + pyparsing.Group(pyparsing.ZeroOrMore(property))("properties")
                           + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                           + RBRACE + SEMI)
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + LBRACE
                                             + pyparsing.Group(pyparsing.ZeroOrMore(property))("properties")
                                             + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                                             + RBRACE + SEMI))

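Parsing the question's sample input with result = rootNode.parseString(...), then converting to a dictionary with asDict and printing with pprint, gives: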

pprint.pprint(result[0].asDict())
{'children': {'node1': {'children': {'node11': {'children': [],
                                                'properties': {'property111': ['string111'],
                                                               'property112': ['string112']}}},
                        'properties': {'property11': ['string11'],
                                       'property12': ['string12']}},
              'node2': {'children': [],
                        'properties': {'property21': ['string21'],
                                       'property22': ['string22']}}},
 'properties': {'property1': ['string1'], 'property2': ['string2']}}
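On the question's worry that a node might itself be named "properties" or "children": in this grammar the results names are scoped by their Group, so such a node should end up as a key inside "children" rather than overwriting its parent's "properties" entry. A quick check, repeating the grammar above (with property renamed to property_ to avoid shadowing the Python builtin) and a made-up one-line input:

```python
import pyparsing

LBRACE, RBRACE, SLASH, SEMI, EQ = map(pyparsing.Suppress, "{}/;=")
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max=31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max=31)
propertyValue = pyparsing.dblQuotedString.copy().setParseAction(pyparsing.removeQuotes)
property_ = pyparsing.Dict(pyparsing.Group(propertyName + EQ
                                           + pyparsing.Group(propertyValue) + SEMI))
childNode = pyparsing.Forward()
rootNode = pyparsing.Group(SLASH + LBRACE
                           + pyparsing.Group(pyparsing.ZeroOrMore(property_))("properties")
                           + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                           + RBRACE + SEMI)
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + LBRACE
                                             + pyparsing.Group(pyparsing.ZeroOrMore(property_))("properties")
                                             + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                                             + RBRACE + SEMI))

# A child node deliberately named "properties", next to a real property x
result = rootNode.parseString('/ { x = "1"; properties { p = "2"; }; };', parseAll=True)
d = result[0].asDict()
print(d)
```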

You can also use the dump() method of pyparsing's ParseResults class to help visualize both the list and the dict/namespace-style access to the results as they are, without any conversion calls:

print(result[0].dump())

[[['property1', ['string1']], ['property2', ['string2']]], [['node1', [['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]], ['node2', [['property21', ['string21']], ['property22', ['string22']]], []]]]
- children: [['node1', [['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]], ['node2', [['property21', ['string21']], ['property22', ['string22']]], []]]
  - node1: [[['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]]
    - children: [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]
      - node11: [[['property111', ['string111']], ['property112', ['string112']]], []]
        - children: []
        - properties: [['property111', ['string111']], ['property112', ['string112']]]
          - property111: ['string111']
          - property112: ['string112']
    - properties: [['property11', ['string11']], ['property12', ['string12']]]
      - property11: ['string11']
      - property12: ['string12']
  - node2: [[['property21', ['string21']], ['property22', ['string22']]], []]
    - children: []
    - properties: [['property21', ['string21']], ['property22', ['string22']]]
      - property21: ['string21']
      - property22: ['string22']
- properties: [['property1', ['string1']], ['property2', ['string2']]]
  - property1: ['string1']
  - property2: ['string2']
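As the dump shows, the same names can be used directly on the ParseResults. A short sketch of attribute-style and key-style access on the question's sample input, repeating the answer's grammar so the snippet runs on its own (property is renamed to property_ here to avoid shadowing the builtin):

```python
import pyparsing

LBRACE, RBRACE, SLASH, SEMI, EQ = map(pyparsing.Suppress, "{}/;=")
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max=31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max=31)
propertyValue = pyparsing.dblQuotedString.copy().setParseAction(pyparsing.removeQuotes)
property_ = pyparsing.Dict(pyparsing.Group(propertyName + EQ
                                           + pyparsing.Group(propertyValue) + SEMI))
childNode = pyparsing.Forward()
rootNode = pyparsing.Group(SLASH + LBRACE
                           + pyparsing.Group(pyparsing.ZeroOrMore(property_))("properties")
                           + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                           + RBRACE + SEMI)
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + LBRACE
                                             + pyparsing.Group(pyparsing.ZeroOrMore(property_))("properties")
                                             + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                                             + RBRACE + SEMI))

source = """
/ {
    property1 = "string1";
    property2 = "string2";
    node1 {
        property11 = "string11";
        property12 = "string12";
        node11 {
            property111 = "string111";
            property112 = "string112";
        };
    };
    node2 {
        property21 = "string21";
        property22 = "string22";
    };
};
"""
root = rootNode.parseString(source, parseAll=True)[0]

# Results names and Dict keys work as attributes...
print(root.children.node1.properties.property11[0])  # string11
# ...or as mapping keys, with no asDict() call
print(root["children"]["node2"]["properties"]["property22"][0])  # string22
```

Note that attribute access returns an empty string for a name that is not present, instead of raising an exception.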