Parsing devicetree with pyparsing into structured dictionary
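For my C++ RTOS I'm writing a parser for devicetree "source" files (.dts) in Python using the pyparsing module. I can parse the structure of the devicetree into a (nested) dictionary, where property names and node names are the dictionary keys (strings), and property values and nodes are the dictionary values (either a string or a nested dictionary).
Let's assume I have the following example devicetree structure: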
/ {
    property1 = "string1";
    property2 = "string2";
    node1 {
        property11 = "string11";
        property12 = "string12";
        node11 {
            property111 = "string111";
            property112 = "string112";
        };
    };
    node2 {
        property21 = "string21";
        property22 = "string22";
    };
};
I am able to parse that into something like this:
{'/': {'node1': {'node11': {'property111': ['string111'], 'property112': ['string112']},
'property11': ['string11'],
'property12': ['string12']},
'node2': {'property21': ['string21'], 'property22': ['string22']},
'property1': ['string1'],
'property2': ['string2']}}
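However, for my needs I would prefer to have this data structured differently. I'd like to have all properties as a nested dictionary under the key "properties", and all child nodes as a nested dictionary under the key "children" (written as 'nodes' in the layout below). The reason is that the devicetree (especially nodes) has some "metadata" that I would like to keep as plain key-value pairs, which requires moving the actual "contents" of a node one level "lower" to avoid any name conflicts between keys. So I'd like the example above to look like this: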
{'/': {
    'properties': {
        'property1': ['string1'],
        'property2': ['string2']
    },
    'nodes': {
        'node1': {
            'properties': {
                'property11': ['string11'],
                'property12': ['string12']
            },
            'nodes': {
                'node11': {
                    'properties': {
                        'property111': ['string111'],
                        'property112': ['string112']
                    },
                    'nodes': {
                    }
                }
            }
        },
        'node2': {
            'properties': {
                'property21': ['string21'],
                'property22': ['string22']
            },
            'nodes': {
            }
        }
    }
}
}
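To make the name-conflict argument concrete, here is a small sketch in plain Python. The metadata key 'label' and its value are purely hypothetical, chosen only for illustration; the point is that in a flat layout a property can silently overwrite metadata, while the nested layout keeps the two apart.

```python
# Flat layout: metadata and contents share one namespace, so a property
# that happens to be called 'label' overwrites the metadata entry.
flat_node = {'label': 'uart0'}          # hypothetical metadata
flat_node['label'] = ['from-property']  # a property with the same name wins

# Nested layout: contents live one level lower, so nothing collides.
nested_node = {
    'label': 'uart0',                             # hypothetical metadata
    'properties': {'label': ['from-property'],    # property named 'label'
                   'properties': ['also-fine']},  # even this name is safe
    'nodes': {},
}

print(flat_node['label'])                  # ['from-property']
print(nested_node['label'])                # uart0
print(nested_node['properties']['label'])  # ['from-property']
```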
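I tried adding a "name" to the parse tokens, but this results in "doubled" dictionary elements (which is expected, as this behaviour is described in the pyparsing documentation). This may not be a problem, but technically a node or property can be named "properties" or "children" (or whatever name I choose), so I don't think such a solution is robust.
I also tried using setParseAction() to convert the tokens into a dictionary fragment (hoping to turn {'key': 'value'} into {'properties': {'key': 'value'}}), but this did not work at all...
Is this possible at all directly with pyparsing? I'm prepared to just add a second phase that transforms the original dictionary into whatever structure I need, but as a perfectionist I would prefer a single-run, pyparsing-only solution, if possible.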
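For comparison, the second-phase fallback could be as small as the sketch below. It is only an illustration under two assumptions: the flat dictionary has the shape shown earlier (properties map to lists, nodes map to dictionaries), and the target keys are 'properties' and 'nodes'. The function name restructure is hypothetical.

```python
import pprint

def restructure(node):
    """Split one flat node dict into 'properties' and 'nodes' sub-dicts."""
    properties = {}
    nodes = {}
    for name, value in node.items():
        if isinstance(value, dict):
            # A nested dict is a child node: recurse into it.
            nodes[name] = restructure(value)
        else:
            # Anything else (here: a list of strings) is a property value.
            properties[name] = value
    return {'properties': properties, 'nodes': nodes}

# The flat result of the first parsing phase, as shown in the question.
flat = {'/': {'node1': {'node11': {'property111': ['string111'],
                                   'property112': ['string112']},
                        'property11': ['string11'],
                        'property12': ['string12']},
              'node2': {'property21': ['string21'], 'property22': ['string22']},
              'property1': ['string1'],
              'property2': ['string2']}}

structured = {name: restructure(node) for name, node in flat.items()}
pprint.pprint(structured, width=120)
```

This relies on the value type to tell properties from nodes, so it would need rethinking if property values could themselves be dictionaries.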
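For reference, here is sample code (Python 3) that transforms the devicetree source into an "unstructured" dictionary. Please note that this code is just a simplification and does not support all the features found in .dts (any data type other than string, value lists, unit-addresses, labels and so on); it only supports string properties and node nesting.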
#!/usr/bin/env python
import pyparsing
import pprint
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max = 31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max = 31)
propertyValue = pyparsing.dblQuotedString.setParseAction(pyparsing.removeQuotes)
property = pyparsing.Dict(pyparsing.Group(propertyName + pyparsing.Group(pyparsing.Literal('=').suppress() +
        propertyValue) + pyparsing.Literal(';').suppress()))
childNode = pyparsing.Forward()
rootNode = pyparsing.Dict(pyparsing.Group(pyparsing.Literal('/') + pyparsing.Literal('{').suppress() +
        pyparsing.ZeroOrMore(property) + pyparsing.ZeroOrMore(childNode) +
        pyparsing.Literal('};').suppress()))
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + pyparsing.Literal('{').suppress() +
        pyparsing.ZeroOrMore(property) + pyparsing.ZeroOrMore(childNode) +
        pyparsing.Literal('};').suppress()))
dictionary = rootNode.parseString("""
/ {
property1 = "string1";
property2 = "string2";
node1 {
property11 = "string11";
property12 = "string12";
node11 {
property111 = "string111";
property112 = "string112";
};
};
node2 {
property21 = "string21";
property22 = "string22";
};
};
""").asDict()
pprint.pprint(dictionary, width = 120)
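You are really close. I just did the following:
- added Groups and results names for your "properties" and "nodes" subsections (the nodes subsection gets the results name "children" in the code)
- changed some of the punctuation literals to CONSTANTS (Literal("};") will fail to match if there is a space between the closing brace and the semicolon, but RBRACE + SEMI will accommodate whitespace)
- removed the outermost Dict on rootNode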
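The whitespace point in the second bullet can be checked in isolation. The snippet below is a minimal sketch; the helper matches and the names fused and separate are made up for this illustration, while Literal, Suppress, parseString and ParseException are regular pyparsing names.

```python
import pyparsing

# One literal covering both characters: the text must contain exactly "};".
fused = pyparsing.Literal('};')

# Two separate suppressed tokens: pyparsing skips whitespace between them.
RBRACE, SEMI = map(pyparsing.Suppress, '};')
separate = RBRACE + SEMI

def matches(parser, text):
    """Return True if parser accepts the start of text, False otherwise."""
    try:
        parser.parseString(text)
        return True
    except pyparsing.ParseException:
        return False

print(matches(fused, '};'))      # True
print(matches(fused, '} ;'))     # False
print(matches(separate, '};'))   # True
print(matches(separate, '} ;'))  # True
```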
Code:
import pprint
import pyparsing

# Suppressed punctuation; each token is matched on its own, so whitespace between them is allowed
LBRACE,RBRACE,SLASH,SEMI,EQ = map(pyparsing.Suppress, "{}/;=")
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max = 31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max = 31)
propertyValue = pyparsing.dblQuotedString.setParseAction(pyparsing.removeQuotes)
# name = "value"; becomes a dictionary entry: name -> [value]
property = pyparsing.Dict(pyparsing.Group(propertyName + EQ
                                          + pyparsing.Group(propertyValue)
                                          + SEMI))
childNode = pyparsing.Forward()
# The results names keep properties and child nodes one level below the node itself
rootNode = pyparsing.Group(SLASH + LBRACE
                           + pyparsing.Group(pyparsing.ZeroOrMore(property))("properties")
                           + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                           + RBRACE + SEMI)
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + LBRACE
                             + pyparsing.Group(pyparsing.ZeroOrMore(property))("properties")
                             + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                             + RBRACE + SEMI))
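With result being the return value of rootNode.parseString() on the sample source from the question, converting to a dict using asDict() and printing with pprint gives: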
pprint.pprint(result[0].asDict())
{'children': {'node1': {'children': {'node11': {'children': [],
'properties': {'property111': ['string111'],
'property112': ['string112']}}},
'properties': {'property11': ['string11'],
'property12': ['string12']}},
'node2': {'children': [],
'properties': {'property21': ['string21'],
'property22': ['string22']}}},
'properties': {'property1': ['string1'], 'property2': ['string2']}}
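One detail in this output: leaf nodes report 'children': [] (an empty list) rather than an empty dictionary, since an empty group has no keys for asDict() to convert. If downstream code expects dictionaries everywhere, a small pass after asDict() can even this out. The sketch below is one possible way to do it; normalize is a hypothetical helper, not part of pyparsing.

```python
def normalize(node):
    """Return a copy of an asDict()-style node with empty sections as {}."""
    children = node['children'] or {}  # [] for a leaf node becomes {}
    return {
        # dict() copies a dict and also turns an empty list into {}
        'properties': dict(node['properties']),
        'children': {name: normalize(child) for name, child in children.items()},
    }

# The asDict() output printed above.
parsed = {'children': {'node1': {'children': {'node11': {'children': [],
                                                         'properties': {'property111': ['string111'],
                                                                        'property112': ['string112']}}},
                                 'properties': {'property11': ['string11'],
                                                'property12': ['string12']}},
                       'node2': {'children': [],
                                 'properties': {'property21': ['string21'],
                                                'property22': ['string22']}}},
          'properties': {'property1': ['string1'], 'property2': ['string2']}}

normalized = normalize(parsed)
print(normalized['children']['node2']['children'])  # {}
```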
You can also use the dump() method of pyparsing's ParseResults class to help visualize the list and dict/namespace-style access to the results as-is, without any conversion call:
print(result[0].dump())
[[['property1', ['string1']], ['property2', ['string2']]], [['node1', [['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]], ['node2', [['property21', ['string21']], ['property22', ['string22']]], []]]]
- children: [['node1', [['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]], ['node2', [['property21', ['string21']], ['property22', ['string22']]], []]]
  - node1: [[['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]]
    - children: [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]
      - node11: [[['property111', ['string111']], ['property112', ['string112']]], []]
        - children: []
        - properties: [['property111', ['string111']], ['property112', ['string112']]]
          - property111: ['string111']
          - property112: ['string112']
    - properties: [['property11', ['string11']], ['property12', ['string12']]]
      - property11: ['string11']
      - property12: ['string12']
  - node2: [[['property21', ['string21']], ['property22', ['string22']]], []]
    - children: []
    - properties: [['property21', ['string21']], ['property22', ['string22']]]
      - property21: ['string21']
      - property22: ['string22']
- properties: [['property1', ['string1']], ['property2', ['string2']]]
  - property1: ['string1']
  - property2: ['string2']
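As a rough, self-contained illustration of that as-is access, the sketch below rebuilds the grammar from the answer and reads values straight from the ParseResults by key. The body() helper, the variable name prop and the pp alias are additions for this example only, and copy() is used so the parse action does not attach to pyparsing's shared dblQuotedString object.

```python
import pyparsing as pp

LBRACE, RBRACE, SLASH, SEMI, EQ = map(pp.Suppress, "{}/;=")
nodeName = pp.Word(pp.alphas, pp.alphanums + ',._+-', max=31)
propertyName = pp.Word(pp.alphanums + ',._+?#', max=31)
# copy() keeps the parse action off the module-level dblQuotedString
propertyValue = pp.dblQuotedString.copy().setParseAction(pp.removeQuotes)
prop = pp.Dict(pp.Group(propertyName + EQ + pp.Group(propertyValue) + SEMI))
childNode = pp.Forward()

def body():
    """Node contents shared by the root node and child nodes (hypothetical helper)."""
    return (LBRACE
            + pp.Group(pp.ZeroOrMore(prop))("properties")
            + pp.Group(pp.ZeroOrMore(childNode))("children")
            + RBRACE + SEMI)

rootNode = pp.Group(SLASH + body())
childNode <<= pp.Dict(pp.Group(nodeName + body()))

source = """
/ {
    property1 = "string1";
    property2 = "string2";
    node1 {
        property11 = "string11";
        property12 = "string12";
        node11 {
            property111 = "string111";
            property112 = "string112";
        };
    };
    node2 {
        property21 = "string21";
        property22 = "string22";
    };
};
"""

root = rootNode.parseString(source)[0]

# Key-style access on the ParseResults, no asDict() call involved.
print(root['properties']['property1'][0])       # string1
node11 = root['children']['node1']['children']['node11']
print(node11['properties']['property112'][0])   # string112
print(sorted(root['children'].keys()))          # ['node1', 'node2']
```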