Converting XML to a dictionary with values on tags

Question

I have an XML file with attributes on its tags, as follows:

<?xml version= "1.0" encoding="ISO-8859-1" ?>
<month xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="my.xsd">
    <day Day="2016-1-01">
        <hour Hour="00:00">
            <Variables>
                <a>211.3</a>
                <b>78.94</b>
                <c>0.6</c>
            </Variables>
        </hour>
        <hour Hour="12:00">
            <Variables>
                <a>155.5</a>
                <b>85.5</b>
                <c>0.42</c>
            </Variables>
        </hour>
    </day>
</month>

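I want to parse the XML and convert it to a dictionary, using the attribute values rather than the tag names as keys.

I mean, how could I build something that works like this: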

>>> print(d['2016-1-01']['12:00']['b'])
85.5

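The real XML has many more days and hours. Is this possible?

The only way I have managed to parse it is the following, but it becomes hard if you want to look up several different variables at different hours: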

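import xml.etree.ElementTree as ET

# 'x.xml' is a placeholder path for the XML file shown above
root = ET.parse('x.xml').getroot()

# Day
for child_day in root:
    print(child_day.tag, child_day.attrib)

    # Hour
    for child_hour in child_day:
        print('\t', child_hour.tag, child_hour.attrib)

        # Variables
        for child_Variables in child_hour:
            print('\t\t', child_Variables.find('b').text)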
Is there a function similar to the one in this answer that does the same thing with the attributes instead of the tags?

Answer 1

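The answer you linked uses what is called a dict comprehension. It is a very simple and elegant solution because it performs the same operation at every level of the ElementTree to build that level of the dict, so the function can call itself recursively.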
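
The linked answer is not reproduced here, so the following is only a rough sketch of what a tag-keyed recursive dict comprehension might look like. The name etree_to_dict is an assumption made for illustration, not the linked author's code.

```python
import xml.etree.ElementTree as ET


def etree_to_dict(elem):
    """Hypothetical helper: map each child's tag to its converted content."""
    children = list(elem)
    if not children:
        # Leaf element: its text is the value
        return elem.text
    # The same operation at every level, so the function calls itself
    return {child.tag: etree_to_dict(child) for child in children}


root = ET.fromstring("<Variables><a>211.3</a><b>78.94</b></Variables>")
result = etree_to_dict(root)
print(result)  # {'a': '211.3', 'b': '78.94'}
```

Because the keys are tag names, sibling elements that share a tag (such as the two <hour> elements in the question) would overwrite each other, which is why a tag-keyed approach does not fit this XML as is.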

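But if I understand you correctly, you want a different attribute from each tag to serve as the dict key depending on the level you are at in the ElementTree structure, and at the bottom level you want to switch to using the tag name as the key and the text as the value. So I could not come up with a solution as elegant as the one in the answer you linked.

We can still use dict comprehensions, but we will have to use them several times (at least in the solution I came up with).

It sounds like you want a dict that looks like this (given your sample XML):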

{
    "2016-1-01": {
        "12:00": {
            "a": "155.5",
            "b": "85.5",
            "c": "0.42",
        },
        "00:00": {
            "a": "211.3",
            "b": "78.94",
            "c": "0.6",
        },
    },
}

For this you will need three functions, one for each level of the dict being built (days, hours and variables). They look like this:

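def month_etree_to_dict(month):
    # list(elem) replaces Element.getchildren(), which was removed in Python 3.9
    d_list = list(month)
    # Key each <day> by its "Day" attribute
    d_dict = {d.attrib["Day"]: day_etree_to_dict(d) for d in d_list}
    return d_dict

def day_etree_to_dict(day):
    h_list = list(day)
    # Key each <hour> by its "Hour" attribute
    h_dict = {h.attrib["Hour"]: hour_etree_to_dict(h) for h in h_list}
    return h_dict

def hour_etree_to_dict(hour):
    # Step into the single <Variables> child, then collect its children
    v_list = list(hour[0])
    # Key each variable by its tag name, with its text as the value
    v_dict = {v.tag: v.text for v in v_list}
    return v_dict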

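The month_etree_to_dict function builds a dict whose keys are the dates of each day. The values are the dicts produced by day_etree_to_dict. day_etree_to_dict does the same for each hour by calling hour_etree_to_dict. hour_etree_to_dict works slightly differently: it steps down one extra level in the ElementTree so that it can iterate over the children of the <Variables> element (<a>, <b> and <c>), using their tag names as the dict keys and their text as the values.

I hope this works for you.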
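
As a compact illustration of the same three levels, here is a sketch that folds them into one nested dict comprehension and runs it on a trimmed copy of the sample XML. The name month_to_dict and the inline SAMPLE string are assumptions made for this example. It also assumes every <hour> contains a <Variables> child.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML from the question (declaration and schema attributes omitted)
SAMPLE = """<month>
    <day Day="2016-1-01">
        <hour Hour="00:00">
            <Variables><a>211.3</a><b>78.94</b><c>0.6</c></Variables>
        </hour>
        <hour Hour="12:00">
            <Variables><a>155.5</a><b>85.5</b><c>0.42</c></Variables>
        </hour>
    </day>
</month>"""


def month_to_dict(month):
    """Hypothetical helper: Day -> Hour -> variable tag -> text."""
    return {
        day.attrib["Day"]: {
            # Bottom level: switch from attribute keys to tag-name keys
            hour.attrib["Hour"]: {v.tag: v.text for v in hour.find("Variables")}
            for hour in day.findall("hour")
        }
        for day in month.findall("day")
    }


d = month_to_dict(ET.fromstring(SAMPLE))
print(d["2016-1-01"]["12:00"]["b"])  # 85.5
```

The values are strings taken straight from the element text, so wrap them in float() if numbers are needed.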

Answer 2

I often use a recursive defaultdict when converting XML to a dict, like this:

import xml.etree.ElementTree as ET
from collections import defaultdict


def Tree():
    # A defaultdict whose missing keys become new Trees, to any depth
    return defaultdict(Tree)

tree = ET.parse('x.xml')
root = tree.getroot()
d = Tree()
for day in root.findall('day'):
    for hour in day.findall('hour'):
        # Every child of <Variables> under this hour
        for v in hour.findall('./Variables/*'):
            d[day.attrib['Day']][hour.attrib['Hour']][v.tag] = v.text

print(d['2016-1-01']['12:00']['b'])
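
One thing to watch with a recursive defaultdict is that looking up a missing key creates it instead of raising KeyError. The sketch below shows that behaviour and one possible way to convert the finished tree to plain dicts. The helper name freeze is an assumption made for illustration.

```python
from collections import defaultdict


def Tree():
    return defaultdict(Tree)


def freeze(node):
    """Hypothetical helper: recursively turn nested defaultdicts into plain dicts."""
    if isinstance(node, defaultdict):
        return {key: freeze(value) for key, value in node.items()}
    return node


d = Tree()
d['2016-1-01']['12:00']['b'] = '85.5'

# Merely reading a missing key inserts an empty subtree
_ = d['2016-1-02']
created = '2016-1-02' in d
print(created)  # True

# Drop the accidental entry, then convert to plain dicts
del d['2016-1-02']
plain = freeze(d)
print(plain)  # {'2016-1-01': {'12:00': {'b': '85.5'}}}

# A plain dict raises KeyError for a missing day
try:
    plain['2016-1-02']
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# Element text is a string; convert it when a number is needed
value = float(plain['2016-1-01']['12:00']['b'])
print(value)  # 85.5
```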
