Flatten a nested dict structure into a dataset

Question

For some post-processing, I need to flatten a structure like this:

{'foo': {
          'cat': {'name': 'Hodor',  'age': 7},
          'dog': {'name': 'Mordor', 'age': 5}},
 'bar': { 'rat': {'name': 'Izidor', 'age': 3}}
}

into this dataset:

[{'foobar': 'foo', 'animal': 'dog', 'name': 'Mordor', 'age': 5},
 {'foobar': 'foo', 'animal': 'cat', 'name': 'Hodor',  'age': 7},
 {'foobar': 'bar', 'animal': 'rat', 'name': 'Izidor', 'age': 3}]

So I wrote this function:

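import copy

def flatten(data, primary_keys):
    out = []
    # Reverse a copy so that pop() yields the keys in their original order.
    keys = copy.copy(primary_keys)
    keys.reverse()
    def visit(node, primary_values, prim):
        if len(prim):
            p = prim.pop()
            for key, child in node.items():
                primary_values[p] = key
                # Each branch gets its own copy of the remaining keys.
                visit(child, primary_values, copy.copy(prim))
        else:
            # Leaf: copy it so the caller's dict is not modified.
            new = copy.copy(node)
            new.update(primary_values)
            out.append(new)
    visit(data, { }, keys)
    return out

# `a` is the nested dict shown above.
out = flatten(a, ['foobar', 'animal'])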

I'm not very happy with it, because I have to use copy.copy to protect my input. Obviously, anyone calling flatten does not expect the input to be changed.

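Then I thought of an alternative that uses more global variables (global to flatten, at least) and passes an index to visit instead of passing primary_keys directly. However, this doesn't really help me get rid of the ugly initial copy: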

    keys = copy.copy(primary_keys)
    keys.reverse()

So here is my final version:

import copy

def flatten(data, keys):
    # Note: copy.copy is shallow, so the leaf dicts are still shared with the caller.
    data = copy.copy(data)
    keys = copy.copy(keys)
    keys.reverse()
    out = []
    values = {}
    def visit(node, id):
        if id:
            id -= 1
            for key, child in node.items():
                values[keys[id]] = key
                visit(child, id)
        else:
            # This modifies the leaf dict in place.
            node.update(values)
            out.append(node)
    visit(data, len(keys))
    return out
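
A side note on the final version above: copy.copy only makes a shallow copy, so the inner dicts remain shared with the caller. The following is a minimal sketch of that behaviour using only the standard copy module; the variable names are illustrative and not part of the original code.

```python
import copy

nested = {'bar': {'rat': {'name': 'Izidor', 'age': 3}}}

shallow = copy.copy(nested)     # new outer dict, same inner dicts
deep = copy.deepcopy(nested)    # fully independent copy

# Mutating a leaf through the shallow copy also changes the original.
shallow['bar']['rat']['foobar'] = 'bar'
shallow_leaked = 'foobar' in nested['bar']['rat']

# Mutating a leaf through the deep copy leaves the original alone.
deep['bar']['rat']['animal'] = 'rat'
deep_leaked = 'animal' in nested['bar']['rat']

print(shallow_leaked, deep_leaked)
```

So a shallow copy of data does not stop node.update(values) from reaching the caller's leaf dicts; either a deep copy or building a new dict per leaf would be needed.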

Is there a better way to implement this (one that avoids copy.copy)?

Answer 1

Edit: modified to account for variable dict depth.

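By using the merge function from my previous answer (below), you can avoid calling update, which is what modifies the caller's data. Then there is no need to copy the dict first.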

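def flatten(data, keys):
    out = []
    values = {}
    def visit(node, id):
        if id:
            for key, child in node.items():
                # keys[-id] walks keys front to back as id counts down,
                # so no reversed copy of keys is needed.
                values[keys[-id]] = key
                visit(child, id - 1)
        else:
            # Use merge (defined in the previous answer below) instead of update:
            # it returns a new dict, so neither node nor values is modified.
            out.append(merge(node, values))
    visit(data, len(keys))
    return out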

One thing I don't understand is why you need to protect the keys input. I don't see it being modified anywhere.
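
As a further illustration of the same idea, here is a hedged sketch of a generator-based variant that never touches the input and needs no copy module. The name flatten_rows and its internals are hypothetical, not taken from the question or the answer; it assumes Python 3.5+ for the {**a, **b} syntax.

```python
def flatten_rows(data, keys):
    """Yield one flat dict per leaf without modifying data or keys."""
    def visit(node, depth, path):
        if depth == len(keys):
            # Leaf: build a new dict from the leaf plus the collected path.
            row = dict(node)
            row.update(path)
            yield row
        else:
            for key, child in node.items():
                # A fresh path dict per branch, so siblings share no state.
                yield from visit(child, depth + 1, {**path, keys[depth]: key})
    return visit(data, 0, {})


a = {'foo': {'cat': {'name': 'Hodor', 'age': 7},
             'dog': {'name': 'Mordor', 'age': 5}},
     'bar': {'rat': {'name': 'Izidor', 'age': 3}}}

rows = list(flatten_rows(a, ['foobar', 'animal']))
```

Because every row is a new dict, the nested input stays exactly as it was, and the caller can wrap the generator in list() only when a full list is needed.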

Previous answer

How about a list comprehension?

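def merge(d1, d2):
    # Build a new dict; on duplicate keys, d2 wins.
    return dict(list(d1.items()) + list(d2.items()))

# `a` is the nested dict from the question.
# A single comprehension with two for clauses yields one flat list of rows.
[merge({'foobar': key, 'animal': sub_key}, sub_sub_dict)
    for key, sub_dict in a.items()
        for sub_key, sub_sub_dict in sub_dict.items()]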

The tricky part is merging the dicts without using update (which returns None).
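
To make that point concrete, here is a small sketch showing why update cannot be used inside an expression, alongside one alternative that returns a new dict. It assumes Python 3.5+ for dict unpacking; the variable names are illustrative only.

```python
d1 = {'foobar': 'foo', 'animal': 'cat'}
d2 = {'name': 'Hodor', 'age': 7}

# dict.update works in place and returns None,
# so its result cannot be collected in a comprehension.
update_result = dict(d1).update(d2)

# Unpacking both dicts into a new literal returns the merged dict
# and leaves d1 and d2 untouched.
merged = {**d1, **d2}

print(update_result)
print(merged)
```

On Python 3.9 and later, d1 | d2 should give the same merged result.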

