Writing a nested dictionary (forest) of huge depth to a text file in BFS style

Question

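Following up on my earlier question:

Now I want to write the forest traversal in BFS style. I have a huge, deep dictionary that represents a forest (many non-binary trees), and I want to process the forest and create a text file containing the sequence of (father, son) relationships in the forest. That is, given the dictionary: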

{'a': {'b': {'c': {'x': {}}, 'd': {'p': {}}}, 'g': {}, 'f': {}},
 't': {'r': {'o': {}}, 'y': {}}}

The generated text file should look like this:

(ROOT,b) (ROOT,g) (ROOT,f) (b,c) (b,d) (c,x) (d,p) \n
(ROOT,r) (ROOT,y) (r,o) \n

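Note that I replace every root in the forest with the word "ROOT".

Here is a simple visualization of the forest:

The nested dictionary is huge, and iterating over it recursively causes memory and run-time errors, so a "generator style" solution like the one in the link at the beginning of this question would be best.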
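The recursion problem mentioned above is easy to reproduce. The following is a minimal sketch, assuming CPython's default recursion limit; the helper names make_chain, count_recursive and count_iterative are hypothetical and not from the original question. It builds a single chain of dicts five times deeper than sys.getrecursionlimit() using a loop, shows that a recursive walk over it raises RecursionError, and counts the same nodes with an explicit stack instead.

```python
import sys

def make_chain(depth):
    """Build {'n0': {'n1': {...: {}}}} with a loop, so building it needs no recursion."""
    root = {}
    current = root
    for i in range(depth):
        child = {}
        current[f'n{i}'] = child
        current = child
    return root

def count_recursive(d):
    """Count the keys at every level; uses one Python stack frame per nesting level."""
    total = 0
    for sub in d.values():
        total += 1 + count_recursive(sub)
    return total

def count_iterative(d):
    """Same count with an explicit stack, so depth is limited only by memory."""
    total, stack = 0, [d]
    while stack:
        node = stack.pop()
        total += len(node)
        stack.extend(node.values())
    return total

deep = make_chain(sys.getrecursionlimit() * 5)
try:
    count_recursive(deep)
except RecursionError:
    print('count_recursive: RecursionError')
print('count_iterative:', count_iterative(deep))
```

Replacing the call stack with an explicit list or queue is the same idea that lets a breadth-first traversal handle arbitrarily deep input.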

Answer 1

The simplest approach is to traverse the structure recursively with a generator:

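def flatten_forest(forest, write=True):
    def flatten(d, seen=None):
        # Recursive generator: depth-first (pre-order) walk of the nested dicts.
        # `seen` is the parent key; it is None only at the top level of `forest`.
        for a, b in d.items():
            if seen is None:
                yield ('ROOT', a)  # top-level keys get ROOT as their parent
            else:
                yield (seen, a)
            if b:  # recurse only into non-empty children
                yield from flatten(b, a)
    if write:
        # 'a' (append) mode: each call adds one line; pairs are written as tuple reprs.
        with open('full_flattened_tree.txt', 'a') as f:
            f.write(' '.join(map(str, flatten(forest))) + '\n')

data = {'a': {'b': {'c': {'x': {}}, 'd': {'p': {}}}, 'g': {}, 'f': {}}, 't': {'r': {'o': {}}, 'y': {}}}
for i in data.values():
    flatten_forest(i)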

File output (note that the pairs come out in depth-first order, not level by level):

('ROOT', 'b') ('b', 'c') ('c', 'x') ('b', 'd') ('d', 'p') ('ROOT', 'g') ('ROOT', 'f')
('ROOT', 'r') ('r', 'o') ('ROOT', 'y')
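For the breadth-first order the question asks for, one option is to keep the lazy generator style but replace the recursion with a queue. This is a minimal sketch, not part of the original answer; bfs_pairs and write_forest are hypothetical names, and it assumes the same input format (each top-level value is the dict of one root's children, and the root is reported as ROOT). collections.deque gives O(1) pops from the left, and pairs are written one at a time so no whole line is built in memory.

```python
from collections import deque

def bfs_pairs(tree, root_label='ROOT'):
    """Lazily yield (parent, child) pairs of one tree in breadth-first order.

    `tree` is the dict of the root's children; the root is reported as root_label.
    """
    queue = deque([(root_label, tree)])  # FIFO of (node label, dict of its children)
    while queue:
        parent, children = queue.popleft()
        for child, grandchildren in children.items():
            yield (parent, child)
            if grandchildren:            # empty dicts are leaves; nothing to expand
                queue.append((child, grandchildren))

def write_forest(forest, path):
    """Write one line per tree in the question's "(parent,child)" format."""
    with open(path, 'w') as f:
        for tree in forest.values():
            for i, (parent, child) in enumerate(bfs_pairs(tree)):
                if i:
                    f.write(' ')         # separator before every pair except the first
                f.write(f'({parent},{child})')
            f.write('\n')
```

With the question's data, write_forest(data, 'forest_bfs.txt') should write the two lines (ROOT,b) (ROOT,g) (ROOT,f) (b,c) (b,d) (c,x) (d,p) and (ROOT,r) (ROOT,y) (r,o). Siblings appear in dict insertion order, which Python guarantees from 3.7 on.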

This also works for large dictionaries:

import random, string, time

def create_structure(_len, _depth=5, _count=0):
    # `_len` keys per dict; at nesting level `_depth` the values are empty dicts (leaves).
    return {string.ascii_lowercase[i]: {} if _depth == _count
            else create_structure(random.randint(1, 26), _count=_count + 1)
            for i in range(_len)}

d = create_structure(26)
c = time.time()
flatten_forest(d, write=True)
print(time.time()-c)

Output (elapsed time in seconds):

11.871491193771362

Answer 2

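To perform a breadth-first search, we have to keep a list of the nodes we are currently working on together with the trees below them; I chose to store each pair as a tuple.

For example, when we are working at the depth of the c and d nodes, this list of trees will be:

[('c', {'x': {}}), ('d', {'p': {}})]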

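Now, while there are still trees below us (while len(trees):), we need to descend one level.

The first step, obviously, is to start a fresh list of trees (new_trees), since we are about to build the next level.

Then we iterate over our list of trees, and for each tree we iterate over its children.

So, taking the example above, on the first iteration the node will be 'c' and the children will be {'x': {}}, which we now iterate over. On the first iteration of that children loop, the child node will be 'x', and its children (c's child's children) are empty: {}.

Now, within this scope (the node's child), if the child has children of its own, we add the child and its children (again, as a tuple) to the list of trees for the next level.

As an example where there are children: when the current node is b, one of its children is c, and because c has children, the tuple (c, c's children) is appended to the next level's list of trees.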
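The per-level lists described here can be generated directly to check the example. A small sketch (the generator name levels is hypothetical) that applies the same rule, carrying a child to the next level only when it has children of its own, and yields each level's list of (node, children) tuples:

```python
def levels(tree, root_label='ROOT'):
    """Yield the list of (node, children) tuples for each depth of one tree."""
    trees = [(root_label, tree)]
    while trees:
        yield trees
        # Next level: every child that itself has children, paired with those children.
        trees = [(child, grandchildren)
                 for _, children in trees
                 for child, grandchildren in children.items()
                 if grandchildren]

data = {'a': {'b': {'c': {'x': {}}, 'd': {'p': {}}}, 'g': {}, 'f': {}},
        't': {'r': {'o': {}}, 'y': {}}}
for depth, trees in enumerate(levels(data['a'])):
    print(depth, trees)
```

The last line printed is the c/d level from the example: 2 [('c', {'x': {}}), ('d', {'p': {}})]. After that level, x and p have no children, so the next list is empty and the loop stops.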

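Finally, whether or not this child has children, we want the current line of the file to record the link between the node and its child, which is (node, child_node).

That's pretty much it. Of course, when we finish a tree, we need to write a newline to the file.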

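The only annoying detail is the spaces between the tuples written to the file. If we always appended a space after each tuple, we would end up with a stray space at the end of every line, as shown below, which is not ideal.

(ROOT, a)S(a,b)S

(where S stands for a space)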

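To get around this, we always write a space before each tuple, unless it is the first tuple on the line (tracked by line_first). At the start of each tree (line) we set the line_first flag to True; the first time we write a tuple we set it to False and skip the space, and for every later tuple we write a space before it.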
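If one tree's pairs fit comfortably in memory, str.join is a simpler alternative to the line_first flag, because it only puts the separator between items. A sketch with a hypothetical helper format_line; the trade-off is that the whole line is built as one string before writing, whereas the flag approach streams pairs to the file one at a time.

```python
def format_line(pairs):
    """Format (parent, child) pairs as one line, single spaces between pairs.

    str.join inserts the separator only between items, so the line has no
    trailing space and no first-item flag is needed.
    """
    return ' '.join(f'({parent}, {child})' for parent, child in pairs) + '\n'

print(repr(format_line([('ROOT', 'r'), ('ROOT', 'y'), ('r', 'o')])))
# '(ROOT, r) (ROOT, y) (r, o)\n'
```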

That's it. Here is the full code:

the_tree = {'a': {'b': {'c': {'x': {}}, 'd': {'p': {}}}, 'g': {}, 'f': {}}, 't': {'r': {'o': {}}, 'y': {}}}

with open('the_file', 'w') as file:
    for tree in the_tree.values():
        line_first = True         # no space before the first pair on a line
        trees = [('ROOT', tree)]  # current level: (node, dict of its children)
        while len(trees):
            new_trees = []        # the next level down
            for node, children in trees:
                for child_node, child_children in children.items():
                    if child_children:  # only nodes with children need expanding
                        new_trees.append((child_node, child_children))
                    if line_first:
                        line_first = False
                    else:
                        file.write(' ')
                    file.write(f'({node}, {child_node})')
            trees = new_trees
        file.write('\n')          # one line per tree

Warning: this uses f-strings, which were introduced in Python 3.6!

It produces the expected output:

(ROOT, b) (ROOT, g) (ROOT, f) (b, c) (b, d) (c, x) (d, p)
(ROOT, r) (ROOT, y) (r, o)
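The f-string in the code above is the only part that needs Python 3.6. On older versions the same "(node, child_node)" text can be built with str.format or %-formatting; a small sketch with sample values:

```python
node, child_node = 'ROOT', 'b'

# Replacements for f'({node}, {child_node})' that also run before Python 3.6:
with_format = '({}, {})'.format(node, child_node)  # str.format (Python 2.7, 3.1+)
with_percent = '(%s, %s)' % (node, child_node)     # printf-style %-formatting

print(with_format)   # (ROOT, b)
print(with_percent)  # (ROOT, b)
```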

Answer 3

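from collections import deque

d = {'a': {'b': {'c': {'x': {}}, 'd': {'p': {}}}, 'g': {}, 'f': {}}, 't': {'r': {'o': {}}, 'y': {}}}
with open('file', 'w') as f:
    for s in d.values():
        q = deque()  # FIFO queue of (parent, subtree) pairs waiting to be expanded
        p = 'ROOT'   # the top-level key is always reported as ROOT, even if reused deeper down
        while True:
            for k, v in s.items():
                f.write('(%s,%s) ' % (p, k))
                if v:
                    q.append((k, v))
            if not q:
                break
            p, s = q.popleft()  # O(1), unlike list.pop(0)
        f.write('\n')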

This outputs:

(ROOT,b) (ROOT,g) (ROOT,f) (b,c) (b,d) (c,x) (d,p) 
(ROOT,r) (ROOT,y) (r,o)
