如何遍历哈夫曼树?

How to traverse a Huffman tree?

我是编程新手,目前正在努力学习 Python。我正在按照一个教程进行操作,该教程包括通过 4 个步骤构建霍夫曼编码应用程序;我有前 3 个,但我卡在了第四个。第一步是获取字符串中唯一字符的频率,如下所示:

{'a': 8, 'b': 7, 'c': 6, 'd': 5, 'e': 4, 'f': 3, 'g': 2, 'h': 1}

第二步是将字符和频率放入元组中:

[(8, 'a'), (7, 'b'), (6, 'c'), (5, 'd'), (4, 'e'), (3, 'f'), (2, 'g'), (1, 'h')]

第三步实际上是构建树,它看起来像这样:

[(36, (15, (7, 'b'), (8, 'a')), (21, (9, (4, 'e'), (5, 'd')), (12, (6, 'c'), 
 (6, (3, 'f'), (3, (1, 'h'), (2, 'g'))))))]

第四步是 "traverse" 树,同时为每个分支分配一个代码,如果它在左边则为 1,如果它在右边则为 0。我想过几种方法,但都没有成功。

我看到树列表的每一层都有两个元素(tree[1]tree[2])代表左右分支,但我不知道最有效的方法是什么在树上重申。我想使用一个循环,如果它检测到属于它的角色,它会继续深入到每个分支,但它没有用,我不确定为什么。

def in_list(my_list, item):
    try:
        return any(item in sublist for sublist in my_list)
    except:
        return False


def get_codes(tree, tuples):

    tree = tree[0]

    for i in tuples:
        check = True
        while check is True:
            if in_list(tree[1], i) is True:
                print(i, 'is in 1')
                node1 = True
            else:
                node1 = False
            if in_list(tree[2], i) is True:
                print(i, 'is in 2')
                node2 = True
            else:
                node2 = False

            tree = tree[2]

            if node1 is False and node2 is False:
                check = False

    return 'test'

我什至不确定这是解决此问题的最佳方法。

这是我的完整代码,以备不时之需:

def get_frequency(string):
    freq = dict.fromkeys(string, 0)   # fromKeys function takes every unique element of the given parameter.
    for i in string:                  # If given a string, it take each unique character.
        freq[i] += 1

    print('Step 1: ', freq)
    return freq


def get_nodes(dictionary):
    node_list = []

    list_keys = list(dictionary.values())
    list_values = list(dictionary.keys())

    for i in range(len(list_keys)):
        node_list.append((list_keys[i], list_values[i]))

    print('Step 2: ', node_list)
    return node_list


def get_tree(node_set):

    tree_list = node_set

    for i in range(len(tree_list)-1):

        # Sort nodes in ascending order. Lowest frequency first
        tree_list.sort(key=lambda tup: tup[0])

        # Defining the next node (f,l,r) based on the two tuples with the lowest frequency
        l_freq = tree_list[0][0]
        r_freq = tree_list[1][0]
        f = l_freq + r_freq

        l_tuple = tree_list[0]
        r_tuple = tree_list[1]

        # Append new node, delete old node
        node = (f, l_tuple, r_tuple)
        tree_list.remove(l_tuple)
        tree_list.remove(r_tuple)
        tree_list.append(node)

    print('Step 3: ', tree_list)
    return tree_list


def in_list(my_list, item):
    try:
        return any(item in sublist for sublist in my_list)
    except:
        return False


def get_codes(tree, tuples):

    tree = tree[0]

    for i in tuples:
        check = True
        while check is True:
            if in_list(tree[1], i) is True:
                print(i, 'is in 1')
                node1 = True
            else:
                node1 = False
            if in_list(tree[2], i) is True:
                print(i, 'is in 2')
                node2 = True
            else:
                node2 = False

            tree = tree[2]

            if node1 is False and node2 is False:
                check = False

    return 'test'


text = 'aaaaaaaabbbbbbbccccccdddddeeeefffggh'

frequency = get_frequency(text)
nodes = get_nodes(frequency)
huff_tree = get_tree(nodes)
codes = get_codes(huff_tree, get_nodes(frequency))

通常,遍历树最方便的方法是递归。这棵树有两种节点:

  • 具有 2 个元素的元组 (_, letter) 总是叶子。
  • 具有 3 个元素的元组 (_, left, right) 始终是内部节点。

算法是这样的:如果你在一个内部节点,在左边递归当前码字扩展0,然后在右边递归当前码字扩展1。如果你是在叶节点,只需将字母与当前代码字配对即可。 "initial" 代码字是空字符串。

这是一个使用递归生成器函数的实现。

def make_codewords(tree):
    def helper(node, codeword):
        if len(node) == 2:
            _, letter = node
            yield (codeword, letter)
        else:
            _, left, right = node
            yield from helper(left, codeword + '0')
            yield from helper(right, codeword + '1')

    root = tree[0]
    # convert (codeword, letter) pairs to dictionary
    return dict(helper(root, ''))

示例:

>>> tree = [(36, (15, (7, 'b'), (8, 'a')), (21, (9, (4, 'e'), (5, 'd')), (12, (6, 'c'), (6, (3, 'f'), (3, (1, 'h'), (2, 'g'))))))]
>>> make_codewords(tree)
{'00': 'b',
 '01': 'a',
 '100': 'e',
 '101': 'd',
 '110': 'c',
 '1110': 'f',
 '11110': 'h',
 '11111': 'g'}