Creating min heap from array - 2 methods

Question

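I'm working on a problem about building a min heap from an array. I have two approaches: the first is recursive, the second uses a while loop. The recursive approach passes the online grader's tests, but the while-loop version doesn't seem to work. I generated some random stress tests in the code below and found that the two approaches also give different answers.

Can you tell me what's wrong with my second approach? The problem is as follows: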

Input Format. The first line of the input contains single integer $n$. The next line contains $n$ space-separated integers $a_i$.

Constraints. $1 \le n \le 100\,000$; $0 \le i, j \le n - 1$; $0 \le a_0, a_1, \ldots, a_{n-1} \le 10^9$. All $a_i$ are distinct.

Output Format. The first line of the output should contain single integer $m$: the total number of swaps.

$m$ must satisfy conditions $0 \le m \le 4n$. The next $m$ lines should contain the swap operations used to convert the array $a$ into a heap. Each swap is described by a pair of integers $i, j$: the 0-based indices of the elements to be swapped. After applying all the swaps in the specified order the array must become a heap, that is, for each $i$ where $0 \le i \le n - 1$ the following conditions must be true:

If $2i + 1 \le n - 1$, then $a_i < a_{2i+1}$.

If $2i + 2 \le n - 1$, then $a_i < a_{2i+2}$.

Note that all the elements of the input array are distinct. Note that any sequence of swaps that has length at most $4n$ and after which your initial array becomes a correct heap will be graded as correct.
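When stress-testing, it is more reliable to check an output against these conditions directly than to compare it with another implementation's swaps. Below is a minimal sketch of such a checker; the name check_answer is hypothetical and not part of the assignment or its grader.

```python
def check_answer(original, swaps):
    """Replay the (i, j) swaps on a copy of `original` and check the grading conditions."""
    a = list(original)
    n = len(a)
    if len(swaps) > 4 * n:              # 0 <= m <= 4n
        return False
    for i, j in swaps:                  # 0-based indices, applied in the given order
        a[i], a[j] = a[j], a[i]
    for i in range(n):
        left, right = 2 * i + 1, 2 * i + 2
        if left <= n - 1 and not a[i] < a[left]:
            return False
        if right <= n - 1 and not a[i] < a[right]:
            return False
    return True


print(check_answer([5, 4, 3, 2, 1], [(1, 4), (0, 1), (1, 3)]))  # True
```

The example swap list was worked out by hand for [5, 4, 3, 2, 1]; any other list that satisfies the same conditions would be accepted as well.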

My code:

# python3

from random import randint

def sift_down(i, n, data, swaps):
    min_index = i
    left_child = 2*i + 1
    right_child = 2*i + 2
    if left_child < n and data[left_child] < data[min_index]:
        min_index = left_child
    if right_child < n and data[right_child] < data[min_index]:
        min_index = right_child
    if i != min_index:
        # record swaps as tuples so the list can be compared with build_heap2's
        swaps.append((i, min_index))
        data[i], data[min_index] = data[min_index], data[i]
        sift_down(min_index, n, data, swaps)

def build_heap(data):
    # a fresh list per call; a module-level list would keep growing
    # across the iterations of the stress test in main()
    swaps = []
    n = len(data)
    for i in range(n//2, -1, -1):
        sift_down(i, n, data, swaps)

    return swaps

# wrong answer using while loop instead of recursion
def build_heap2(data):
    swap = []
    for i in range(len(data)-1, 0, -1):
        current_node = i
        prev_node = i // 2 if i % 2 != 0 else i // 2 - 1

        while data[prev_node] > data[current_node] and current_node != 0:
            swap.append((prev_node, current_node))
            data[prev_node], data[current_node] = data[current_node], data[prev_node]
            current_node = prev_node
            prev_node = current_node // 2 if current_node % 2 != 0 else current_node // 2 - 1

    return swap


def main():
    # n = int(input())
    # data = list(map(int, input().split()))
    # assert len(data) == n
    
    while True:
        n = randint(1, 100000)
        data = []
        data2 = []
        for i in range(n):
            data.append(randint(0, 10**9))  # 10^9 is bitwise XOR in Python (== 3)
        data2 = data.copy()
        
        swaps = build_heap(data)
        swaps2 = build_heap2(data2)
        
        
        if swaps != swaps2:
            print("recursion")
            print(data[0], len(data), len(swaps))
            print("loop:")
            print(data2[0], len(data2), len(swaps2))
            break
        
        else:
            print("success")
    
    swaps = build_heap(data)

    print(len(swaps))
    for i, j in swaps:
        print(i, j)

if __name__ == "__main__":
    main()

Answer 1

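Your build_heap2 implements an incorrect idea. It starts at the bottom of the tree (correct), but then bubbles values up into the upper part of the tree, which has not been heapified yet (wrong). That doesn't work. Not only does it report the wrong number of swaps, it doesn't always produce a valid heap either. For example, for [3, 1, 2, 4, 0] the result after the swaps is still not a heap, because the value 1 ends up as a child of 3.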
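Traced by hand on [3, 1, 2, 4, 0], build_heap2 makes exactly two swaps, (1, 4) and then (0, 1), as the 0 bubbles from index 4 to the root; no later iteration moves anything. The sketch below replays those two swaps and reports the first parent/child pair that breaks the min-heap property. first_violation is a helper made up for this illustration.

```python
def first_violation(a):
    """Return (parent, child) for the first broken min-heap edge, or None."""
    for child in range(1, len(a)):
        parent = (child - 1) // 2
        if a[parent] > a[child]:
            return parent, child
    return None


data = [3, 1, 2, 4, 0]
for i, j in [(1, 4), (0, 1)]:  # the swaps build_heap2 produces for this input
    data[i], data[j] = data[j], data[i]

print(data)                   # [0, 3, 2, 4, 1]
print(first_violation(data))  # (1, 4): the value 3 sits above the value 1
```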

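The idea is to build small heaps at the bottom of the tree, and once the children of a parent node have become heaps, the value in the parent node is sifted down into one of those child heaps. This is correct, because the value now moves within a subtree that is already heapified. As a result, the parent of those two small heaps becomes the root of a valid heap itself. So at the end of the algorithm, the root is the root of a valid heap.

So instead of swapping values up the tree, you need to swap them down (choosing the child with the smallest value).

Here is a corrected version: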

def build_heap(data):
    swap = []
    # We can start at the deepest parent:
    for i in range(len(data) // 2 - 1, -1, -1):
        current_node = i
        
        while True:
            child_node = current_node * 2 + 1
            if child_node >= len(data):
                break
            if child_node + 1 < len(data) and data[child_node + 1] < data[child_node]:
                child_node += 1
            if data[current_node] < data[child_node]:
                break
            # swap the current value DOWN, with the least of both child values
            swap.append((child_node, current_node))
            data[child_node], data[current_node] = data[current_node], data[child_node]
            current_node = child_node
    return swap

Answer 2

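There are (at least) two ways to build a heap.

The O(N) solution works backwards from the middle of the array to the beginning, making sure that each successive element is the correct root of its subtree at that point: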

def build_heap_down(data):
    n = len(data)
    for subtree in range(n // 2 - 1, -1, -1):
        sift_down(subtree, n, data)
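Why this direction is linear can be sketched with the usual counting argument: a sift_down that starts at a node of height h makes at most h swaps, and an n-element heap has at most ⌈n/2^(h+1)⌉ nodes of height h. Summing over all heights:

```latex
\text{swaps} \;\le\; \sum_{h=0}^{\lfloor \log_2 n \rfloor} h \left\lceil \frac{n}{2^{h+1}} \right\rceil
\;\le\; \sum_{h=0}^{\lfloor \log_2 n \rfloor} h \, \frac{n}{2^{h}}
\;\le\; n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \;=\; 2n \;\le\; 4n
```

The middle step uses ⌈x⌉ ≤ 2x for x ≥ 1/2, which applies because n/2^(h+1) ≥ 1/2 whenever h ≤ ⌊log₂ n⌋. So the sift-down construction stays within the assignment's 4n limit.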

The other solution, which is O(N log N), simply adds each element in turn to a successively larger heap:

def build_heap_up(data):
    for new_element in range(1, len(data)):
        sift_up(new_element, data)

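Because build_heap_up() is log-linear in the worst case (reverse-sorted input, I believe), it may not satisfy the needs of your assignment, which imposes a linear bound on the number of swaps. Still, some experimentation is worthwhile. Perhaps that is the point of the assignment.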

def swap(i, j, data):
    # helper used by both sift functions: exchange two elements in place
    data[i], data[j] = data[j], data[i]

def sift_up(elt, data):
    while elt > 0:
        parent = (elt - 1) // 2
        if data[parent] <= data[elt]: return
        swap(parent, elt, data)
        elt = parent

def sift_down(elt, limit, data):
    while True:
        kid = 2 * elt + 1
        if kid >= limit: return
        if kid + 1 < limit and data[kid + 1] < data[kid]: kid += 1
        if data[elt] <= data[kid]: return
        swap(elt, kid, data)
        elt = kid
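One way to run the experiment suggested above is to count the swaps each approach makes on reverse-sorted input. The sketch below is a variant of the two helpers above that returns a swap count instead of calling swap(); count_up and count_down are hypothetical names, and n = 1000 is an arbitrary choice.

```python
def sift_up(elt, data):
    """Move data[elt] up until its parent is smaller; return the number of swaps."""
    count = 0
    while elt > 0:
        parent = (elt - 1) // 2
        if data[parent] <= data[elt]:
            break
        data[parent], data[elt] = data[elt], data[parent]
        elt = parent
        count += 1
    return count


def sift_down(elt, limit, data):
    """Move data[elt] down within data[:limit]; return the number of swaps."""
    count = 0
    while True:
        kid = 2 * elt + 1
        if kid >= limit:
            return count
        if kid + 1 < limit and data[kid + 1] < data[kid]:
            kid += 1
        if data[elt] <= data[kid]:
            return count
        data[elt], data[kid] = data[kid], data[elt]
        elt = kid
        count += 1


def count_up(data):
    # build_heap_up: grow the heap forwards, one element at a time
    return sum(sift_up(i, data) for i in range(1, len(data)))


def count_down(data):
    # build_heap_down: sift down from the last parent back to the root
    n = len(data)
    return sum(sift_down(i, n, data) for i in range(n // 2 - 1, -1, -1))


n = 1000
reverse_sorted = list(range(n, 0, -1))
up = count_up(reverse_sorted.copy())
down = count_down(reverse_sorted.copy())
print(up, down, 4 * n)  # up is 7987 here, well above 4 * n == 4000
```

On this input every new element in the sift-up version travels all the way to the root, so index i costs floor(log2(i + 1)) swaps, which is what pushes the total past the 4n limit.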

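The key insight here is that both sift_up and sift_down require the array they work on to be a heap, except for the element being sifted. sift_down works on the part of the array from the sifted element to the end, so running it correctly over the whole array means working backwards. sift_up works on the part of the array from the beginning up to the sifted element, so the iteration has to go forwards.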
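The direction requirement can be checked on the [3, 1, 2, 4, 0] example from the other answer: running the same sift_down forwards from the root sifts values into subtrees that are not heaps yet, and the result is not a heap. A minimal sketch, with the swap inlined and is_min_heap as a made-up helper:

```python
def sift_down(elt, limit, data):
    # Same logic as sift_down above, with the swap written inline.
    while True:
        kid = 2 * elt + 1
        if kid >= limit:
            return
        if kid + 1 < limit and data[kid + 1] < data[kid]:
            kid += 1
        if data[elt] <= data[kid]:
            return
        data[elt], data[kid] = data[kid], data[elt]
        elt = kid


def is_min_heap(a):
    return all(a[(c - 1) // 2] <= a[c] for c in range(1, len(a)))


backward = [3, 1, 2, 4, 0]
for i in range(len(backward) // 2 - 1, -1, -1):  # last parent back to the root
    sift_down(i, len(backward), backward)

forward = [3, 1, 2, 4, 0]
for i in range(len(forward) // 2):               # root first: subtrees not yet heaps
    sift_down(i, len(forward), forward)

print(backward, is_min_heap(backward))  # [0, 1, 2, 4, 3] True
print(forward, is_min_heap(forward))    # [1, 0, 2, 4, 3] False
```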

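As far as I can see, your build_heap is build_heap_down. Although it uses recursion, it does the same thing as my loop above (and the loop version in the other answer); recursion at the very end of a function can always be turned into a simple loop using tail call elimination. (Some languages perform this program transformation automatically, but Python is not one of them.)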
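As a sketch of that transformation, here is the question's recursive sift_down next to a loop version in which the tail call is replaced by reassigning i and going around again. The names sift_down_recursive, sift_down_loop and build are made up for this comparison, and the swap list is passed in rather than kept in a global.

```python
import random


def sift_down_recursive(i, n, data, swaps):
    # The question's sift_down, with the swap list passed in explicitly.
    min_index = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and data[left] < data[min_index]:
        min_index = left
    if right < n and data[right] < data[min_index]:
        min_index = right
    if i != min_index:
        swaps.append((i, min_index))
        data[i], data[min_index] = data[min_index], data[i]
        sift_down_recursive(min_index, n, data, swaps)  # tail call


def sift_down_loop(i, n, data, swaps):
    # Same function after tail call elimination: instead of calling itself,
    # it rebinds i and repeats the loop body.
    while True:
        min_index = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and data[left] < data[min_index]:
            min_index = left
        if right < n and data[right] < data[min_index]:
            min_index = right
        if i == min_index:
            return
        swaps.append((i, min_index))
        data[i], data[min_index] = data[min_index], data[i]
        i = min_index


def build(sift, data):
    swaps = []
    for i in range(len(data) // 2 - 1, -1, -1):
        sift(i, len(data), data, swaps)
    return swaps


values = random.sample(range(10**6), 1000)
a, b = values.copy(), values.copy()
print(build(sift_down_recursive, a) == build(sift_down_loop, b) and a == b)  # True
```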

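Your build_heap2 is an incorrect version of build_heap_up, because it works backwards instead of forwards. That's easy to fix. But don't expect it to produce the same heap, let alone the same list of swaps. Many different heaps can be built from a given list of numbers, which is why it is possible to find an O(N) algorithm for build_heap but not for sort.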
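A sketch of that easy fix, keeping the structure of build_heap2 but iterating forwards so that data[:i] is already a heap when data[i] is sifted up. The parent is computed as (i - 1) // 2, which gives the same index as the original expression for every i > 0, and the current_node != 0 test is moved first so data[-1] is never read. Because the resulting swaps cannot be expected to match build_heap's, the stress test checks the result instead: it replays the swaps on a copy and verifies the heap property. build_heap2_fixed and is_min_heap are hypothetical names.

```python
import random


def build_heap2_fixed(data):
    # build_heap2 iterating forwards: data[:i] is a heap before data[i] is sifted up.
    swap = []
    for i in range(1, len(data)):
        current_node = i
        prev_node = (current_node - 1) // 2  # parent index
        while current_node != 0 and data[prev_node] > data[current_node]:
            swap.append((prev_node, current_node))
            data[prev_node], data[current_node] = data[current_node], data[prev_node]
            current_node = prev_node
            prev_node = (current_node - 1) // 2
    return swap


def is_min_heap(a):
    return all(a[(c - 1) // 2] < a[c] for c in range(1, len(a)))


# Stress test by checking the resulting heap, not by comparing swap lists.
for _ in range(200):
    n = random.randint(1, 200)
    data = random.sample(range(10**9 + 1), n)  # distinct values, as the problem requires
    original = data.copy()
    swaps = build_heap2_fixed(data)
    replay = original.copy()
    for i, j in swaps:
        replay[i], replay[j] = replay[j], replay[i]
    assert replay == data and is_min_heap(data)
print("all heaps valid")
```

Passing this check only shows that the heaps are valid; the sift-up approach can still exceed the 4n swap limit on unfavourable inputs such as reverse-sorted arrays.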
