Why does the Python heapq _siftup(...) call _siftdown(...) at the end?

The code for _siftup (GitHub - python/cpython/Lib/heapq.py) calls _siftdown at the end:

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

It seems the logic in _siftup(...) is already enough to place newitem at a position that maintains the heap invariant. Why is the call to _siftdown() needed?

This is the consequence of a particular choice the authors made in the algorithm.

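A more common algorithm does not need the final _siftdown(), but then the loop must stop as soon as newitem < heap[childpos], after which pos is a valid spot for newitem and no further sifting is needed.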
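For comparison, here is a minimal sketch of that early-exit variant for a min-heap. The name sift_early_exit is made up for this illustration and is not part of heapq.

```python
def sift_early_exit(heap, pos):
    """Textbook sift for a min-heap: stop as soon as newitem fits.

    Hypothetical helper for illustration; not part of heapq.
    """
    endpos = len(heap)
    newitem = heap[pos]
    childpos = 2 * pos + 1  # left child
    while childpos < endpos:
        # Pick the smaller of the two children.
        rightpos = childpos + 1
        if rightpos < endpos and heap[rightpos] < heap[childpos]:
            childpos = rightpos
        # Early exit: the smaller child is not smaller than newitem.
        if not heap[childpos] < newitem:
            break
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2 * pos + 1
    heap[pos] = newitem  # already a valid spot, no second pass needed


heap = [3, 1, 2, 4, 5, 6]  # only the root violates the invariant
sift_early_exit(heap, 0)
print(heap)  # [1, 3, 2, 4, 5, 6]
```

Note that each level visited costs two comparisons here: one to pick the smaller child and one against newitem.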

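In this version, however, the loop continues until it reaches a leaf, and newitem is placed at that leaf. This may not be a valid spot for newitem, so the extra call is needed to move it back up to a valid one.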
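To make the two phases visible, the sketch below splits the logic into two separate functions. The names descend_to_leaf and bubble_up are hypothetical; they mirror the loop of _siftup and the job done by _siftdown, but are not the heapq functions themselves.

```python
def descend_to_leaf(heap, pos):
    """Phase 1: move the smaller child up at every level, never
    looking at newitem, then drop newitem into the leaf reached."""
    endpos = len(heap)
    newitem = heap[pos]
    childpos = 2 * pos + 1
    while childpos < endpos:
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2 * pos + 1
    heap[pos] = newitem
    return pos


def bubble_up(heap, startpos, pos):
    """Phase 2: move newitem back up while its parent is larger."""
    newitem = heap[pos]
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem


heap = [3, 1, 2, 4, 5, 6]
leaf = descend_to_leaf(heap, 0)
print(heap)  # [1, 4, 2, 3, 5, 6]: 3 sits below 4, invariant broken
bubble_up(heap, 0, leaf)
print(heap)  # [1, 3, 2, 4, 5, 6]: invariant restored
```

In this small example newitem is small, so the leaf is not a valid spot and the second phase has real work to do. During a heap pop the sifted item is usually large, and the second phase tends to stop after one or two comparisons.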

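In the comment block that precedes this function, the authors explain why they made this choice. At first glance it seems less efficient, but in practice it results in fewer comparisons: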

We could break out of the loop as soon as we find a pos where newitem <= both its children, but turns out that's not a good idea, and despite that many books write the algorithm that way. During a heap pop, the last array element is sifted in, and that tends to be large, so that comparing it against values starting from the root usually doesn't pay (= usually doesn't get us out of the loop early). See Knuth, Volume 3, where this is explained and quantified in an exercise.
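One way to check this claim is to count comparisons while popping every item from a heap, once with an early-exit pop and once with heapq.heappop. The sketch below is only an experiment under assumptions: the Counted wrapper, pop_early_exit, count_pops, the input size, and the seed are all made up for illustration.

```python
import heapq
import random


class Counted:
    """Wraps a value and counts every < comparison made on it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def pop_early_exit(heap):
    """Heap pop using the textbook early-exit sift (hypothetical helper)."""
    last = heap.pop()
    if not heap:
        return last
    top, pos = heap[0], 0
    childpos = 1
    while childpos < len(heap):
        rightpos = childpos + 1
        if rightpos < len(heap) and heap[rightpos] < heap[childpos]:
            childpos = rightpos
        if not heap[childpos] < last:
            break  # last fits here
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2 * pos + 1
    heap[pos] = last
    return top


def count_pops(pop, values):
    """Heapify values, pop everything, return (popped values, comparisons)."""
    heap = [Counted(v) for v in values]
    heapq.heapify(heap)
    Counted.comparisons = 0  # count only the pops, not heapify
    out = []
    while heap:
        out.append(pop(heap).value)
    return out, Counted.comparisons


random.seed(0)  # arbitrary seed, only for repeatability
data = [random.random() for _ in range(2000)]
sorted_a, early = count_pops(pop_early_exit, data)
sorted_b, bottom_up = count_pops(heapq.heappop, data)
print(early, bottom_up)
```

On random input like this, the count for heapq.heappop should come out clearly lower, because the early-exit check costs an extra comparison at every level and rarely triggers before the bottom of the heap.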

See also Wikipedia - bottom-up heapsort:

The change improves the linear-time heap-building phase somewhat, but is more significant in the second phase. Like ordinary heapsort, each iteration of the second phase extracts the top of the heap, a[0], and fills the gap it leaves with a[end], then sifts this latter element down the heap. But this element comes from the lowest level of the heap, meaning it is one of the [greatest]* elements in the heap, so the sift-down will likely take many steps to move it back down. In ordinary heapsort, each step of the sift-down requires two comparisons, to find the [minimum]* of three elements: the new node and its two children.

* The article has "smallest" and "maximum" since it discusses a max-heap, not a min-heap, which is what heapq provides.

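It is a pity that Wikipedia discusses this only in the context of heapsort, since it applies to heap operations in general, even when the heap is not being used for heapsort.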