Why does the Python heapq _siftup(...) call _siftdown(...) at the end?
The code for _siftup in github - python/cpython/Lib/heapq.py ends with a call to _siftdown:
def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)
The logic inside _siftup(...) already seems sufficient to place newitem at a position that preserves the heap invariant, so why is the call to _siftdown() needed?
This is a consequence of a specific choice the author made in the algorithm.
The more common algorithm does not need the final _siftdown(), but then its loop must stop as soon as newitem <= heap[childpos] (i.e. newitem is no greater than the smaller child); at that point pos is already a valid position for newitem and no further sifting is needed.
In this version, however, the loop keeps going until it reaches a leaf, and newitem is placed at that leaf position. That may not be a valid position for newitem, so the extra call is needed to move it back up to one that is.
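For contrast, here is a minimal sketch of the "more common" textbook variant described above. This is not CPython's code; the function name and structure are illustrative. The early-exit test makes the trailing _siftdown unnecessary, at the cost of one extra comparison per level:

```python
def siftup_textbook(heap, pos):
    """Sift heap[pos] down, stopping early; no trailing _siftdown needed."""
    endpos = len(heap)
    newitem = heap[pos]
    childpos = 2*pos + 1
    while childpos < endpos:
        # Pick the smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Extra comparison per level: stop once newitem <= the smaller child.
        if not heap[childpos] < newitem:
            break
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    heap[pos] = newitem

# Simulate a heap pop on a valid min-heap: move the last leaf to the
# root gap and sift it down.
h = [1, 3, 2, 7, 4, 5, 9]
h[0] = h.pop()          # root gap filled with the (large) last element
siftup_textbook(h, 0)
print(h[0])             # 2, the new minimum
```

Because the loop stops as soon as newitem fits, the element never overshoots its valid position, which is exactly why no corrective sift toward the root is required afterwards.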
In the comment block that precedes this function, the author explains why they made this choice, which looks less efficient at first glance but in practice performs fewer comparisons:
We could break out of the loop as soon as we find a pos where newitem <= both its children, but turns out that's not a good idea, and despite that many books write the algorithm that way. During a heap pop, the last array element is sifted in, and that tends to be large, so that comparing it against values starting from the root usually doesn't pay (= usually doesn't get us out of the loop early). See Knuth, Volume 3, where this is explained and quantified in an exercise.
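The claim can be checked empirically by counting comparisons. The sketch below (all names are mine, not heapq's) pops every element from two identical heaps, once with heapq.heappop and once with a textbook-style early-exit pop, using a wrapper that counts calls to __lt__, the only comparison heapq performs:

```python
import heapq
import random

class Counted:
    """Int-like wrapper that counts __lt__ calls (heapq only ever uses <)."""
    calls = 0
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        Counted.calls += 1
        return self.v < other.v

def siftup_textbook(heap, pos):
    # Early-exit textbook variant: no trailing sift toward the root.
    endpos = len(heap)
    newitem = heap[pos]
    childpos = 2*pos + 1
    while childpos < endpos:
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        if not heap[childpos] < newitem:   # newitem already fits here
            break
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    heap[pos] = newitem

def pop_textbook(heap):
    last = heap.pop()
    if not heap:
        return last
    item = heap[0]
    heap[0] = last
    siftup_textbook(heap, 0)
    return item

random.seed(0)
data = [random.random() for _ in range(10_000)]

h = [Counted(x) for x in data]
heapq.heapify(h)
Counted.calls = 0
while h:
    heapq.heappop(h)                 # CPython's sift-to-leaf-then-up scheme
bottom_up = Counted.calls

h = [Counted(x) for x in data]
heapq.heapify(h)
Counted.calls = 0
while h:
    pop_textbook(h)                  # early-exit textbook scheme
textbook = Counted.calls

print(bottom_up < textbook)          # True: heapq's variant compares less
```

The sifted-in element is a former leaf and therefore tends to be large, so the textbook variant pays two comparisons per level nearly all the way down, while heapq's variant pays one per level down plus only a few on the way back up.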
See also Wikipedia - bottom-up heapsort:
The change improves the linear-time heap-building phase somewhat, but is more significant in the second phase. Like ordinary heapsort, each iteration of the second phase extracts the top of the heap, a[0], and fills the gap it leaves with a[end], then sifts this latter element down the heap. But this element comes from the lowest level of the heap, meaning it is one of the [greatest]* elements in the heap, so the sift-down will likely take many steps to move it back down. In ordinary heapsort, each step of the sift-down requires two comparisons, to find the [maximum]* of three elements: the new node and its two children.
* The article says "smallest" and "minimum" because it discusses a max-heap, not a min-heap like the one heapq provides.
It is a pity that Wikipedia discusses this in the context of heapsort, because the optimization applies to heap operations in general, even when the heap is not being used for heapsort.