为什么 heapq.heapify 这么快?
Why is heapq.heapify so fast?
我尝试重新实现 heapify 方法,以便使用 _siftup
和 _siftdown
更新或删除堆中的任何节点并保持 O(log(n)) 的时间复杂度。
我做了一些努力来优化我的代码,但事实证明它们比 heapq.heapify
(就花费的总时间而言) 更差。所以我决定调查 source code。并将复制的代码与模块的代码进行比较。
# heap invariant.
def _siftdown(heap, startpos, pos):
newitem = heap[pos]
# Follow the path to the root, moving parents down until finding a place
# newitem fits.
while pos > startpos:
parentpos = (pos - 1) >> 1
parent = heap[parentpos]
if newitem < parent:
heap[pos] = parent
pos = parentpos
continue
break
heap[pos] = newitem
def _siftup(heap, pos):
endpos = len(heap)
startpos = pos
newitem = heap[pos]
# Bubble up the smaller child until hitting a leaf.
childpos = 2*pos + 1 # leftmost child position
while childpos < endpos:
# Set childpos to index of smaller child.
rightpos = childpos + 1
if rightpos < endpos and not heap[childpos] < heap[rightpos]:
childpos = rightpos
# Move the smaller child up.
heap[pos] = heap[childpos]
pos = childpos
childpos = 2*pos + 1
# The leaf at pos is empty now. Put newitem there, and bubble it up
# to its final resting place (by sifting its parents down).
heap[pos] = newitem
_siftdown(heap, startpos, pos)
def heapify(x):
"""Transform list into a heap, in-place, in O(len(x)) time."""
n = len(x)
# Transform bottom-up. The largest index there's any point to looking at
# is the largest with a child index in-range, so must have 2*i + 1 < n,
# or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
# j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
# (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
for i in reversed(range(n//2)):
_siftup(x, i)
a = list(reversed(range(1000000)))
b = a.copy()
import heapq
import time
cp1 = time.time()
heapq.heapify(a)
cp2 = time.time()
heapify(b)
cp3 = time.time()
print(a == b)
print(cp3 - cp2, cp2 - cp1)
而且我总是发现 cp3 - cp2
>= cp2 - cp1
而不是相同的,heapify
比 heapq.heaify
花费更多的时间,即使两者相同。
在某些情况下 heapify
花费了 3 秒,而 heapq.heapify
花费了 0.1 秒
heapq.heapfy 模块执行速度比相同的 heapify 更快,它们仅通过导入不同。
请告诉我原因,如果我犯了一些愚蠢的错误,我很抱歉。
来自heapq
模块的heapify
实际上是一个内置函数:
>>> import heapq
>>> heapq
<module 'heapq' from 'python3.9/heapq.py'>
>>> heapq.heapify
<built-in function heapify>
help(heapq.heapify)
说:
Help on built-in function heapify
in module _heapq
...
所以它实际上是在导入内置模块 _heapq
,因此是 运行 C 代码,而不是 Python.
如果您进一步滚动 heapq.py
代码,您将看到 this:
# If available, use C implementation
try:
from _heapq import *
except ImportError:
pass
这将用 C 实现覆盖像 heapify
这样的函数。例如,_heapq.heapify
是 here.
我尝试重新实现 heapify 方法,以便使用 _siftup
和 _siftdown
更新或删除堆中的任何节点并保持 O(log(n)) 的时间复杂度。
我做了一些努力来优化我的代码,但事实证明它们比 heapq.heapify
(就花费的总时间而言) 更差。所以我决定调查 source code。并将复制的代码与模块的代码进行比较。
# heap invariant.
def _siftdown(heap, startpos, pos):
newitem = heap[pos]
# Follow the path to the root, moving parents down until finding a place
# newitem fits.
while pos > startpos:
parentpos = (pos - 1) >> 1
parent = heap[parentpos]
if newitem < parent:
heap[pos] = parent
pos = parentpos
continue
break
heap[pos] = newitem
def _siftup(heap, pos):
endpos = len(heap)
startpos = pos
newitem = heap[pos]
# Bubble up the smaller child until hitting a leaf.
childpos = 2*pos + 1 # leftmost child position
while childpos < endpos:
# Set childpos to index of smaller child.
rightpos = childpos + 1
if rightpos < endpos and not heap[childpos] < heap[rightpos]:
childpos = rightpos
# Move the smaller child up.
heap[pos] = heap[childpos]
pos = childpos
childpos = 2*pos + 1
# The leaf at pos is empty now. Put newitem there, and bubble it up
# to its final resting place (by sifting its parents down).
heap[pos] = newitem
_siftdown(heap, startpos, pos)
def heapify(x):
"""Transform list into a heap, in-place, in O(len(x)) time."""
n = len(x)
# Transform bottom-up. The largest index there's any point to looking at
# is the largest with a child index in-range, so must have 2*i + 1 < n,
# or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
# j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
# (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
for i in reversed(range(n//2)):
_siftup(x, i)
a = list(reversed(range(1000000)))
b = a.copy()
import heapq
import time
cp1 = time.time()
heapq.heapify(a)
cp2 = time.time()
heapify(b)
cp3 = time.time()
print(a == b)
print(cp3 - cp2, cp2 - cp1)
而且我总是发现 cp3 - cp2
>= cp2 - cp1
而不是相同的,heapify
比 heapq.heaify
花费更多的时间,即使两者相同。
在某些情况下 heapify
花费了 3 秒,而 heapq.heapify
花费了 0.1 秒
heapq.heapfy 模块执行速度比相同的 heapify 更快,它们仅通过导入不同。
请告诉我原因,如果我犯了一些愚蠢的错误,我很抱歉。
来自heapq
模块的heapify
实际上是一个内置函数:
>>> import heapq
>>> heapq
<module 'heapq' from 'python3.9/heapq.py'>
>>> heapq.heapify
<built-in function heapify>
help(heapq.heapify)
说:
Help on built-in function
heapify
in module_heapq
...
所以它实际上是在导入内置模块 _heapq
,因此是 运行 C 代码,而不是 Python.
如果您进一步滚动 heapq.py
代码,您将看到 this:
# If available, use C implementation
try:
from _heapq import *
except ImportError:
pass
这将用 C 实现覆盖像 heapify
这样的函数。例如,_heapq.heapify
是 here.