What's the time complexity of functions in the heapq library?

Question

My question is about the LeetCode solution below. I don't understand why it is O(k+(n-k)log(k)).

Addendum: maybe the complexity isn't that at all. In fact, I don't know the time complexity of heappush() and heappop().

import heapq

# O(k+(n-k)lgk) time, min-heap
def findKthLargest(self, nums, k):
    heap = []
    for num in nums:                      # push all n items
        heapq.heappush(heap, num)
    for _ in range(len(nums)-k):          # pop the n-k smallest items
        heapq.heappop(heap)
    return heapq.heappop(heap)            # the root is now the kth largest

Answer 1

heapq is a binary heap, with O(log n) push and O(log n) pop. See the heapq source code.
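
One rough way to see the O(log n) bound is to count the comparisons heapq actually makes. The sketch below wraps values in a hypothetical `Counted` class (not part of heapq) whose `__lt__` increments a counter, then records the worst single push and the worst single pop. It assumes heapq orders items with `<`, which is what the counter observes.

```python
import heapq
import math


class Counted:
    """Hypothetical wrapper: holds a value and counts every `<` comparison on it."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def max_comparisons(n):
    """Return (worst comparisons for one push, worst comparisons for one pop)."""
    heap = []
    worst_push = 0
    # Pushing in decreasing order makes every new item sift all the way to the root.
    for v in range(n, 0, -1):
        Counted.comparisons = 0
        heapq.heappush(heap, Counted(v))
        worst_push = max(worst_push, Counted.comparisons)
    worst_pop = 0
    while heap:
        Counted.comparisons = 0
        heapq.heappop(heap)
        worst_pop = max(worst_pop, Counted.comparisons)
    return worst_push, worst_pop


for n in (2**10, 2**12):
    push, pop = max_comparisons(n)
    print(f"n={n}  log2(n)={math.log2(n):.0f}  worst push={push}  worst pop={pop}")
```

For 1024 items the worst push should take log2(1024) = 10 comparisons, and a pop at most about twice that, because heappop sifts down to a leaf and then back up.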

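The algorithm you show takes O(n log n) to push all the items onto the heap, and then O((n-k) log n) to find the kth largest element. So the complexity is O(n log n). It also requires O(n) extra space.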
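
Written out as a sum, a sketch of that bound (counting the final heappop as pop number n-k+1, and using log n! = Θ(n log n)):

```latex
\begin{aligned}
T_{\text{push}}(n)  &= \sum_{i=1}^{n} O(\log i) = O(\log n!) = O(n \log n) \\
T_{\text{pop}}(n,k) &= (n-k+1)\, O(\log n) \\
T(n,k)              &= T_{\text{push}}(n) + T_{\text{pop}}(n,k) = O(n \log n)
\end{aligned}
```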

You can do this in O(n log k), with O(k) extra space, by modifying the algorithm slightly. I'm not a Python programmer, so you'll have to translate the pseudocode:

# create a new min-heap
# push the first k nums onto the heap
for each remaining num in nums:
    if num > heap.peek():
        heap.pop()
        heap.push(num)

# at this point, the k largest items are on the heap.
# The kth largest is the root:

return heap.pop()

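The key here is that the heap contains only the k largest items seen so far. If an item is smaller than the kth largest item seen so far, it is never put onto the heap. The worst case is O(n log k).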
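
A direct Python translation of the pseudocode above might look like this. `find_kth_largest` is a hypothetical name chosen for the example; `heap[0]` plays the role of `heap.peek()`, since heapq keeps the smallest item at index 0. It assumes 1 <= k <= len(nums).

```python
import heapq


def find_kth_largest(nums, k):
    """Sketch: keep a min-heap of the k largest items seen so far (O(n log k))."""
    # Push the first k numbers onto a min-heap.
    heap = []
    for num in nums[:k]:
        heapq.heappush(heap, num)
    # For the rest, evict the root whenever a bigger number shows up.
    for num in nums[k:]:
        if num > heap[0]:          # heap[0] is the smallest of the k kept items
            heapq.heappop(heap)
            heapq.heappush(heap, num)
    # The root is the smallest of the k largest items, i.e. the kth largest.
    return heap[0]


print(find_kth_largest([3, 2, 1, 5, 6, 4], 2))
```

The heap never grows past k items, so each push or pop costs O(log k) instead of O(log n).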

Actually, heapq has a heapreplace function, so you can replace:

    if num > heap.peek()
        heap.pop()
        heap.push(num)

with

    if num > heap.peek()
        heap.replace(num)
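
A small check, on made-up sample data, that the two forms pop the same item and leave the same values in the heap. `heapq.heapreplace` pops the smallest item and pushes the new one with a single sift instead of two.

```python
import heapq

# Made-up sample data, heapified twice so both versions start from the same heap.
a = [5, 1, 8, 3]
b = list(a)
heapq.heapify(a)
heapq.heapify(b)

# Two-step version from the pseudocode: pop the root, then push the new item.
popped_a = heapq.heappop(a)
heapq.heappush(a, 7)

# One-step version: heapreplace returns the old root and pushes 7 in one sift.
popped_b = heapq.heapreplace(b, 7)

print(popped_a, popped_b)
print(sorted(a) == sorted(b))
```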

Also, instead of pushing the first k items, you can build a list of the first k items and call heapify on it. A more optimized (but still O(n log k)) algorithm is:

# create array of first `k` items
heap = heapify(array)
for remaining nums
    if (num > heap.peek())
        heap.replace(num)
return heap.pop()
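
In Python, the heapify + heapreplace version could be sketched as follows. Note that `heapq.heapify` works in place and returns `None`, so the pseudocode's `heap = heapify(array)` becomes two statements. `find_kth_largest` is again a hypothetical name, and 1 <= k <= len(nums) is assumed.

```python
import heapq


def find_kth_largest(nums, k):
    """Sketch: heapify the first k items, then heapreplace the root for bigger items."""
    heap = nums[:k]                    # copy of the first k items; nums is untouched
    heapq.heapify(heap)                # O(k), in place
    for num in nums[k:]:
        if num > heap[0]:              # bigger than the smallest kept item
            heapq.heapreplace(heap, num)  # pop the root and push num in one sift
    return heap[0]                     # the kth largest


print(find_kth_largest([3, 2, 3, 1, 2, 4, 5, 5, 6], 4))
```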

You could also call heapify on the entire array, then pop the first n-k items, and then take the top:

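import heapq

def findKthLargest(nums, k):
    heapq.heapify(nums)                 # O(n), rearranges nums in place
    for _ in range(len(nums) - k):      # pop the n-k smallest items
        heapq.heappop(nums)
    return heapq.heappop(nums)          # the root is now the kth largest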

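That's even simpler. I'm not sure whether it's faster than my previous suggestion, but it does modify the original array. Building the heap is O(n), and the pops are O((n-k) log n). So it's O(n + (n-k) log n), which is O(n log n) in the worst case.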
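
The cost of this variant, as a sketch (n-k pops plus the final one, each on a heap of at most n items):

```latex
T(n,k) = \underbrace{O(n)}_{\text{heapify}} + \underbrace{(n-k+1)\, O(\log n)}_{\text{pops}}
       = O\bigl(n + (n-k)\log n\bigr)
```

With k = n only the final pop remains, so the O(n) heapify dominates; with k = 1 it is O(n log n).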

Answer 2

heapify() actually takes linear time, because it works differently from calling heapq.heappush() N times.

heapq.heappush()/heapq.heappop() take O(log n) time because they only adjust nodes along a single path, at most one node per height/level of the tree.

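When you pass an array to heapify(), it works bottom-up, so by the time it sifts a node, that node's left and right children already maintain the heap property, whether it's a min-heap or a max-heap.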
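
To compare the two ways of building a heap empirically, the sketch below counts `<` comparisons with a hypothetical `Counted` wrapper class (not part of heapq), once for `heapq.heapify` and once for pushing the items one at a time. Decreasing input is used because it makes every `heappush` sift all the way to the root. Exact counts may differ between CPython versions, so the point is only the growth: roughly proportional to n for heapify versus n log n for repeated pushes.

```python
import heapq


class Counted:
    """Hypothetical wrapper: holds a number and counts every `<` comparison on it."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def heapify_cost(values):
    """Comparisons used by heapq.heapify on a list built from values."""
    Counted.comparisons = 0
    heap = [Counted(v) for v in values]
    heapq.heapify(heap)                      # bottom-up, in place
    return Counted.comparisons


def push_one_by_one_cost(values):
    """Comparisons used by n separate heappush calls."""
    Counted.comparisons = 0
    heap = []
    for v in values:
        heapq.heappush(heap, Counted(v))     # O(log n) each
    return Counted.comparisons


n = 5000
data = list(range(n, 0, -1))                 # decreasing input
print("heapify:", heapify_cost(data), " pushes:", push_one_by_one_cost(data))
```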

You can watch these videos: https://www.youtube.com/watch?v=HqPJF2L5h9U

https://www.youtube.com/watch?v=B7hVxCmfPtM

Hope this helps.

Answer 3

Summary of the post by @Shivam purbia:

Using heapq.heapify() reduces both the time and the space complexity, because heapq.heapify() is an in-place heapify and runs in linear time.
heapq.heappush() and heapq.heappop() both cost O(log N) time.

The final code would look like this:

import heapq

def findKthLargest(self, nums, k):
    heapq.heapify(nums)            # in-place heapify -> cost O(N) time

    for _ in range(len(nums)-k):   # run (N-k) times
        heapq.heappop(nums)        # cost O(logN) time
    return heapq.heappop(nums)     # the root is now the kth largest

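Total time complexity is O(N + (N - k) log N), i.e. O(N log N) in the worst case
Total extra space complexity is O(1), since nums is rearranged in place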

Answer 4

Creating the heap by pushing the elements one at a time is O(n log n). But heapifying elements that are already in a list is O(n).
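
The usual counting argument behind the O(n) figure, as a sketch: a heap of n items has at most ⌈n/2^(h+1)⌉ nodes of height h, and sifting one of them down costs O(h):

```latex
\sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
  = O\!\left(n \sum_{h=0}^{\infty} \frac{h}{2^{h}}\right) = O(n),
  \qquad \text{since } \sum_{h \ge 0} \frac{h}{2^{h}} = 2
```

Pushing the items one at a time instead costs a sum of O(log i) terms for i = 1..n, which is O(n log n).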

Answer 5

Popping the smallest items off the heap, as the question does, is not the best approach.

Suppose your input has 1 million items; then you need to pop 1m - k times.

Instead, in Python we can use a max-heap when n is very large, so that only k pops are needed:

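from heapq import _heapify_max, _heappop_max  # undocumented private helpers in CPython's heapq
from typing import List

def findKthLargest(self, nums: List[int], k: int) -> int:
    _heapify_max(nums)              # O(n); the largest item ends up at nums[0]
    while k > 0:
        val = _heappop_max(nums)    # O(log n) per pop
        k -= 1
        if k == 0:
            return val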
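
If you'd rather avoid the underscore-prefixed helpers, which are private to CPython's heapq, one common alternative is to negate the values and use the ordinary min-heap functions. This swaps in a different technique (negation rather than a true max-heap); `find_kth_largest` is a hypothetical name, and the sketch assumes numeric input and 1 <= k <= len(nums). It costs O(n) for heapify plus O(k log n) for the pops, and leaves `nums` unchanged.

```python
import heapq


def find_kth_largest(nums, k):
    """Sketch: simulate a max-heap by storing negated values in heapq's min-heap."""
    heap = [-x for x in nums]     # negation flips the order: largest value -> smallest key
    heapq.heapify(heap)           # O(n)
    for _ in range(k - 1):        # discard the k-1 largest values
        heapq.heappop(heap)
    return -heap[0]               # undo the negation on the current root


print(find_kth_largest([3, 2, 1, 5, 6, 4], 2))
```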
