如何改进有关堆排序的 Python 代码？

Question

我尝试自己为 Leetcode 217. Contains Duplicate 编写堆排序程序，如下所示，而不是使用 Python 内置排序方法。 Leetcode 应该接受堆排序方法，但由于某些我不知道的原因，虽然我的堆排序程序运行良好，但我仍然得到了 Leetcode 的运行时拒绝。谁能帮忙？

已解决，下面的代码使用Floyd算法重新编写，初始化堆并通过了Leetcode

def heapsort(nums):

    def swap(i, j):

        nums[i], nums[j] = nums[j], nums[i]


    def sift(start, size):

        l = (start << 1) + 1 # Do not forget () for <<
        r = l + 1
        largest = start
        if l <= size - 1 and nums[start] < nums[l]:
            largest = l
        if r <= size - 1 and nums[largest] < nums[r]:
            largest = r   
        if largest != start:
            swap(start, largest)
            sift(largest, size)


    size = len(nums)

    # Initialize heap (Floyd Algorithm)
    end = (size >> 1) - 1
    while end >= 0:
        sift(end, size)
        end -= 1
    swap(0, size - 1)
    size -= 1

    # Heapify recursively
    while size > 1:
        sift(0, size)
        swap(0, size - 1)
        size -= 1

Answer 1

说到堆排序，考虑 heapq python module. It exists for this very purpose - provide an implementation of the heap queue algorithm. It isn't designed not very conveniently, but there are handy wrappers - 你可以 google 自己做。

说到查找重复项，任何 n log(n) 排序算法都不够有效。看看 python set 内置函数！

Answer 2

你的代码做的太多了。您正在用您删除的每个项目重建整个堆。那么 O(n log n) 算法应该是 O(n^2).

基本上，您的代码是这样做的：

while array is not empty
    rearrange array into a heap
    extract the smallest item

重新排列堆最多需要 O(n) 时间。提取最小值需要 O(log n)。所以你的算法是 O(n^2 + n log n)。

实际上，您自下而上构建堆的方法本身就是 O(n log n)。所以你的堆排序算法实际上是 O((n+1)*(n log n))。无论如何，这是一个高度次优的算法。

堆排序背后的思想是将数组排列到堆中一次。这是一个 O(n) 操作。算法非常简单：

for i = heap.length/2 downto 1
    siftDown(i)

这被称为 Floyd's algorithm，以其发明者的名字命名。

请注意，我们从数组的中间开始并向下筛选。这个想法是最后的 n/2 项是叶节点，所以无论如何它们都不能筛选。通过从 n/2 开始并向后工作，我们可以在 O(n) 时间内堆化整个数组。

数组排列成堆后，我们进行如下操作：

while heap is not empty
    output the item at heap[0]
    move the item at the end of the heap to heap[0]
    reduce the count of items by 1
    siftDown(0)

heap[0]处的item是堆中剩余的最小item，所以输出。然后，就没有必要重建整个堆。您所要做的就是取出堆中的最后一项，将其放在顶部，然后将其向下筛选到位。堆的其余部分仍然有效。

进行这些更改应该会减少运行时间，但我不知道这是否会让您的代码接受 table。还有另一种检查重复项的方法。它需要额外的 O(n) space，但它比排序更快。

想法是创建一个散列 table，然后遍历数组，检查项目是否在散列 table 中。如果没有，请添加它。如果它已经在 table 中，则它是重复的。正如 Harold 所指出的，Python 有一个 set 类型可以让这种事情变得容易。

如何改进有关堆排序的 Python 代码？

How do I improve my Python code about heap sort?

python

sorting

algorithm

heapsort