澄清 Sedgewick "Algorithms" heapsort chapter remark(第 4 版,第 2.4 章)

Clarification on Sedgewick "Algorithms" heapsort chapter remark (4 ed, chapter 2.4)

目前正在阅读Algorithms book. Q&A section for chapter 2.4 on heapsort基于优先级队列的实现(p.328)有如下一段话(我们关注优先级队列堆,而不是堆排序):

Q. I’m still not clear on the purpose of priority queues. Why exactly don’t we just sort and then consider the items in increasing order in the sorted array?

A. In some data-processing examples such as TopM and Multiway, the total amount of data is far too large to consider sorting (or even storing in memory). If you are looking for the top ten entries among a billion items, do you really want to sort a billion-entry array? With a priority queue, you can do it with a ten-entry priority queue. In other examples, all the data does not even exist together at any point in time: we take something from the priority queue, process it, and as a result of processing it perhaps add some more things to the priority queue.

TopM, Multiway是优先级队列的简单客户端。书中谈到堆排序的两个阶段:

  1. 堆构造(作者使用优先级队列堆,大家感兴趣)
  2. 排序

在我的理解中,堆构造是几乎排序("heap order")。为了构建堆,您实际上需要访问原始数据集中的 each 项。

问题: 谁能说明我在上面引用的粗体中作者的观点?我们如何在不访问所有项目的情况下构建堆?我在这里想念什么?为澄清干杯。

当然,您必须访问所有条目。仅仅访问它们就需要 O(n) 时间。但是对它们进行排序通常需要 O(n log n) 时间。正如作者所说,您不必对所有这些进行排序。只有十大元素。基本程序如下所示:

allocate priority queue q with space for t entries
visit each entry e in the input array
    queueIsFull := size(q) == t
    if !queueIsFull || e > min(q)
        if !queueIsFull                
            insert e into q
        else
            exchange min(q) with e and bubble up
 next

这里的基本要点是,一旦知道元素不在 top-t 条目中,就从队列中删除元素。因此,插入和交换不需要 O(log n) 时间,而只需要 O(log t)。这将总时间从 O(n log n) 减少到 O(n log t),其中 log t 通常比 log n 小得多。