Java, Finding Kth largest value from the array

Question

I had an interview with Facebook, and they asked me this question.

Suppose you have an unordered array with N distinct values

$input = [3,6,2,8,9,4,5]

Implement a function that finds the Kth largest value.

E.g.: if K = 0, return 9. If K = 1, return 8.

This is the method I wrote:

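// needs: java.util.ArrayList, Arrays, List, Set, TreeSet
private static int getMax(Integer[] input, int k)
{
    // TreeSet keeps the values sorted in ascending order (and drops duplicates)
    List<Integer> list = Arrays.asList(input);
    Set<Integer> set = new TreeSet<Integer>(list);

    // copy back into a list so the sorted values can be indexed
    list = new ArrayList<Integer>(set);
    int value = (list.size() - 1) - k; // kth largest = index (size - 1 - k) in ascending order

    return list.get(value);
}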

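I just tested it and, as far as the question goes, the method works fine. However, the interviewer said: "To make your life complex, let's assume that your array contains millions of numbers; then your listing becomes too slow. What do you do in this case?" As a hint, he suggested using a min heap. As far as I know, no child in a heap should exceed the root value. So in this case, if we assume 3 is the root, then 6 is its child, and its value is greater than the root's. I may be wrong, but what is your idea, and what would an implementation based on a min heap look like?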
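The rule "no child exceeds the root" describes a max heap. In a min heap it is the other way round: every parent is no larger than its children, so 3 as the root with 6 as its child is perfectly valid. A minimal sketch with java.util.PriorityQueue, which is a min heap by default and becomes a max heap when given Collections.reverseOrder(); the class and method names are illustrative only.

```java
import java.util.Collections;
import java.util.PriorityQueue;

class HeapOrderDemo {
    // Inserts all values into a heap and returns its root.
    static int root(int[] values, boolean maxHeap) {
        PriorityQueue<Integer> heap;
        if (maxHeap) {
            // reversed ordering: the largest value sits at the root
            heap = new PriorityQueue<>(Collections.reverseOrder());
        } else {
            // natural ordering: the smallest value sits at the root (min heap)
            heap = new PriorityQueue<>();
        }
        for (int v : values) {
            heap.add(v);
        }
        return heap.peek(); // the root of the heap
    }

    public static void main(String[] args) {
        int[] input = {3, 6, 2, 8, 9, 4, 5};
        System.out.println(root(input, false)); // 2: min heap, every child >= its parent
        System.out.println(root(input, true));  // 9: max heap, every child <= its parent
    }
}
```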

Answer 1

Edit: check this answer for an O(n) solution.

You can probably also use a PriorityQueue to solve this:

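// needs: import java.util.PriorityQueue;
// k is 1-based here: k = 1 returns the largest element
// (for the question's 0-based K, call findKthLargest(nums, K + 1))
public int findKthLargest(int[] nums, int k) {
        int p = 0;
        int numElements = nums.length;
        // create a priority queue (a min heap) where all the elements of nums will be stored
        PriorityQueue<Integer> pq = new PriorityQueue<Integer>();

        // place all the elements of the array in this priority queue
        for (int n : nums){
            pq.add(n);
        }

        // poll the smallest element n - k + 1 times; the last one polled is the kth largest
        while (numElements-k+1 > 0){
            p = pq.poll();
            k++;
        }

        return p;
    }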

From the Java doc:

Implementation note: this implementation provides O(log(n)) time for the enqueuing and dequeuing methods (offer, poll, remove() and add); linear time for the remove(Object) and contains(Object) methods; and constant time for the retrieval methods (peek, element, and size).

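The for loop runs n times and the while loop up to n times, each iteration costing O(log n), so the complexity of the above algorithm is O(n log n).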
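A variant of the same idea, sketched under the assumption that K is 0-based as in the question: a max heap built with Collections.reverseOrder() keeps the largest value at the root, so only K polls are needed after the inserts. The class and method names are illustrative, and the n insertions still make the whole thing O(n log n).

```java
import java.util.Collections;
import java.util.PriorityQueue;

class KthLargestMaxHeap {
    // k is 0-based, as in the question: k = 0 returns the largest value.
    static int kthLargest(int[] nums, int k) {
        if (k < 0 || k >= nums.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        // max heap: the largest remaining value is always at the root
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int n : nums) {
            maxHeap.add(n); // O(log n) per insertion
        }
        // discard the k largest values; the root is then the answer
        for (int i = 0; i < k; i++) {
            maxHeap.poll();
        }
        return maxHeap.peek();
    }

    public static void main(String[] args) {
        int[] input = {3, 6, 2, 8, 9, 4, 5};
        System.out.println(kthLargest(input, 0)); // 9
        System.out.println(kthLargest(input, 1)); // 8
    }
}
```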

Answer 2

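He actually already gave you the complete answer, not just a hint.

Your understanding, however, is based on a max heap, not a min heap. How it works is self-explanatory.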

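In a min heap, the root has the minimum value (smaller than its children).

So what you need to do is iterate over the array and fill the min heap with the first K elements (K + 1 if K is 0-based, as in the question). Once that is done, the heap automatically holds the lowest of them at its root.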

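Now, for each (next) element you read from the array:
-> check whether the value is greater than the root of the min heap;
-> if it is, remove the root from the min heap and add the value to it.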

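After you have traversed the entire array, the root of the min heap will contain the kth largest element.

And all the other elements in the heap (k-1 of them, to be exact) will be greater than that kth largest element.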
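The steps above can be sketched in Java roughly as follows. This is a minimal sketch rather than the answerer's code: it reads values from an Iterator so the total count does not have to be known in advance, and it assumes the question's 0-based K, so the heap keeps K + 1 values. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.PriorityQueue;

class KthLargestStream {
    // Returns the kth largest value (0-based) seen in the iterator,
    // keeping at most k + 1 values in memory at any time.
    static int kthLargest(Iterator<Integer> values, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be >= 0");
        }
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(); // root = smallest kept value
        while (values.hasNext()) {
            int v = values.next();
            if (minHeap.size() < k + 1) {
                minHeap.add(v);             // still filling the heap
            } else if (v > minHeap.peek()) {
                minHeap.poll();             // drop the current root
                minHeap.add(v);             // keep the larger value instead
            }
        }
        if (minHeap.size() < k + 1) {
            throw new IllegalArgumentException("fewer than k + 1 values");
        }
        return minHeap.peek(); // smallest of the k + 1 largest values
    }

    public static void main(String[] args) {
        Iterator<Integer> it = Arrays.asList(3, 6, 2, 8, 9, 4, 5).iterator();
        System.out.println(kthLargest(it, 1)); // 8
    }
}
```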

Answer 3

Here is a min heap implementation in Java using PriorityQueue. Complexity: n * log k.

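import java.util.PriorityQueue;

public class LargestK {

  // k is 0-based, as in the question: k = 0 returns the largest value
  private static Integer largestK(Integer[] array, int k) {
    if (k < 0 || k >= array.length) {
      throw new IllegalArgumentException("k out of range: " + k);
    }
    // min heap holding the k + 1 largest values seen so far
    PriorityQueue<Integer> queue = new PriorityQueue<Integer>(k+1);
    int i = 0;
    while (i<=k) {
      queue.add(array[i]);
      i++;
    }
    for (; i<array.length; i++) {
      Integer value = queue.peek();
      // replace the root only when the new value is larger
      if (array[i] > value) {
        queue.poll();
        queue.add(array[i]);
      }
    }
    // the root is the smallest of the k + 1 largest values, i.e. the kth largest
    return queue.peek();
  }

  public static void main(String[] args) {
    Integer[] array = new Integer[] {3,6,2,8,9,4,5};
    System.out.println(largestK(array, 3));
  }
}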

Output: 5

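The code loops over the array once, which is O(n). The PriorityQueue (min heap) never holds more than k + 1 elements, so every heap operation is O(log k). In the worst case, when the numbers are sorted in ascending order, the complexity is O(n log k), because for every element you have to remove the top of the heap and insert the new element.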

Answer 4

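If the number of elements in the array/stream is unknown, a heap-based solution is perfect. But what if the number is finite and you still want an optimized solution in linear time?

We can use Quick Select, discussed here.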

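array = [3,6,2,8,9,4,5]

Let's choose the first element as the pivot:

pivot = 3 (index 0)

Now partition the array so that all elements less than or equal to 3 are on the left and the elements greater than 3 are on the right, just as is done in quicksort (discussed on my blog).

So after the first pass: [2,3,6,8,9,4,5]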

The pivot index is 1 (i.e., it's the second-lowest element). Now apply the same process again.

Now choose 6, the value at the index right after the previous pivot, as the new pivot and partition again: [2,3,4,5,6,8,9]

So now 6 is in its correct position (index 4).
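The two passes above can be reproduced with a simple stable, out-of-place partition. This is an assumption: the answer does not say which partition scheme it uses, and an in-place scheme such as Lomuto or Hoare may leave the two sides in a different order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionDemo {
    // Stable partition around values.get(0): smaller-or-equal values keep their
    // relative order on the left, larger values keep theirs on the right.
    static List<Integer> partitionAroundFirst(List<Integer> values) {
        int pivot = values.get(0);
        List<Integer> smaller = new ArrayList<>();
        List<Integer> larger = new ArrayList<>();
        for (int v : values.subList(1, values.size())) {
            if (v <= pivot) {
                smaller.add(v);
            } else {
                larger.add(v);
            }
        }
        List<Integer> result = new ArrayList<>(smaller);
        result.add(pivot); // the pivot lands at its final sorted index
        result.addAll(larger);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> first = partitionAroundFirst(Arrays.asList(3, 6, 2, 8, 9, 4, 5));
        System.out.println(first); // [2, 3, 6, 8, 9, 4, 5]
        // second pass: partition only the part to the right of the old pivot (index 1)
        List<Integer> second = new ArrayList<>(first.subList(0, 2));
        second.addAll(partitionAroundFirst(first.subList(2, first.size())));
        System.out.println(second); // [2, 3, 4, 5, 6, 8, 9]
    }
}
```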

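In each iteration, check whether you have found the right number (the kth largest or kth smallest). If you have, you're done; otherwise continue, partitioning only the side that contains the target index. This gives O(n) time on average, but O(n^2) in the worst case when the pivots are chosen badly.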
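A minimal in-place sketch of that loop, assuming the question's 0-based K. Like the walkthrough it takes the first element of the current range as the pivot, but it uses a Lomuto-style in-place partition, which may order the elements differently from the answer's passes. It only ever continues into the side that contains the target index. The class name is illustrative.

```java
import java.util.Arrays;

class QuickSelect {
    // Returns the kth largest value (k is 0-based: k = 0 is the maximum).
    // Average O(n); worst case O(n^2) with this first-element pivot.
    static int kthLargest(int[] input, int k) {
        if (k < 0 || k >= input.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        int[] a = Arrays.copyOf(input, input.length); // leave the caller's array untouched
        int target = a.length - 1 - k; // index of the answer in ascending order
        int lo = 0, hi = a.length - 1;
        while (true) {
            int p = partition(a, lo, hi);
            if (p == target) {
                return a[p];
            } else if (p < target) {
                lo = p + 1; // the answer is to the right of the pivot
            } else {
                hi = p - 1; // the answer is to the left of the pivot
            }
        }
    }

    // Lomuto-style partition of a[lo..hi] around a[lo]; returns the pivot's
    // final index, with all values <= pivot before it and larger values after it.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int store = lo;
        for (int i = lo + 1; i <= hi; i++) {
            if (a[i] <= pivot) {
                store++;
                swap(a, store, i);
            }
        }
        swap(a, lo, store); // move the pivot between the two sides
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] input = {3, 6, 2, 8, 9, 4, 5};
        System.out.println(kthLargest(input, 0)); // 9
        System.out.println(kthLargest(input, 1)); // 8
    }
}
```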

Answer 5

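One approach for a constant value of k is to use a partial selection sort: make k passes, each picking the largest value below the one picked in the previous pass.

(This assumes distinct values, but it can easily be changed to work with duplicates.)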

last_max = +inf
output = []
for i in (0..k)
    max = -inf
    for value in input_array
        if value > max and value < last_max
            max = value
    output[i] = max
    last_max = max
print output[k-1]

(This is pseudocode, but it should be easy to implement in Java.)
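A direct Java translation of the pseudocode, offered as a sketch: it keeps only the value from the previous pass instead of the whole output array, and uses long sentinels in place of +inf and -inf so that every int value compares correctly. As in the pseudocode, k is 1-based here (k = 1 is the maximum), unlike the question's 0-based K. The class name is hypothetical.

```java
class RepeatedMaxSelection {
    // kth largest for 1-based k, assuming distinct values. O(n * k).
    static int kthLargest(int[] input, int k) {
        if (k < 1 || k > input.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        long lastMax = Long.MAX_VALUE; // stands in for +inf
        for (int pass = 0; pass < k; pass++) {
            long max = Long.MIN_VALUE; // stands in for -inf
            for (int value : input) {
                // largest value strictly below the one found in the previous pass
                if (value > max && value < lastMax) {
                    max = value;
                }
            }
            lastMax = max;
        }
        return (int) lastMax;
    }

    public static void main(String[] args) {
        int[] input = {3, 6, 2, 8, 9, 4, 5};
        System.out.println(kthLargest(input, 1)); // 9
        System.out.println(kthLargest(input, 2)); // 8
    }
}
```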

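The overall complexity is O(n*k), so this only beats sorting (O(n log n)) when k is a constant or is known to be smaller than log(n).

On the plus side, it is a very simple solution. On the downside, it is not as efficient as the heap solution.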
