O(n) Heavy-Hitters with O(1/epsilon) space?

Question

I know of the following heavy-hitters algorithm:

Algorithm findHeavyHitters(epsilon, inputStream)
    integer k = ceiling(1 / epsilon) - 1
    initialize hashmap H of size k

    while an item i from the input stream arrives:
        if H[i] exists
            increment the value associated with H[i]
        elsif number of items in H < k
            put H[i] into map with value of 1
        elsif there exists an entry j with a value of 0
            remove j and put H[i] into map with value of 1
        else
            decrement all values in H by 1
    endwhile

    return H
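For reference, a direct Python transcription of the pseudocode above might look like this (a sketch; the function and variable names are illustrative, not from the question). The two commented scans are where a single item can cost O(k) instead of O(1):

```python
import math
from typing import Dict, Hashable, Iterable


def find_heavy_hitters(epsilon: float, stream: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Naive transcription of the question's pseudocode."""
    k = math.ceil(1 / epsilon) - 1       # at most k counters are kept
    h: Dict[Hashable, int] = {}
    for i in stream:
        if i in h:
            h[i] += 1                    # O(1) on average
        elif len(h) < k:
            h[i] = 1
        else:
            # O(k) scan for an entry whose counter has dropped to 0
            zero = next((j for j, c in h.items() if c == 0), None)
            if zero is not None:
                del h[zero]
                h[i] = 1
            else:
                for j in h:              # O(k): touches every counter
                    h[j] -= 1
    return h
```

For example, `find_heavy_hitters(0.25, "aaaaabcdaa")` keeps k = 3 counters and ends with `a` still present, as expected for an item that makes up 70% of the stream.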

Correct me if I'm wrong, but this algorithm does not run in O(n). Is it possible to modify it so that it runs in O(n) while keeping its O(1/epsilon) space usage?

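For a data stream, the point of the algorithm is to return the items that occur at least epsilon*t times, where t is the number of items seen so far. Epsilon is given as a fraction (e.g., input 0.1 for items that make up at least 10% of the stream).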

Answer 1

This algorithm has an average running time of O(n), given that hash lookups take O(1) time on average.

There are two implementation details. First, the last step seems to involve touching every value in H:

decrement all values in H by 1

To make this O(1), we add one extra storage location, called base, initialized to 0. Then we modify the algorithm as follows:

while an item i from the input stream arrives:
    if H[i] exists
        increment the value associated with H[i]
    elsif number of items in H < k
        put H[i] into map with value of base + 1
    elsif there exists an entry j with a value of base
        remove j and put H[i] into map with value of base + 1
    else
        increment base
endwhile
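A minimal Python sketch of the base trick, assuming everything else in the algorithm stays the same. It still finds the entry equal to base with a linear scan (that is the second issue, discussed next). Reading the true counters back out as stored value minus base is an assumption of this sketch; the pseudocode above does not say how results are reported:

```python
import math
from typing import Dict, Hashable, Iterable


def find_heavy_hitters_base(epsilon: float, stream: Iterable[Hashable]) -> Dict[Hashable, int]:
    k = math.ceil(1 / epsilon) - 1
    h: Dict[Hashable, int] = {}
    base = 0                              # global offset replacing "decrement all"
    for i in stream:
        if i in h:
            h[i] += 1
        elif len(h) < k:
            h[i] = base + 1
        else:
            # Still O(k): find an entry whose stored value equals base.
            j = next((x for x, v in h.items() if v == base), None)
            if j is not None:
                del h[j]
                h[i] = base + 1
            else:
                base += 1                 # O(1) instead of touching all k values
    # Invariant: every stored value is >= base, so value - base is the counter.
    return {x: v - base for x, v in h.items()}
```

On the same input as the naive version, this returns the same counters while never touching more than one stored value per decrement step.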

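The second issue is finding an entry with value base (or 0) in O(1). This can be done by keeping the elements in a "comb": a linked list of doubly-linked lists. Each inner list contains the entries with one particular count. The outer list holds the count lists in order by count, with its head pointing to the list with the smallest count. If you draw this data structure, it looks like a comb: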

[  base    ] -> entry a -> entry b -> entry c
    |
[ base + i ] -> entry d
    |
[ base + j ] -> entry e -> entry f
    |
   etc.

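The hash table now points to the entries rather than containing them. To increment the count of a single entry, remove the entry from its list (if the list contains more than one element) and either insert it into the next list or put it into a one-element list inserted right after the list it was in, depending on whether the next list's count is exactly one greater. This operation is O(1).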

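The comb data structure still takes O(k) space, where k is the number of elements in the hash, because there cannot be more distinct counts than elements.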
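The comb combined with base could be sketched in Python as below. This is a hypothetical implementation, not the answerer's code: to keep it short, each inner doubly-linked list is replaced by a Python `set` (O(1) average add, remove and pop), while the outer list of counts is an explicit doubly-linked list. The names `Bucket`, `Comb`, `increment`, `add` and `evict_head` are made up for this example. Every branch of the main loop is O(1) on average:

```python
import math
from typing import Dict, Hashable, Iterable, Optional


class Bucket:
    """One tooth of the comb: every key whose stored value equals `count`."""
    def __init__(self, count: int) -> None:
        self.count = count
        self.keys: set = set()            # stand-in for the inner doubly-linked list
        self.prev: Optional["Bucket"] = None
        self.next: Optional["Bucket"] = None


class Comb:
    """Outer doubly-linked list of buckets, sorted by count (smallest at head)."""
    def __init__(self) -> None:
        self.head: Optional[Bucket] = None
        self.where: Dict[Hashable, Bucket] = {}   # hash table points into the comb

    def _new_after(self, prev: Optional[Bucket], count: int) -> Bucket:
        """Link a new empty bucket after `prev`, or at the head if prev is None."""
        b = Bucket(count)
        b.prev = prev
        b.next = prev.next if prev else self.head
        if b.next:
            b.next.prev = b
        if prev:
            prev.next = b
        else:
            self.head = b
        return b

    def _drop_if_empty(self, b: Bucket) -> None:
        if b.keys:
            return
        if b.prev:
            b.prev.next = b.next
        else:
            self.head = b.next
        if b.next:
            b.next.prev = b.prev

    def increment(self, key: Hashable) -> None:
        """O(1): move key into the bucket for count + 1, creating it if needed."""
        b = self.where[key]
        nxt = b.next
        if nxt is None or nxt.count != b.count + 1:
            nxt = self._new_after(b, b.count + 1)
        b.keys.remove(key)
        nxt.keys.add(key)
        self.where[key] = nxt
        self._drop_if_empty(b)

    def add(self, key: Hashable, count: int) -> None:
        """O(1) insert, assuming every stored count is >= count - 1."""
        h = self.head
        if h is None or h.count > count:
            target = self._new_after(None, count)
        elif h.count == count:
            target = h
        elif h.next is not None and h.next.count == count:  # h.count == count - 1
            target = h.next
        else:
            target = self._new_after(h, count)
        target.keys.add(key)
        self.where[key] = target

    def evict_head(self) -> None:
        """O(1): remove an arbitrary key from the smallest-count bucket."""
        h = self.head
        key = h.keys.pop()
        del self.where[key]
        self._drop_if_empty(h)


def find_heavy_hitters_comb(epsilon: float, stream: Iterable[Hashable]) -> Dict[Hashable, int]:
    k = math.ceil(1 / epsilon) - 1
    comb, base = Comb(), 0
    for i in stream:
        if i in comb.where:
            comb.increment(i)
        elif len(comb.where) < k:
            comb.add(i, base + 1)
        elif comb.head is not None and comb.head.count == base:
            comb.evict_head()             # an entry whose true counter is 0
            comb.add(i, base + 1)
        else:
            base += 1
    return {key: b.count - base for key, b in comb.where.items()}
```

Because a `set` is unordered, which zero-count key gets evicted may differ from the naive version; any of them is a valid choice for the algorithm.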

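Instead of doubly-linked lists, you can use a simple array together with a list of the indices of the first entry for each count. To move an entry to the next count bucket, first swap it with the last entry that has its count, then either move the boundary of the next count list so that it includes the entry, or insert a new count-list entry, depending on whether the next count list's count is exactly one greater or more than one greater. Completing the swap requires updating the positions of both swapped entries in the hash, but that is still O(1).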
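A sketch of the array variant, under assumptions of my own: the array has exactly k slots sorted by ascending stored count, each count's first index lives in a small doubly-linked `Bucket` record, and unused slots are treated as entries with stored value 0. That last choice folds the "fewer than k items" branch into "reuse a slot whose value equals base"; it is equivalent because base stays 0 until every slot has been filled. All names here are hypothetical:

```python
import math
from typing import Dict, Hashable, Iterable, List, Optional


class Bucket:
    """Hypothetical record: slots[start : next.start] all share stored value `count`."""
    def __init__(self, count: int, start: int) -> None:
        self.count = count
        self.start = start                # index of the first slot with this count
        self.prev: Optional["Bucket"] = None
        self.next: Optional["Bucket"] = None


def find_heavy_hitters_array(epsilon: float, stream: Iterable[Hashable]) -> Dict[Hashable, int]:
    k = math.ceil(1 / epsilon) - 1
    if k <= 0:
        return {}
    slots: List[Optional[Hashable]] = [None] * k   # None marks an unused slot (value 0)
    head = Bucket(0, 0)                  # initially every slot is in the count-0 bucket
    owner: List[Bucket] = [head] * k     # bucket of each slot
    pos: Dict[Hashable, int] = {}        # the hash table: key -> slot index
    base = 0

    def bump(p: int) -> None:
        """Move slot p from the bucket for count c to the bucket for c + 1, in O(1)."""
        nonlocal head
        b = owner[p]
        last = (b.next.start if b.next else k) - 1    # last slot with count c
        # Swap p with the last slot of its bucket and fix both positions in the hash.
        slots[p], slots[last] = slots[last], slots[p]
        for q in (p, last):
            if slots[q] is not None:
                pos[slots[q]] = q
        if b.next is not None and b.next.count == b.count + 1:
            nb = b.next
            nb.start = last               # widen the next bucket by one slot
        else:
            nb = Bucket(b.count + 1, last)   # new one-slot bucket right after b
            nb.prev, nb.next = b, b.next
            if b.next is not None:
                b.next.prev = nb
            b.next = nb
        owner[last] = nb
        if b.start == last:               # b has no slots left: unlink it
            if b.prev is not None:
                b.prev.next = nb
            else:
                head = nb
            nb.prev = b.prev

    for i in stream:
        if i in pos:
            bump(pos[i])
        elif head.count == base:
            # Reuse a slot whose stored value equals base (unused, or a zero counter).
            s = head.start
            old = slots[s]
            if old is not None:
                del pos[old]
            slots[s] = i
            pos[i] = s
            bump(s)
        else:
            base += 1                     # the O(1) "decrement all values"
    return {key: owner[p].count - base for key, p in pos.items()}
```

Compared with the comb, this keeps all entries in one contiguous array, at the cost of the swap bookkeeping in `bump`.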

algorithm

bigdata

streaming-algorithm