高效计算 n 组的交集

Question

我有 n 个由 setId 标识的集合，每个集合可以包含任意数量的元素，它们是一对 (elementId, priority)。

我的算法应该接受输入两个 setId，并在输出中给出一个包含前 m 个元素的集合，这些元素位于两个输入集合的交集并具有最高优先级（总和优先级）。

示例：

n=3, m=1

Set1: { (1, 1), (12, 2) }
Set2: { (1, 4), (23, 6), (33, 22) }
Set3: { (33, 1), (1, 16 }


Input: Set2, Set3
Output: { (33, 23) }

我的问题是：假设我有无限个 space 我可以使用什么数据结构来优化性能？

当然，预先计算所有可能的交集并不是一个有效的答案。

编辑:

实际数字：

n，套数，为~ 10^6
集合的平均基数是 ~ 5*10^3。

Answer 1

取其中一组并将其转换为hash map. Iterate the other set, and for each member try to look up the element in the hash map. If you find it, add the result to a heap；如果堆的大小增长到比您希望保留的元素数量大一，则丢弃堆中最低的项目。

高效计算 n 组的交集

Computing efficiently intersection of n sets

algorithm

time-complexity

set-intersection

data-structures