Guava Sets.intersection bad performance

Question

I ran into a strange issue in production today. Although I love Guava, I hit a use case where Guava's Sets.intersection() performs really badly. I wrote some sample code:

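import com.google.common.collect.Sets;
import com.google.common.collect.Sets.SetView;
import java.util.HashSet;
import java.util.Set;

public class IntersectionBenchmark {
    public static void main(String[] args) {
        // "cache" holds 1,000,000 keys; "keys" holds the 100 keys to look up
        Set<Long> cache = new HashSet<>();
        for (long i = 0; i < 1000000; i++) {
            cache.add(i);
        }
        Set<Long> keys = new HashSet<>();
        for (long i = 0; i < 100; i++) {
            keys.add(i);
        }

        // Plain Java: probe the cache once per key and collect the hits
        long start = System.currentTimeMillis();
        Set<Long> foundKeys = new HashSet<>();
        for (Long key : keys) {
            if (cache.contains(key)) {
                foundKeys.add(key);
            }
        }
        System.out.println("Java search: " + (System.currentTimeMillis() - start));

        // Guava: Sets.intersection returns a lazy SetView; the call itself does not compute the intersection
        start = System.currentTimeMillis();
        SetView<Long> intersection = Sets.intersection(keys, cache);
        System.out.println("Guava search: " + (System.currentTimeMillis() - start));
    }
}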

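I tried to recreate a scenario similar to production, where I have a cache of keys and I am looking for all the keys that exist in the cache. Strangely, the Guava search takes much longer than the Java search. After running it I got: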

Java search: 0
Guava search: 36

Can anyone tell me why this is not suitable for my use case, or whether there is a bug in Guava?

Answer 1

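It turns out the problem was repeated calls to SetView.size(). Because a SetView is a (live) view of the intersection of two sets, the size of the intersection has to be recomputed on every call.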

public static <E> SetView<E> intersection( final Set<E> set1, final Set<?> set2) {
//...
  return new SetView<E>() {
    @Override public Iterator<E> iterator() {
      return Iterators.filter(set1.iterator(), inSet2);
    }
    @Override public int size() {
      return Iterators.size(iterator());
    }
    //...
  };
}

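As this shows, recomputing here means traversing the whole view: iterating over all of set1 and checking each element against set2. Every size() call therefore costs time proportional to the size of set1, and calling it repeatedly multiplies that cost, which can be very time-consuming.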
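The Guava excerpt above is elided, so the following is only a minimal plain-Java sketch that mimics its behavior, not Guava's actual class. The names LazyIntersection and CountingSet are hypothetical: LazyIntersection is a live view whose size() re-filters set1 against set2 on every call, and CountingSet wraps the big set to count contains() probes, so repeated size() calls visibly repeat the full scan.

```java
import java.util.AbstractSet;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class LazyIntersectionDemo {
    // Hypothetical wrapper that counts how many times contains() is called.
    static final class CountingSet<E> extends AbstractSet<E> {
        private final Set<E> delegate;
        long containsCalls = 0;

        CountingSet(Set<E> delegate) { this.delegate = delegate; }

        @Override public boolean contains(Object o) {
            containsCalls++;
            return delegate.contains(o);
        }
        @Override public Iterator<E> iterator() { return delegate.iterator(); }
        @Override public int size() { return delegate.size(); }
    }

    // Hypothetical simplified live view: nothing is cached, so every
    // iterator() or size() call re-filters set1 against set2.
    static final class LazyIntersection<E> extends AbstractSet<E> {
        private final Set<E> set1;
        private final Set<?> set2;

        LazyIntersection(Set<E> set1, Set<?> set2) { this.set1 = set1; this.set2 = set2; }

        @Override public Iterator<E> iterator() {
            return set1.stream().filter(set2::contains).iterator();
        }
        @Override public int size() {
            int n = 0;
            for (Iterator<E> it = iterator(); it.hasNext(); it.next()) {
                n++;
            }
            return n;
        }
    }

    // Returns how many contains() probes hit the big set after `calls` size() calls.
    static long probesAfterSizeCalls(int calls) {
        Set<Long> cache = new HashSet<>();
        for (long i = 0; i < 10_000; i++) cache.add(i);
        Set<Long> keys = new HashSet<>();
        for (long i = 0; i < 100; i++) keys.add(i);

        CountingSet<Long> countedCache = new CountingSet<>(cache);
        Set<Long> view = new LazyIntersection<>(keys, countedCache);
        for (int c = 0; c < calls; c++) {
            if (view.size() != 100) throw new IllegalStateException("unexpected size");
        }
        return countedCache.containsCalls;
    }

    public static void main(String[] args) {
        System.out.println(probesAfterSizeCalls(1)); // 100
        System.out.println(probesAfterSizeCalls(3)); // 300
    }
}
```

Each size() call probes the big set once per element of the small set, so the probe count grows linearly with the number of size() calls even though neither set changes.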

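The way to solve this is to make sure size() is called only once and its value is stored (if you know the underlying sets will not change), or, if that is not possible, to create a copy of the intersection, for example via ImmutableSet.copyOf().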
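With Guava on the classpath, the copy approach is a one-liner along the lines of ImmutableSet.copyOf(Sets.intersection(keys, cache)). To keep the example free of dependencies, here is a minimal plain-Java analogue; the class and method names (SnapshotIntersection, snapshot) are hypothetical. It filters the smaller set against the larger one exactly once and freezes the result, so later size() calls only read a stored count.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class SnapshotIntersection {
    // Plain-Java analogue of copying a lazy intersection view:
    // compute the intersection once, then return an unmodifiable snapshot.
    static <E> Set<E> snapshot(Set<E> small, Set<?> large) {
        Set<E> result = new HashSet<>();
        for (E e : small) {
            if (large.contains(e)) {
                result.add(e);
            }
        }
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        Set<Long> cache = new HashSet<>();
        for (long i = 0; i < 100; i++) cache.add(i);
        Set<Long> keys = new HashSet<>();
        for (long i = 50; i < 150; i++) keys.add(i);

        Set<Long> found = snapshot(keys, cache);
        // Repeated size() calls now just read the stored count of the copy
        System.out.println(found.size()); // 50

        cache.clear();
        // The snapshot does not track later changes to the cache
        System.out.println(found.size()); // 50
    }
}
```

This is the trade-off the answer describes: the copy costs one pass up front and makes size() cheap afterwards, but it no longer reflects changes to the underlying sets.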
