为什么代码使用较小的列表实例化哈希集？

Question

我正在查看 org.apache.commons.collections4.ListUtils class 并注意到代码如下：

public static <e> List<e> intersection(final List<? extends E> list1, final List<? extends E> list2) {
        final List<e> result = new ArrayList<>();

        List<? extends E> smaller = list1;
        List<? extends E> larger = list2;
        if (list1.size() > list2.size()) {
            smaller = list2;
            larger = list1;
        }

        final HashSet<e> hashSet = new HashSet<>(smaller);

        for (final E e : larger) {
            if (hashSet.contains(e)) {
                result.add(e);
                hashSet.remove(e);
            }
        }
        return result;
    }

我们知道为什么他们将较小的列表转为哈希集并循环较大的列表吗？谢谢。

Answer 1

假设较小的列表有 M 个条目，较大的列表有 N 个条目，并且集合为您提供基本操作（添加、包含）的恒定时间访问。

如果我用 Big O 符号对这个算法进行分类，运行时间将是 O(M+N) 并且额外的内存消耗 O(M)。

2 个观察结果，如果我们将较小的列表与较大的列表切换：

额外的内存使用量将增加到 O(N)，因此这是不这样做的原因之一。
理论上运行time 不会改变，仍然 O(M+N)，但实际上创建一组 N 条目，将是比迭代更重量级的操作在它上面。

如果您想验证这些假设，请尝试 JMH，这是 Java 中运行微基准测试的工具。

我运行与 M=1000、N=10000 进行了不科学的基准测试。这是我得到的：

Benchmark               (size)  Mode  Cnt       Score      Error  Units
IntersectBench.larger    10000  avgt    5  190481.075 ± 6488.649  ns/op
IntersectBench.smaller   10000  avgt    5  125997.594 ± 1616.975  ns/op

有趣的值在Score，这里越小越好。 IntersectBench.smaller 是与上面相同的算法，IntersectBench.larger 是交换列表并删除交换它们的优化的算法。如您所见，未优化的版本速度慢了 50%。

为什么代码使用较小的列表实例化哈希集？

why does the code instantiate the hashset using smaller list?

java

apache-commons