Why is Arrays.copyOf 2 times faster than System.arraycopy for small arrays?

Question

I was recently playing with some benchmarks and got very interesting results that I can't explain right now. Here is the benchmark:

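import java.util.Arrays;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.Throughput)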
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 10, time = 1, batchSize = 1000)
@Measurement(iterations = 10, time = 1, batchSize = 1000)
public class ArrayCopy {

    @Param({"1","5","10","100", "1000"})
    private int size;
    private int[] ar;

    @Setup
    public void setup() {
        ar = new int[size];
        for (int i = 0; i < size; i++) {
            ar[i] = i;
        }
    }

    @Benchmark
    public int[] SystemArrayCopy() {
        final int length = size;
        int[] result = new int[length];
        System.arraycopy(ar, 0, result, 0, length);
        return result;
    }

    @Benchmark
    public int[] javaArrayCopy() {
        final int length = size;
        int[] result = new int[length];
        for (int i = 0; i < length; i++) {
            result[i] = ar[i];
        }
        return result;
    }

    @Benchmark
    public int[] arraysCopyOf() {
        final int length = size;
        return Arrays.copyOf(ar, length);
    }

}

Results:

Benchmark                  (size)   Mode  Cnt       Score      Error  Units
ArrayCopy.SystemArrayCopy       1  thrpt   10   52533.503 ± 2938.553  ops/s
ArrayCopy.SystemArrayCopy       5  thrpt   10   52518.875 ± 4973.229  ops/s
ArrayCopy.SystemArrayCopy      10  thrpt   10   53527.400 ± 4291.669  ops/s
ArrayCopy.SystemArrayCopy     100  thrpt   10   18948.334 ±  929.156  ops/s
ArrayCopy.SystemArrayCopy    1000  thrpt   10    2782.739 ±  184.484  ops/s
ArrayCopy.arraysCopyOf          1  thrpt   10  111665.763 ± 8928.007  ops/s
ArrayCopy.arraysCopyOf          5  thrpt   10   97358.978 ± 5457.597  ops/s
ArrayCopy.arraysCopyOf         10  thrpt   10   93523.975 ± 9282.989  ops/s
ArrayCopy.arraysCopyOf        100  thrpt   10   19716.960 ±  728.051  ops/s
ArrayCopy.arraysCopyOf       1000  thrpt   10    1897.061 ±  242.788  ops/s
ArrayCopy.javaArrayCopy         1  thrpt   10   58053.872 ± 4955.749  ops/s
ArrayCopy.javaArrayCopy         5  thrpt   10   49708.647 ± 3579.826  ops/s
ArrayCopy.javaArrayCopy        10  thrpt   10   48111.857 ± 4603.024  ops/s
ArrayCopy.javaArrayCopy       100  thrpt   10   18768.866 ±  445.238  ops/s
ArrayCopy.javaArrayCopy      1000  thrpt   10    2462.207 ±  126.549  ops/s

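So there are two strange things here:

1. Arrays.copyOf is 2 times faster than System.arraycopy for small arrays (sizes 1, 5, 10). However, on a large array of size 1000, Arrays.copyOf becomes almost 2 times slower. I know that both methods are intrinsics, so I would expect the same performance. Where does this difference come from?
2. Manual copy for a 1-element array is faster than System.arraycopy. It's not clear to me why. Does anybody know?

VM version: JDK 1.8.0_131, VM 25.131-b11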

Answer 1

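Your System.arraycopy benchmark is not semantically equivalent to Arrays.copyOf.

It will be if you replace

    System.arraycopy(ar, 0, result, 0, length);

with

    System.arraycopy(ar, 0, result, 0, Math.min(ar.length, length));

With this change, the performance of the two benchmarks also becomes similar.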
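The equivalence can be sketched in plain Java. The helper below allocates a new array and then copies `Math.min(original.length, newLength)` elements, which matches the documented behavior of `Arrays.copyOf` for an `int[]` (truncate or pad with zeros). The names `CopyOfSketch` and `copyOfLike` are made up for this illustration and are not part of the JDK.

```java
import java.util.Arrays;

class CopyOfSketch {
    // Hypothetical helper: allocate, then copy with the length clamped
    // to the source array, as Arrays.copyOf does.
    static int[] copyOfLike(int[] original, int newLength) {
        int[] copy = new int[newLength];
        System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
        return copy;
    }

    public static void main(String[] args) {
        int[] ar = {1, 2, 3};
        // Shorter than the source: truncates.
        System.out.println(Arrays.toString(copyOfLike(ar, 2)));   // [1, 2]
        // Longer than the source: the tail stays zero.
        System.out.println(Arrays.toString(copyOfLike(ar, 5)));   // [1, 2, 3, 0, 0]
        // Same result as the library method.
        System.out.println(Arrays.equals(copyOfLike(ar, 5), Arrays.copyOf(ar, 5))); // true
    }
}
```

The only difference from the original benchmark body is the clamped length argument, which is what tells the JIT that the copy can never run past the end of the source array.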

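Why is the first variant slower, then?

1. Without knowing how length relates to ar.length, the JVM needs to perform an additional bounds check and be prepared to throw IndexOutOfBoundsException when length > ar.length.
2. This also breaks the optimization that eliminates redundant zeroing. Every allocated array must be initialized with zeros. However, the JIT can skip the zeroing if it sees that the array is filled right after creation. But -prof perfasm clearly shows that the original System.arraycopy benchmark spends a significant amount of time clearing the allocated array: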
```
 0,84%    0x000000000365d35f: shr    $0x3,%rcx
 0,06%    0x000000000365d363: add    $0xfffffffffffffffe,%rcx
 0,69%    0x000000000365d367: xor    %rax,%rax
          0x000000000365d36a: shl    $0x3,%rcx
21,02%    0x000000000365d36e: rep rex.W stos %al,%es:(%rdi)  ;*newarray
```
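The semantic difference behind the extra bounds check can be reproduced outside the benchmark. The sketch below takes the length as a parameter so that it can exceed the source length, which the benchmark itself never does. In that case the unclamped `System.arraycopy` call throws, while `Arrays.copyOf` pads with zeros. The names `UnclampedCopyDemo`, `unclampedCopy` and `throwsOutOfBounds` are hypothetical and used only for illustration.

```java
import java.util.Arrays;

class UnclampedCopyDemo {
    // Same shape as the original benchmark body, with length passed in.
    static int[] unclampedCopy(int[] src, int length) {
        int[] result = new int[length];
        // Throws if length > src.length, so the JVM must check for that case.
        System.arraycopy(src, 0, result, 0, length);
        return result;
    }

    // Reports whether the unclamped copy fails for the given length.
    static boolean throwsOutOfBounds(int[] src, int length) {
        try {
            unclampedCopy(src, length);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] ar = {1, 2, 3};
        System.out.println(throwsOutOfBounds(ar, 3));             // false
        System.out.println(throwsOutOfBounds(ar, 5));             // true
        System.out.println(Arrays.toString(Arrays.copyOf(ar, 5))); // [1, 2, 3, 0, 0]
    }
}
```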

Manual copy appeared faster for small arrays because, unlike System.arraycopy, it does not perform any runtime calls to VM functions.
