Why doesn't Hotspot JIT perform loop unrolling for long counters?

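I just read the Java Magazine article Loop Unrolling. There, the authors demonstrate that a simple for loop with an int counter is compiled with the loop unrolling optimization: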

private long intStride1()
{
    long sum = 0;
    for (int i = 0; i < MAX; i += 1)
    {
        sum += data[i];
    }
    return sum;
}
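
For readers unfamiliar with the term, here is a rough hand-written approximation of what unrolling means. It is only a sketch: the class name, method name, and unroll factor of 4 are made up for illustration, and the code the JIT actually emits differs (it works on the compiler's internal representation and picks its own factor).

```java
class UnrollSketch {
    // Hypothetical hand-unrolled equivalent of intStride1(), unroll factor 4.
    static long sumUnrolledBy4(int[] data, int max) {
        long sum = 0;
        int i = 0;
        // Main loop: four additions per iteration, so the loop
        // condition and increment run a quarter as often.
        int limit = max - 3;
        for (; i < limit; i += 4) {
            sum += data[i];
            sum += data[i + 1];
            sum += data[i + 2];
            sum += data[i + 3];
        }
        // Tail loop: handles the remaining 0 to 3 elements.
        for (; i < max; i++) {
            sum += data[i];
        }
        return sum;
    }
}
```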

However, they then show that everything changes when the counter type is switched to long:

private long longStride1()
{
    long sum = 0;
    for (long l = 0; l < MAX; l++)
    {
        sum += data[(int) l];
    }
    return sum;
}

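This changes the generated assembly in two ways:

  1. A safepoint is introduced
  2. Unrolling is not performed

This significantly reduces throughput.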

Why doesn't the 64-bit HotSpot VM perform loop unrolling for long counters? And why does the second case need a safepoint while the first one doesn't?

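Since JDK 16, the HotSpot JVM supports loop unrolling and other optimizations for loops with 64-bit counters.

The description of JDK-8223051 answers both of your questions: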

Many core loop transformations apply to counted loops, which are those with a calculated trip count. Transformations include unrolling, iteration range splitting (array RCE), and strip mining (JDK-8186027). The optimizer performs many complicated pattern matches to detect and transform counted loop.

Most or all of these pattern matches and transformations apply to loops with 32-bit control variables and arithmetic. This makes sense as long as bulk operations apply only to Java arrays, since those arrays can only span a 31-bit index range. Newer APIs for larger blocks of bulk data will introduce 64-bit indexes, such as Panama's native arrays and (possibly) range-expanded byte buffers. Under the hood, the Unsafe API routinely works with 64-bit addresses and address arithmetic. Loops which work on such data structures naturally use 64-bit values, either as direct Java longs, or as wrapped cursor structure with incrementing long components (Panama pointers).

There needs to be a story for transforming such long-running loops. This RFE is a request for that story.

A complicating factor is that sometimes counted loops have no safepoints, on the assumption that the largest possible iteration (across 32 bits of dynamic range) won't cause the JVM's safepoint mechanism to malfunction due to a non-responsive thread stuck in such a counted loop. This assumption is invalid in the 64-bit case. Luckily, we have a (relatively new) optimization which can address this problem, by strip-mining a single very long running loop into a sequence (outer loop) of loops of with appropriately bounded trip counts.
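
The strip-mining idea from the last paragraph can be sketched by hand. This is only an illustration of the shape of the transformation, not what C2 actually generates: the class name, method names, and the `STRIP` size of 1000 are assumptions made up for this example.

```java
class StripMiningSketch {
    // Hypothetical strip length, chosen only for illustration.
    static final int STRIP = 1000;

    // The original long-counter loop, as in longStride1().
    static long sumPlain(int[] data, long max) {
        long sum = 0;
        for (long l = 0; l < max; l++) {
            sum += data[(int) l];
        }
        return sum;
    }

    // Same loop split into an outer long loop over strips and an
    // inner int loop with a bounded trip count.
    static long sumStripMined(int[] data, long max) {
        long sum = 0;
        for (long outer = 0; outer < max; outer += STRIP) {
            // A safepoint poll can live here, once per strip,
            // instead of once per element.
            int len = (int) Math.min(STRIP, max - outer);
            // The inner loop is an ordinary int counted loop, so the
            // usual 32-bit loop optimizations can apply to it.
            for (int i = 0; i < len; i++) {
                sum += data[(int) (outer + i)];
            }
        }
        return sum;
    }
}
```

Both methods compute the same sum; the second only restructures the iteration so that each inner loop is short enough to run without a safepoint check.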

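That is because of the need to track possible integer overflow. Overflow of an int counter is easy to catch by widening it to long and checking against the min/max int values, but on most platforms there is no type wider than long. Since JDK 16, however, long loops are supported.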
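
The asymmetry described above can be shown in plain Java. This is only a sketch of the argument, not of how the JIT checks overflow internally; the class and method names are hypothetical.

```java
class OverflowCheckSketch {
    // int counter: widen to long, do the addition there, and compare
    // the result against the int range.
    static boolean intStepOverflows(int i, int stride) {
        long wide = (long) i + stride;
        return wide > Integer.MAX_VALUE || wide < Integer.MIN_VALUE;
    }

    // long counter: there is no wider primitive type to widen into,
    // so a different check is needed. Here Math.addExact is used,
    // which throws ArithmeticException when the result overflows.
    static boolean longStepOverflows(long l, long stride) {
        try {
            Math.addExact(l, stride);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```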