IntStream leads to array elements being wrongly set to 0 (JVM Bug, Java 11)
In the class P below, the method test looks as if it should always return false:
    import java.util.function.IntPredicate;
    import java.util.stream.IntStream;

    public class P implements IntPredicate {
        private final static int SIZE = 33;

        @Override
        public boolean test(int seed) {
            int[] state = new int[SIZE];
            state[0] = seed;
            for (int i = 1; i < SIZE; i++) {
                state[i] = state[i - 1];
            }
            return seed != state[SIZE - 1];
        }

        public static void main(String[] args) {
            long count = IntStream.range(0, 0x0010_0000).filter(new P()).count();
            System.out.println(count);
        }
    }
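When class P is combined with IntStream, however, the method test can (wrongly) return true. The code in the main method above prints some positive integer, such as 716208, and the result changes from one execution to the next.
This unexpected behavior occurs because elements of the int array state[] can end up set to zero during execution. If test code such as
    if (seed == 0xf_fff0) {
        System.out.println(Arrays.toString(state));
    }
is inserted at the end of the method test, the program prints a line like this:
    [1048560, 1048560, 1048560, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Question: why can elements of the int array state[] be set to zero?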
I already know how to avoid this behavior: just replace int[] with ArrayList.
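The ArrayList workaround mentioned above could look roughly like the sketch below. The class name ListP is hypothetical; everything else mirrors class P, with the int[] replaced by a List<Integer>. It only illustrates the workaround and does not explain why the array version misbehaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

// Hypothetical variant of class P that keeps the state in a List instead of an int[].
class ListP implements IntPredicate {
    private static final int SIZE = 33;

    @Override
    public boolean test(int seed) {
        List<Integer> state = new ArrayList<>(SIZE);
        state.add(seed);
        // Append a copy of the previous element until the list holds SIZE entries.
        for (int i = 1; i < SIZE; i++) {
            state.add(state.get(i - 1));
        }
        // The Integer is unboxed, so this is a numeric comparison.
        return seed != state.get(SIZE - 1);
    }

    public static void main(String[] args) {
        long count = IntStream.range(0, 0x0010_0000).filter(new ListP()).count();
        System.out.println(count);
    }
}
```

By the language semantics, every element equals seed, so test returns false for every input.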
I have checked this on:
- Windows 10 and Debian 10, with OpenJDK Runtime Environment (build 15.0.1+9-18), OpenJDK 64-Bit Server VM (build 15.0.1+9-18, mixed mode, sharing)
- Debian 9, with OpenJDK Runtime Environment AdoptOpenJDK (build 13.0.1+9), OpenJDK 64-Bit Server VM AdoptOpenJDK (build 13.0.1+9, mixed mode, sharing)
The problem can be reproduced with an even simpler example, namely:
    import java.util.stream.IntStream;

    class Main {
        private final static int SIZE = 33;

        public static boolean test2(int seed) {
            int[] state = new int[SIZE];
            state[0] = seed;
            for (int i = 1; i < SIZE; i++) {
                state[i] = state[i - 1];
            }
            return seed != state[SIZE - 1];
        }

        public static void main(String[] args) {
            long count = IntStream.range(0, 0x0010_0000).filter(Main::test2).count();
            System.out.println(count);
        }
    }
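The problem is caused by the JVM optimization flag that allows vectorization (SIMD) of loops, namely -XX:+AllowVectorizeOnDemand. It is probably triggered by applying vectorization to a single array whose read and write ranges overlap (i.e., state[i] = state[i - 1];). A similar result would appear if the JVM (for some elements of IntStream.range(0, 0x0010_0000)) optimized the loop:
    for (int i = 1; i < SIZE; i++)
        state[i] = state[i - 1];
into:
    System.arraycopy(state, 0, state, 1, SIZE - 1);
For example: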
    import java.util.Arrays;
    import java.util.stream.IntStream;

    class Main {
        private final static int SIZE = 33;

        public static boolean test2(int seed) {
            int[] state = new int[SIZE];
            state[0] = seed;
            System.arraycopy(state, 0, state, 1, SIZE - 1);
            if (seed == 100)
                System.out.println(Arrays.toString(state));
            return seed != state[SIZE - 1];
        }

        public static void main(String[] args) {
            long count = IntStream.range(0, 0x0010_0000).filter(Main::test2).count();
            System.out.println(count);
        }
    }
Output:
[100, 100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
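To make the difference between the two forms concrete, here is a small side-by-side sketch (the class and method names are hypothetical). The sequential loop reads the value written by the previous iteration, so the seed propagates through the whole array. System.arraycopy on overlapping ranges is specified to behave as if the source range were first copied to a temporary array, so only one element is shifted.

```java
import java.util.Arrays;

// Hypothetical demo class comparing the loop with the overlapping arraycopy.
class OverlapDemo {
    static final int SIZE = 33;

    // Sequential semantics: iteration i sees the store made by iteration i - 1.
    static int[] viaLoop(int seed) {
        int[] state = new int[SIZE];
        state[0] = seed;
        for (int i = 1; i < SIZE; i++) {
            state[i] = state[i - 1];
        }
        return state;
    }

    // Overlapping arraycopy: acts as if the old contents were copied to a temporary first.
    static int[] viaArraycopy(int seed) {
        int[] state = new int[SIZE];
        state[0] = seed;
        System.arraycopy(state, 0, state, 1, SIZE - 1);
        return state;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaLoop(100)));      // every element is 100
        System.out.println(Arrays.toString(viaArraycopy(100))); // only the first two are 100
    }
}
```

A vectorized version of the loop that loads several old elements before storing any of them would give results closer to the second method than to the first, which matches the zeros seen in the question.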
Latest update: 1 January 2021
I sent an email to one of the developers involved in the implementation/integration of the flag -XX:+AllowVectorizeOnDemand and received the following reply:
It is known that part of AllowVectorizeOnDemand code is broken.
There was fix (it excluded executing broken code which does incorrect
vectorization) which was backported into jdk 11.0.11:
https://hg.openjdk.java.net/jdk-updates/jdk11u-dev/rev/69dbdd271e04
If you can, try build and test latest OpenJDK11u from
https://hg.openjdk.java.net/jdk-updates/jdk11u-dev/
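When comparing builds, it can help to confirm which runtime a test actually ran on and what value the flag had. The sketch below is one possible way to do that, assuming a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name FlagCheck and the "unavailable" fallback string are hypothetical. The flag is not guaranteed to exist in every JDK build, which is why the lookup is guarded.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that reports the runtime version and the value of a VM flag.
class FlagCheck {
    // Returns the flag's current value, or "unavailable" if this VM does not expose it.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the option (or the MXBean itself) does not exist on this VM.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("java.runtime.version = " + System.getProperty("java.runtime.version"));
        System.out.println("AllowVectorizeOnDemand = " + flagValue("AllowVectorizeOnDemand"));
    }
}
```

Running the reproducer once with -XX:-AllowVectorizeOnDemand and once with -XX:+AllowVectorizeOnDemand, and printing this information alongside the count, makes it easier to tell which configuration produced which result.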
The first link contains the following:
@bug 8251994
@summary Test vectorization of Streams$RangeIntSpliterator::forEachRemaining
@requires vm.compiler2.enabled & vm.compMode != "Xint"
@run main compiler.vectorization.TestForEachRem test1
@run main compiler.vectorization.TestForEachRem test2
@run main compiler.vectorization.TestForEachRem test3
@run main compiler.vectorization.TestForEachRem test4
The comments on the JIRA issue for that bug include the following:
I found the cause of the issue. To improve a chance to vectorize a
loop, superword tries to hoist loads to the beginning of loop by
replacing their memory input with corresponding (same memory slice)
loop's memory Phi :
http://hg.openjdk.java.net/jdk/jdk/file/8f73aeccb27c/src/hotspot/share/opto/superword.cpp#l471
Originally loads are ordered by corresponding stores on the same
memory slice. But when they are hoisted they loose that ordering -
nothing enforce the order. In test6 case the ordering is preserved
(luckily?) after hoisting only when vector size is 32 bytes (avx2) but
they become unordered with 16 (avx=0 or avx1) or 64 (avx512) bytes
vectors.
(...)
I have simple fix (use original loads ordering indexes) but looking on
the code which causing the issue I see that it is bogus/incomplete -
it does not help cases listed for JDK-8076284 changes:
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017645.html
Using unrolling and cloning information to vectorize is interesting
idea but as I see it is not complete. Even if pack_parallel() method
is able created packs they are all removed by filter_packs() method.
And additionally the above cases are vectorized without hoisting loads
and pack_parallel - I verified it. That code is useless now and I
will put it under flag to not run it. It needs more work to be useful.
I reluctant to remove the code because may be in a future we will have
time to invest into it.
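This might explain why, when I compared the assembly of the versions with and without the flag -XX:+AllowVectorizeOnDemand, I noticed that the version with the flag, for the following code:
    for (int i = 1; i < SIZE; i++)
        state[i] = state[i - 1];
(which I extracted into a method called hotstop to make it easier to find in the assembly), had: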
0x00000001162bacf5: mov %r8d,0x10(%rsi,%r10,4)
0x00000001162bacfa: mov %r8d,0x14(%rsi,%r10,4)
0x00000001162bacff: mov %r8d,0x18(%rsi,%r10,4)
0x00000001162bad04: mov %r8d,0x1c(%rsi,%r10,4)
0x00000001162bad09: mov %r8d,0x20(%rsi,%r10,4)
0x00000001162bad0e: mov %r8d,0x24(%rsi,%r10,4)
0x00000001162bad13: mov %r8d,0x28(%rsi,%r10,4)
0x00000001162bad18: mov %r8d,0x2c(%rsi,%r10,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
; - AAAAAA.Main::hotstop@15 (line 21)
This looks to me like loop unrolling. On the other hand, the method java.util.stream.Streams$RangeIntSpliterator::forEachRemaining showed up only in the assembly of the version with the flag.
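For anyone who wants to repeat the comparison, the extraction described above might look like the sketch below. The class name HotstopMain is hypothetical (the listing above shows the original as AAAAAA.Main), and the answer does not say which options produced the assembly. One common approach, assumed here, is to run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which requires the hsdis disassembler library to be installed.

```java
import java.util.stream.IntStream;

// Hypothetical reproduction harness with the loop isolated in its own method.
class HotstopMain {
    private static final int SIZE = 33;

    // The loop under investigation, kept separate so it is easy to find in the JIT output.
    static void hotstop(int[] state) {
        for (int i = 1; i < SIZE; i++) {
            state[i] = state[i - 1];
        }
    }

    static boolean test2(int seed) {
        int[] state = new int[SIZE];
        state[0] = seed;
        hotstop(state);
        return seed != state[SIZE - 1];
    }

    public static void main(String[] args) {
        long count = IntStream.range(0, 0x0010_0000).filter(HotstopMain::test2).count();
        System.out.println(count);
    }
}
```

Note that the JIT may inline hotstop into its caller, in which case the loop shows up inside the caller's compiled code rather than as a separate method.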