Java JIT 是否优化过递归方法调用?
Does the Java JIT ever optimize away recursive method calls?
我知道 Java 尚未添加尾调用消除优化,并打算将其作为对 Project Loom 的后期添加。相反,我的问题是:JIT 是否曾经完全优化掉递归方法调用并将它们转换为迭代形式?这在名义上似乎是可能的,但相对困难,所以我猜如果他们这样做了,一些文档会对其进行强烈描述,但我正在努力追踪有关该主题的任何内容。
作为后续行动,如果 JIT 确实以某种形式消除了 Loom 中描述的递归调用,那么它如何显示在堆栈跟踪中?
你的问题“它是如何出现在堆栈跟踪中的?”是 JVM 中尾调用优化的未解决问题之一。人们期望 Java 代码的可管理性,特别是当代码在递归算法中花费很长时间(甚至挂在无休止的递归中)并且开发人员想要找出正在发生的事情时。
所以简而言之,JVM(HotSpot 的当前版本)没有尾调用优化。但是它能够内联有限数量的递归调用,就像它可以内联其他调用一样。
当我们使用
public class Recursion {
public static void main(String[] args) {
runs: for(int run = 0; run < 10; run++) {
for(int i = 1; i < 1_000_000_000; i++) try {
int counted = recursiveCounter(i, 0);
if(i != counted) throw new AssertionError();
} catch(WhosebugError e) {
System.out.println("limit reached at " + i);
continue runs;
}
}
}
private static int recursiveCounter(int limit, int count) {
return limit == 0? count: recursiveCounter(limit - 1, count + 1);
}
}
我们得到的值与 Why is the max recursion depth I can reach non-deterministic? 运行 -XX:CompileCommand=print,Recursion.recursiveCounter
中讨论的值相似
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
打印
@ 18 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) recursive inlining too deep
和
0x0000026398871220: mov dword ptr [rsp+0ffffffffffff9000h],eax
0x0000026398871227: push rbp
0x0000026398871228: sub rsp,10h ;*synchronization entry
; - Recursion::recursiveCounter@-1 (line 16)
0x000002639887122c: test edx,edx
0x000002639887122e: je 26398871254h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
0x0000026398871230: cmp edx,1h
0x0000026398871233: je 26398871259h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000026398871235: add r8d,2h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000026398871239: add edx,0fffffffeh ;*isub {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@10 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x000002639887123c: nop
0x000002639887123f: call 26398871220h ; ImmutableOopMap {}
;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {static_call}
0x0000026398871244: add rsp,10h
0x0000026398871248: pop rbp
0x0000026398871249: mov r10,qword ptr [r15+110h]
0x0000026398871250: test dword ptr [r10],eax ; {poll_return}
0x0000026398871253: ret
0x0000026398871254: mov eax,r8d
0x0000026398871257: jmp 26398871244h
0x0000026398871259: mov eax,r8d
0x000002639887125c: inc eax ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
0x000002639887125e: jmp 26398871244h ;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000026398871260: mov rdx,rax
0x0000026398871263: add rsp,10h
0x0000026398871267: pop rbp
0x0000026398871268: jmp 26390ea4d00h ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@17 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {runtime_call _rethrow_Java}
有趣的部分是 test edx,edx
指令匹配我们的 limit == 0
条件。如果这个条件不满足,下面有一个 cmp edx,1h
测试,有效地测试下一个递归调用将测试什么,如果这个条件也不满足,代码将执行 add r8d,2h
和 add edx,0fffffffeh
,将 count
递增 2,将 limit
递减 2。
所以我们清楚地看到了内联一层递归和融合操作的效果。内联诊断告诉我们已经超过限制,因此值得研究指定时发生的情况,例如-XX:MaxRecursiveInlineLevel=4
:
@ 18 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) recursive inlining too deep
0x0000025345d31620: mov dword ptr [rsp+0ffffffffffff9000h],eax
0x0000025345d31627: push rbp
0x0000025345d31628: sub rsp,10h ;*synchronization entry
; - Recursion::recursiveCounter@-1 (line 16)
0x0000025345d3162c: test edx,edx
0x0000025345d3162e: je 25345d31660h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
0x0000025345d31630: cmp edx,1h
0x0000025345d31633: je 25345d31665h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31635: cmp edx,2h
0x0000025345d31638: je 25345d3166ch ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d3163a: cmp edx,3h
0x0000025345d3163d: je 25345d31674h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d3163f: cmp edx,4h
0x0000025345d31642: je 25345d3167ch ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31644: add r8d,5h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31648: add edx,0fffffffbh
0x0000025345d3164b: call 25345d31620h ; ImmutableOopMap {}
;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {static_call}
0x0000025345d31650: add rsp,10h
0x0000025345d31654: pop rbp
0x0000025345d31655: mov r10,qword ptr [r15+110h]
0x0000025345d3165c: test dword ptr [r10],eax ; {poll_return}
0x0000025345d3165f: ret
0x0000025345d31660: mov eax,r8d
0x0000025345d31663: jmp 25345d31650h
0x0000025345d31665: mov eax,r8d
0x0000025345d31668: inc eax ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
0x0000025345d3166a: jmp 25345d31650h
0x0000025345d3166c: mov eax,r8d
0x0000025345d3166f: add eax,2h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31672: jmp 25345d31650h
0x0000025345d31674: mov eax,r8d
0x0000025345d31677: add eax,3h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d3167a: jmp 25345d31650h
0x0000025345d3167c: mov eax,r8d
0x0000025345d3167f: add eax,4h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31682: jmp 25345d31650h ;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31684: mov rdx,rax
0x0000025345d31687: add rsp,10h
0x0000025345d3168b: pop rbp
0x0000025345d3168c: jmp 2533e364d00h ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@17 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {runtime_call _rethrow_Java}
我们可以看到,编译后的代码已经学会了一次加五。该应用程序还将打印更大限制的嵌套调用。指令后面的注释显示有关嵌套调用的元信息仍然存在,即使没有与调用级别关联的单独指令。这意味着仍然可以构建表示整个原始调用链的堆栈跟踪。
当然,这和实际的尾调用优化是不一样的。无论我们将限制设置多高,它仍然内联有限数量的调用并且已经认识到将限制设置得太高会产生不合理的代码大小。
所以对字面问题“Java JIT 是否曾经优化过递归方法调用?”的答案是“是的,确实如此”。但不是整个递归而是有限次数的调用,换句话说,不是尾调用优化的形式。
我知道 Java 尚未添加尾调用消除优化,并打算将其作为对 Project Loom 的后期添加。相反,我的问题是:JIT 是否曾经完全优化掉递归方法调用并将它们转换为迭代形式?这在名义上似乎是可能的,但相对困难,所以我猜如果他们这样做了,一些文档会对其进行强烈描述,但我正在努力追踪有关该主题的任何内容。
作为后续行动,如果 JIT 确实以某种形式消除了 Loom 中描述的递归调用,那么它如何显示在堆栈跟踪中?
你的问题“它是如何出现在堆栈跟踪中的?”是 JVM 中尾调用优化的未解决问题之一。人们期望 Java 代码的可管理性,特别是当代码在递归算法中花费很长时间(甚至挂在无休止的递归中)并且开发人员想要找出正在发生的事情时。
所以简而言之,JVM(HotSpot 的当前版本)没有尾调用优化。但是它能够内联有限数量的递归调用,就像它可以内联其他调用一样。
当我们使用
public class Recursion {
public static void main(String[] args) {
runs: for(int run = 0; run < 10; run++) {
for(int i = 1; i < 1_000_000_000; i++) try {
int counted = recursiveCounter(i, 0);
if(i != counted) throw new AssertionError();
} catch(WhosebugError e) {
System.out.println("limit reached at " + i);
continue runs;
}
}
}
private static int recursiveCounter(int limit, int count) {
return limit == 0? count: recursiveCounter(limit - 1, count + 1);
}
}
我们得到的值与 Why is the max recursion depth I can reach non-deterministic? 运行 -XX:CompileCommand=print,Recursion.recursiveCounter
中讨论的值相似
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
打印
@ 18 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) recursive inlining too deep
和
0x0000026398871220: mov dword ptr [rsp+0ffffffffffff9000h],eax
0x0000026398871227: push rbp
0x0000026398871228: sub rsp,10h ;*synchronization entry
; - Recursion::recursiveCounter@-1 (line 16)
0x000002639887122c: test edx,edx
0x000002639887122e: je 26398871254h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
0x0000026398871230: cmp edx,1h
0x0000026398871233: je 26398871259h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000026398871235: add r8d,2h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000026398871239: add edx,0fffffffeh ;*isub {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@10 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x000002639887123c: nop
0x000002639887123f: call 26398871220h ; ImmutableOopMap {}
;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {static_call}
0x0000026398871244: add rsp,10h
0x0000026398871248: pop rbp
0x0000026398871249: mov r10,qword ptr [r15+110h]
0x0000026398871250: test dword ptr [r10],eax ; {poll_return}
0x0000026398871253: ret
0x0000026398871254: mov eax,r8d
0x0000026398871257: jmp 26398871244h
0x0000026398871259: mov eax,r8d
0x000002639887125c: inc eax ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
0x000002639887125e: jmp 26398871244h ;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000026398871260: mov rdx,rax
0x0000026398871263: add rsp,10h
0x0000026398871267: pop rbp
0x0000026398871268: jmp 26390ea4d00h ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@17 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {runtime_call _rethrow_Java}
有趣的部分是 test edx,edx
指令匹配我们的 limit == 0
条件。如果这个条件不满足,下面有一个 cmp edx,1h
测试,有效地测试下一个递归调用将测试什么,如果这个条件也不满足,代码将执行 add r8d,2h
和 add edx,0fffffffeh
,将 count
递增 2,将 limit
递减 2。
所以我们清楚地看到了内联一层递归和融合操作的效果。内联诊断告诉我们已经超过限制,因此值得研究指定时发生的情况,例如-XX:MaxRecursiveInlineLevel=4
:
@ 18 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) inline
@ 14 Recursion::recursiveCounter (18 bytes) recursive inlining too deep
0x0000025345d31620: mov dword ptr [rsp+0ffffffffffff9000h],eax
0x0000025345d31627: push rbp
0x0000025345d31628: sub rsp,10h ;*synchronization entry
; - Recursion::recursiveCounter@-1 (line 16)
0x0000025345d3162c: test edx,edx
0x0000025345d3162e: je 25345d31660h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
0x0000025345d31630: cmp edx,1h
0x0000025345d31633: je 25345d31665h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31635: cmp edx,2h
0x0000025345d31638: je 25345d3166ch ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d3163a: cmp edx,3h
0x0000025345d3163d: je 25345d31674h ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d3163f: cmp edx,4h
0x0000025345d31642: je 25345d3167ch ;*ifne {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@1 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31644: add r8d,5h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31648: add edx,0fffffffbh
0x0000025345d3164b: call 25345d31620h ; ImmutableOopMap {}
;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {static_call}
0x0000025345d31650: add rsp,10h
0x0000025345d31654: pop rbp
0x0000025345d31655: mov r10,qword ptr [r15+110h]
0x0000025345d3165c: test dword ptr [r10],eax ; {poll_return}
0x0000025345d3165f: ret
0x0000025345d31660: mov eax,r8d
0x0000025345d31663: jmp 25345d31650h
0x0000025345d31665: mov eax,r8d
0x0000025345d31668: inc eax ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
0x0000025345d3166a: jmp 25345d31650h
0x0000025345d3166c: mov eax,r8d
0x0000025345d3166f: add eax,2h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31672: jmp 25345d31650h
0x0000025345d31674: mov eax,r8d
0x0000025345d31677: add eax,3h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d3167a: jmp 25345d31650h
0x0000025345d3167c: mov eax,r8d
0x0000025345d3167f: add eax,4h ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@13 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31682: jmp 25345d31650h ;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
0x0000025345d31684: mov rdx,rax
0x0000025345d31687: add rsp,10h
0x0000025345d3168b: pop rbp
0x0000025345d3168c: jmp 2533e364d00h ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
; - Recursion::recursiveCounter@17 (line 16)
; - Recursion::recursiveCounter@14 (line 16)
; {runtime_call _rethrow_Java}
我们可以看到,编译后的代码已经学会了一次加五。该应用程序还将打印更大限制的嵌套调用。指令后面的注释显示有关嵌套调用的元信息仍然存在,即使没有与调用级别关联的单独指令。这意味着仍然可以构建表示整个原始调用链的堆栈跟踪。
当然,这和实际的尾调用优化是不一样的。无论我们将限制设置多高,它仍然内联有限数量的调用并且已经认识到将限制设置得太高会产生不合理的代码大小。
所以对字面问题“Java JIT 是否曾经优化过递归方法调用?”的答案是“是的,确实如此”。但不是整个递归而是有限次数的调用,换句话说,不是尾调用优化的形式。