带尾递归的慢字节码
slow byte code with tail recursion
受到 this SO question 答案的启发,我使用代码检查尾递归的命令式循环:
let rec nothingfunc i =
match i with
| 1000000000 -> 1
| _ -> nothingfunc (i+1)
let nothingloop1 () =
let i = ref 0 in
while !i < 1000000000 do incr i done;
1
let timeit f v =
let t1 = Unix.gettimeofday() in
let _ = f v in
let t2 = Unix.gettimeofday() in
t2 -. t1
let () =
Printf.printf "recursive function: %g s\n%!" (timeit nothingfunc 0);
Printf.printf "while loop with ref counter buitin incr: %g s\n%!" (timeit nothingloop1 ());
对于字节码和本机代码,结果是
str@s131-intel:~> ./bench_loop
recursive function: 20.7656 s
while loop with ref counter buitin incr: 12.0642 s
str@s131-intel:~> ./bench_loop.opt
recursive function: 0.755594 s
while loop with ref counter buitin incr: 0.753947 s
问题是:20秒到12秒的执行时间差异很大的原因是什么?
编辑,我的结论:
函数调用apply
(以字节代码表示)涉及堆栈大小检查、可能的堆栈扩大和信号检查。为了获得最佳性能,本机代码编译器将提供。
(旁注:在 SO 上提问,因为它对搜索引擎友好。)
查看ocamlfind ocamlc -package unix test.ml -dlambda
的输出
(nothingloop1/1010 =
(function param/1022
(let (i/1011 =v 0)
(seq (while (< i/1011 100000000) (assign i/1011 (1+ i/1011))) 1)))
(nothingfunc/1008
(function i/1009
(if (!= i/1009 100000000) (apply nothingfunc/1008 (+ i/1009 1)) 1)))
显然 assign
比 apply
快。似乎在函数调用时检查堆栈溢出和信号,但不检查简单的分配。详情你得看:https://github.com/ocaml/ocaml/blob/trunk/byterun/interp.c
受到 this SO question 答案的启发,我使用代码检查尾递归的命令式循环:
let rec nothingfunc i =
match i with
| 1000000000 -> 1
| _ -> nothingfunc (i+1)
let nothingloop1 () =
let i = ref 0 in
while !i < 1000000000 do incr i done;
1
let timeit f v =
let t1 = Unix.gettimeofday() in
let _ = f v in
let t2 = Unix.gettimeofday() in
t2 -. t1
let () =
Printf.printf "recursive function: %g s\n%!" (timeit nothingfunc 0);
Printf.printf "while loop with ref counter buitin incr: %g s\n%!" (timeit nothingloop1 ());
对于字节码和本机代码,结果是
str@s131-intel:~> ./bench_loop
recursive function: 20.7656 s
while loop with ref counter buitin incr: 12.0642 s
str@s131-intel:~> ./bench_loop.opt
recursive function: 0.755594 s
while loop with ref counter buitin incr: 0.753947 s
问题是:20秒到12秒的执行时间差异很大的原因是什么?
编辑,我的结论:
函数调用apply
(以字节代码表示)涉及堆栈大小检查、可能的堆栈扩大和信号检查。为了获得最佳性能,本机代码编译器将提供。
(旁注:在 SO 上提问,因为它对搜索引擎友好。)
查看ocamlfind ocamlc -package unix test.ml -dlambda
(nothingloop1/1010 =
(function param/1022
(let (i/1011 =v 0)
(seq (while (< i/1011 100000000) (assign i/1011 (1+ i/1011))) 1)))
(nothingfunc/1008
(function i/1009
(if (!= i/1009 100000000) (apply nothingfunc/1008 (+ i/1009 1)) 1)))
显然 assign
比 apply
快。似乎在函数调用时检查堆栈溢出和信号,但不检查简单的分配。详情你得看:https://github.com/ocaml/ocaml/blob/trunk/byterun/interp.c