Prolog 是否有像 Haskell 这样的别名 "operator"?
Does Prolog have an alias "operator" like Haskell?
在Haskell中,有一个语言特性叫做"as"-运算符(有时称为别名)。这个想法如下:假设你有一个函数,例如将一个列表作为输入并想要 return 所有尾巴,你可以将其实现为:
tails a@(_:xs) = a : tails xs
tails [] = [[]]
@
确保您既引用了整个参数,也引用了参数结构的某些部分。这是第一行正文中的智能 performance-wise(它更像是一种性能 hack,因为重建了一个数组 (x:xs)
),如果编译器未对其进行优化,将会导致分配一个新的 object、修改字段等。有关详细信息,请参阅 here。
我想知道 Prolog 是否有等效的东西:例如,如果你想在 Prolog 中实现尾部,可以通过以下方式完成:
tails([H|T],[[H|T]|TA]) :-
tails(T,TA).
tails([],[[]]).
但是如果有一个 "as"-operator 像这样的话它会更有效率:
tails(L@[_|T],[L|TA]) :- %This does not compile
tails(T,TA).
tails([],[[]]).
是否有任何这样的构造或语言扩展?
TL;DR: 好主意1!加速似乎限制在 ~20%(对于大多数列表大小)。
在这个答案中,我们比较了三个不同的谓词,它们在 @
类数据重用方面有所不同:
list_tails([], [[]]). % (1) like `tails/2` given by the OP ...
list_tails([E|Es], [[E|Es]|Ess]) :- % ....... but with a better name :-)
list_tails(Es, Ess).
list_sfxs1(Es, [Es|Ess]) :- % (2) "re-use, mutual recursion"
aux_list_sfxs1(Es, Ess). % "sfxs" is short for "suffixes"
aux_list_sfxs1([], []).
aux_list_sfxs1([_|Es], Ess) :-
list_sfxs1(Es, Ess).
list_sfxs2([], [[]]). % (3) "re-use, direct recursion"
list_sfxs2(Es0, [Es0|Ess]) :-
Es0 = [_|Es],
list_sfxs2(Es, Ess).
为了测量运行时间,我们使用以下代码:
:-( dif(D,sicstus), current_prolog_flag(dialect,D)
; use_module(library(between))
).
run_benchs(P_2s, P_2, L, N, T_ms) :-
between(1, 6, I),
L is 10^I,
N is 10^(8-I),
length(Xs, L),
member(P_2, P_2s),
garbage_collect,
call_walltime(run_bench_core(P_2,Xs,N), T_ms).
run_bench_core(P_2, Xs, N) :-
between(1, N, _),
call(P_2, Xs, _),
false.
run_bench_core(_, _, _).
测量wall-time2, we utilize call_<a href="https://en.wikipedia.org/wiki/Wall-clock_time" rel="nofollow noreferrer">walltime</a>/2
—a variation of :
call_walltime(G, T_ms) :-
statistics(walltime, [T0|_]),
G,
statistics(walltime, [T1|_]),
T_ms is T1 - T0.
让我们对上面的代码变体进行测试...
- ...使用不同的列表长度
L
...
- ...和 运行 每个测试多次
N
(为了更好的准确性)。
首先,我们使用swi-prolog版本7.3.14(64位):
?- run_benchs([list_sfxs1,list_sfxs2,list_tails], P_2, L, N, T_ms).
P_2 = list_sfxs1, L*N = 10*10000000, T_ms = 7925
; P_2 = list_sfxs2, L*N = 10*10000000, T_ms = 7524
; P_2 = list_tails, L*N = 10*10000000, T_ms = 6936
;
P_2 = list_sfxs1, L*N = 100*1000000, T_ms = 6502
; P_2 = list_sfxs2, L*N = 100*1000000, T_ms = 5861
; P_2 = list_tails, L*N = 100*1000000, T_ms = 5618
;
P_2 = list_sfxs1, L*N = 1000*100000, T_ms = 6434
; P_2 = list_sfxs2, L*N = 1000*100000, T_ms = 5817
; P_2 = list_tails, L*N = 1000*100000, T_ms = 9916
;
P_2 = list_sfxs1, L*N = 10000*10000, T_ms = 6328
; P_2 = list_sfxs2, L*N = 10000*10000, T_ms = 5688
; P_2 = list_tails, L*N = 10000*10000, T_ms = 9442
;
P_2 = list_sfxs1, L*N = 100000*1000, T_ms = 10255
; P_2 = list_sfxs2, L*N = 100000*1000, T_ms = 10296
; P_2 = list_tails, L*N = 100000*1000, T_ms = 14592
;
P_2 = list_sfxs1, L*N = 1000000*100, T_ms = 6955
; P_2 = list_sfxs2, L*N = 1000000*100, T_ms = 6534
; P_2 = list_tails, L*N = 1000000*100, T_ms = 9738.
然后,我们使用 sicstus-prolog 版本 4.3.2(64 位)重复之前的查询3:
?- run_benchs([list_sfxs1,list_sfxs2,list_tails], P_2, L, N, T_ms).
P_2 = list_sfxs1, L*N = 10*10000000, T_ms = 1580
; P_2 = list_sfxs2, L*N = 10*10000000, T_ms = 1610
; P_2 = list_tails, L*N = 10*10000000, T_ms = 1580
;
P_2 = list_sfxs1, L*N = 100*1000000, T_ms = 710
; P_2 = list_sfxs2, L*N = 100*1000000, T_ms = 750
; P_2 = list_tails, L*N = 100*1000000, T_ms = 840
;
P_2 = list_sfxs1, L*N = 1000*100000, T_ms = 650
; P_2 = list_sfxs2, L*N = 1000*100000, T_ms = 660
; P_2 = list_tails, L*N = 1000*100000, T_ms = 740
;
P_2 = list_sfxs1, L*N = 10000*10000, T_ms = 620
; P_2 = list_sfxs2, L*N = 10000*10000, T_ms = 650
; P_2 = list_tails, L*N = 10000*10000, T_ms = 740
;
P_2 = list_sfxs1, L*N = 100000*1000, T_ms = 670
; P_2 = list_sfxs2, L*N = 100000*1000, T_ms = 650
; P_2 = list_tails, L*N = 100000*1000, T_ms = 750
;
P_2 = list_sfxs1, L*N = 1000000*100, T_ms = 12610
; P_2 = list_sfxs2, L*N = 1000000*100, T_ms = 12560
; P_2 = list_tails, L*N = 1000000*100, T_ms = 33460.
总结:
- alias-thingy 可以并且确实显着提高性能。
- 在上述测试中,与 SWI-Prolog 相比,SICStus Prolog jit4 提供 10 倍加速!
脚注 1: 为什么把 (@)/2
放在规则头部的噱头?
以非 idiomatic Prolog 代码结束?
脚注 2: 我们对总运行时间感兴趣。为什么?因为垃圾收集成本显示为更大的数据大小!
脚注 3: 为了便于阅读,答案序列已经过 post 处理。
脚注 4: 自 release 4.3.0. Current target architectures include IA-32 and AMD64 起可用。
在Haskell中,有一个语言特性叫做"as"-运算符(有时称为别名)。这个想法如下:假设你有一个函数,例如将一个列表作为输入并想要 return 所有尾巴,你可以将其实现为:
tails a@(_:xs) = a : tails xs
tails [] = [[]]
@
确保您既引用了整个参数,也引用了参数结构的某些部分。这是第一行正文中的智能 performance-wise(它更像是一种性能 hack,因为重建了一个数组 (x:xs)
),如果编译器未对其进行优化,将会导致分配一个新的 object、修改字段等。有关详细信息,请参阅 here。
我想知道 Prolog 是否有等效的东西:例如,如果你想在 Prolog 中实现尾部,可以通过以下方式完成:
tails([H|T],[[H|T]|TA]) :-
tails(T,TA).
tails([],[[]]).
但是如果有一个 "as"-operator 像这样的话它会更有效率:
tails(L@[_|T],[L|TA]) :- %This does not compile
tails(T,TA).
tails([],[[]]).
是否有任何这样的构造或语言扩展?
TL;DR: 好主意1!加速似乎限制在 ~20%(对于大多数列表大小)。
在这个答案中,我们比较了三个不同的谓词,它们在 @
类数据重用方面有所不同:
list_tails([], [[]]). % (1) like `tails/2` given by the OP ... list_tails([E|Es], [[E|Es]|Ess]) :- % ....... but with a better name :-) list_tails(Es, Ess). list_sfxs1(Es, [Es|Ess]) :- % (2) "re-use, mutual recursion" aux_list_sfxs1(Es, Ess). % "sfxs" is short for "suffixes" aux_list_sfxs1([], []). aux_list_sfxs1([_|Es], Ess) :- list_sfxs1(Es, Ess). list_sfxs2([], [[]]). % (3) "re-use, direct recursion" list_sfxs2(Es0, [Es0|Ess]) :- Es0 = [_|Es], list_sfxs2(Es, Ess).
为了测量运行时间,我们使用以下代码:
:-( dif(D,sicstus), current_prolog_flag(dialect,D) ; use_module(library(between)) ). run_benchs(P_2s, P_2, L, N, T_ms) :- between(1, 6, I), L is 10^I, N is 10^(8-I), length(Xs, L), member(P_2, P_2s), garbage_collect, call_walltime(run_bench_core(P_2,Xs,N), T_ms). run_bench_core(P_2, Xs, N) :- between(1, N, _), call(P_2, Xs, _), false. run_bench_core(_, _, _).
测量wall-time2, we utilize call_<a href="https://en.wikipedia.org/wiki/Wall-clock_time" rel="nofollow noreferrer">walltime</a>/2
—a variation of
call_walltime(G, T_ms) :- statistics(walltime, [T0|_]), G, statistics(walltime, [T1|_]), T_ms is T1 - T0.
让我们对上面的代码变体进行测试...
- ...使用不同的列表长度
L
... - ...和 运行 每个测试多次
N
(为了更好的准确性)。
首先,我们使用swi-prolog版本7.3.14(64位):
?- run_benchs([list_sfxs1,list_sfxs2,list_tails], P_2, L, N, T_ms). P_2 = list_sfxs1, L*N = 10*10000000, T_ms = 7925 ; P_2 = list_sfxs2, L*N = 10*10000000, T_ms = 7524 ; P_2 = list_tails, L*N = 10*10000000, T_ms = 6936 ; P_2 = list_sfxs1, L*N = 100*1000000, T_ms = 6502 ; P_2 = list_sfxs2, L*N = 100*1000000, T_ms = 5861 ; P_2 = list_tails, L*N = 100*1000000, T_ms = 5618 ; P_2 = list_sfxs1, L*N = 1000*100000, T_ms = 6434 ; P_2 = list_sfxs2, L*N = 1000*100000, T_ms = 5817 ; P_2 = list_tails, L*N = 1000*100000, T_ms = 9916 ; P_2 = list_sfxs1, L*N = 10000*10000, T_ms = 6328 ; P_2 = list_sfxs2, L*N = 10000*10000, T_ms = 5688 ; P_2 = list_tails, L*N = 10000*10000, T_ms = 9442 ; P_2 = list_sfxs1, L*N = 100000*1000, T_ms = 10255 ; P_2 = list_sfxs2, L*N = 100000*1000, T_ms = 10296 ; P_2 = list_tails, L*N = 100000*1000, T_ms = 14592 ; P_2 = list_sfxs1, L*N = 1000000*100, T_ms = 6955 ; P_2 = list_sfxs2, L*N = 1000000*100, T_ms = 6534 ; P_2 = list_tails, L*N = 1000000*100, T_ms = 9738.
然后,我们使用 sicstus-prolog 版本 4.3.2(64 位)重复之前的查询3:
?- run_benchs([list_sfxs1,list_sfxs2,list_tails], P_2, L, N, T_ms). P_2 = list_sfxs1, L*N = 10*10000000, T_ms = 1580 ; P_2 = list_sfxs2, L*N = 10*10000000, T_ms = 1610 ; P_2 = list_tails, L*N = 10*10000000, T_ms = 1580 ; P_2 = list_sfxs1, L*N = 100*1000000, T_ms = 710 ; P_2 = list_sfxs2, L*N = 100*1000000, T_ms = 750 ; P_2 = list_tails, L*N = 100*1000000, T_ms = 840 ; P_2 = list_sfxs1, L*N = 1000*100000, T_ms = 650 ; P_2 = list_sfxs2, L*N = 1000*100000, T_ms = 660 ; P_2 = list_tails, L*N = 1000*100000, T_ms = 740 ; P_2 = list_sfxs1, L*N = 10000*10000, T_ms = 620 ; P_2 = list_sfxs2, L*N = 10000*10000, T_ms = 650 ; P_2 = list_tails, L*N = 10000*10000, T_ms = 740 ; P_2 = list_sfxs1, L*N = 100000*1000, T_ms = 670 ; P_2 = list_sfxs2, L*N = 100000*1000, T_ms = 650 ; P_2 = list_tails, L*N = 100000*1000, T_ms = 750 ; P_2 = list_sfxs1, L*N = 1000000*100, T_ms = 12610 ; P_2 = list_sfxs2, L*N = 1000000*100, T_ms = 12560 ; P_2 = list_tails, L*N = 1000000*100, T_ms = 33460.
总结:
- alias-thingy 可以并且确实显着提高性能。
- 在上述测试中,与 SWI-Prolog 相比,SICStus Prolog jit4 提供 10 倍加速!
脚注 1: 为什么把 (@)/2
放在规则头部的噱头?
以非 idiomatic Prolog 代码结束?
脚注 2: 我们对总运行时间感兴趣。为什么?因为垃圾收集成本显示为更大的数据大小!
脚注 3: 为了便于阅读,答案序列已经过 post 处理。
脚注 4: 自 release 4.3.0. Current target architectures include IA-32 and AMD64 起可用。