测量执行时间 ECLiPSe CLP（或 Prolog）

Question

如何测量 ECLiPSe CLP 中方法的执行时间？目前，我有这个：

measure_traditional(Difficulty,Selection,Choice):-
  statistics(runtime, _),
  time(solve_traditional(Difficulty,Selection,Choice,_)),
  time(solve_traditional(Difficulty,Selection,Choice,_)),
  time(solve_traditional(Difficulty,Selection,Choice,_)),
  time(solve_traditional(Difficulty,Selection,Choice,_)),
  time(solve_traditional(Difficulty,Selection,Choice,_)),
  time(solve_traditional(Difficulty,Selection,Choice,_)),
  time(solve_traditional(Difficulty,Selection,Choice,_)),
  time(solve_traditional(Difficulty,Selection,Choice,_)),
  statistics(runtime,[_|T]), % T 
  write(T).

我需要写下执行方法 solve_traditional(...) 所花费的时间并将其写到文本文件中。然而，它不够精确。有时 time 会为给定方法打印 0.015 或 0.016 秒，但通常它会打印 0.0 秒。

发现该方法完成得太快，我决定使用 statistics(runtime, ...) 来测量两次运行时调用之间花费的时间。例如，我可以测量完成 20 次方法调用所需的时间，并将测量时间 T 除以 20。

唯一的问题是，20 次调用 T 等于 0、16、32 或 48 毫秒。显然，它分别测量每个方法调用的时间并计算执行时间的总和（通常仅为 0.0 秒）。这超出了测量 N 个方法调用的运行时间并将时间 T 除以 N 的全部目的。

简而言之：我当前用于执行时间测量的方法不够充分。有没有办法让它更精确（例如 9 位小数）？

Answer 1

基准测试在任何编程语言中都是一项棘手的工作，尤其是在 CLP 中。尤其是如果您打算发布您的结果，您应该非常彻底并绝对确保您正在测量您声称要测量的内容。

Timers: Are you measuring real time, process cpu time, thread cpu time? Including time spent in system calls? Including or excluding garbage collection? ...

查看 statistics/2 原语提供的不同计时器。有一个实时高分辨率计时器，可以通过 statistics(hr_time,T).

访问

Timer resolution: In your example the timer resolution seems to be 1/60 sec. That means, to get 3 significant digits in your time measurement, you have to measure at least a runtime of 1000*1/60 = 16.7 seconds.

如果您的基准运行时间太短，您必须运行多次。

Runtime variance: On modern machines it is increasingly difficult to get reproducible timings. This is due to effects that have nothing to do with the program you are measuring, such as cache behaviour, paging, context switches, power management hardware, memory alignment, etc.

运行足够的重复，运行在安静的机器上，确保你的结果是可重现的。

Repeating benchmarks: In a system like ECLiPSe, running benchmarks repeatedly must be done carefully to ensure that the successive runs really do the same computation, and ideally have same or similar cache and garbage collection behaviour.

在您的代码中，您运行基准连续地结合在一起。不建议这样做，因为变量实例化、延迟的目标或垃圾可以从前面的运行s 中幸存下来，并减慢或加快后续的运行s。如上所述，您可以使用模式

run_n_times(N,Goal) :- \+ ( between(1,N,1,_), \+ Goal ).

这本质上是一种重复 N 次序列的方法

once(Goal), fail

这里的要点是 once/1 和 fail 的组合取消了 Goal 的所有计算，因此下一次迭代尽可能从类似的开始机状态。不幸的是，这个撤消过程本身会增加额外的运行时间，这会扭曲测量...

Test overheads: If you run your benchmark several times, you need a test framework that does that for you, and this contributes to the runtime you measure.

您要么必须确保开销可以忽略不计，要么必须测量开销（例如，通过运行使用虚拟基准测试框架）并减去它，例如：

benchmark(N, DummyGoal, Goal, Time) :-
    cputime(T1),
    run_n_times(N, DummyGoal),
    cputime(T2),
    run_n_times(N, Goal),
    cputime(T3),
    Time is (T3-T2)-(T2-T1).

CLP specifics: There are many other considerations specific to the kind of data-driven operations that occur in CLP solvers, and which make CLP runtimes very difficult to compare. These solvers have many internal degrees of freedom regarding scheduling of propagators, degrees of pruning, tie breaking rules in search control, etc.

一篇专门讨论这些事情的论文是： 关于基准测试约束逻辑编程平台，作者：Mark Wallace、Joachim Schimpf、Kish Shen 和 Warwick Harvey。在 CONSTRAINTS Journal，编辑。 E.C。 Freuder,9(1), pp 5-34, Kluwer, 2004.

测量执行时间 ECLiPSe CLP（或 Prolog）

Measuring execution time ECLiPSe CLP (or Prolog)

performance

prolog

performance-testing

eclipse-clp