在 Python vs Erlang 中分析列表反转时的问题

Question

我正在对 Erlang 的 lists:reverse 内置函数 (BIF) 进行性能分析，以了解它与输入大小的比例关系如何。更具体地说，我试过：

1> X = lists:seq(1, 1000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]

2> timer:tc(lists, reverse, [X]).
{57737,
 [1000000,999999,999998,999997,999996,999995,999994,999993,
  999992,999991,999990,999989,999988,999987,999986,999985,
  999984,999983,999982,999981,999980,999979,999978,999977,
  999976,999975,999974|...]}

3> timer:tc(lists, reverse, [X]).
{46896,
 [1000000,999999,999998,999997,999996,999995,999994,999993,
  999992,999991,999990,999989,999988,999987,999986,999985,
  999984,999983,999982,999981,999980,999979,999978,999977,
  999976,999975,999974|...]}

4> Y = lists:seq(1, 10000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]

5> timer:tc(lists, reverse, [Y]).
{434079,
 [10000000,9999999,9999998,9999997,9999996,9999995,9999994,
  9999993,9999992,9999991,9999990,9999989,9999988,9999987,
  9999986,9999985,9999984,9999983,9999982,9999981,9999980,
  9999979,9999978,9999977,9999976,9999975,9999974|...]}

6> timer:tc(lists, reverse, [Y]).
{214173,
 [10000000,9999999,9999998,9999997,9999996,9999995,9999994,
  9999993,9999992,9999991,9999990,9999989,9999988,9999987,
  9999986,9999985,9999984,9999983,9999982,9999981,9999980,
  9999979,9999978,9999977,9999976,9999975,9999974|...]}

好的，到目前为止，反向 BIF 似乎在相对于输入的近似线性时间内缩放（例如，将输入的大小乘以 10，所用时间的大小也增加了 10 倍）。在纯 Erlang 中这是有意义的，因为我们会使用尾递归之类的东西来反转列表。我想即使是在 C 中实现的 BIF，反转列表的算法似乎也是相同的（可能是因为列表在 Erlang 中的表示方式？）。

现在我想将它与另一种语言进行比较——也许是我已经使用的另一种动态类型语言。所以我在 Python 中尝试了类似的事情——非常明确地注意使用实际列表而不是生成器，我预计这会在这个测试中对 Python 的性能产生积极影响，给它一个不公平的优势。

import time

ms_conv_factor = 10**6

def profile(func, *args):
    start = time.time()
    func(args)
    end = time.time()
    elapsed_seconds = end - start
    print(elapsed_seconds * ms_conv_factor, flush=True)

x = list([i for i in range(0, 1000000)])
y = list([i for i in range(0, 10000000)])
z = list([i for i in range(0, 100000000)])

def f(m):
    return m[::-1]

def g(m):
    return reversed(m)

if __name__ == "__main__":
    print("All done loading the lists, starting now.", flush=True)
    print("f:")
    profile(f, x)
    profile(f, y)
    print("")
    profile(f, x)
    profile(f, y)
    print("")
    profile(f, z)

    print("")

    print("g:")
    profile(g, x)
    profile(g, y)
    print("")
    profile(g, x)
    profile(g, y)
    print("")
    profile(g, z)

这似乎表明在函数加载后运行一次，输入的长度没有区别，反转时间非常快 - 在 ~0.7µs 的范围内。

确切结果：

All done loading the lists, starting now.
f:
1.430511474609375
0.7152557373046875

0.7152557373046875
0.2384185791015625

0.476837158203125

g:
1.9073486328125
0.7152557373046875

0.2384185791015625
0.2384185791015625

0.476837158203125

我的第一个天真的猜测是 python 可能能够识别反向构造并创建类似反向迭代器的东西，并且 return 那（Python 可以使用引用对吧？也许它在这里使用了某种优化）。但我认为这个理论没有意义，因为原始列表和 returned 列表不一样（改变一个不应该改变另一个）。

所以我的问题是：

我的分析技术有缺陷吗？我是否以一种有利于一种语言而不是另一种语言的方式编写测试？
在 Erlang 和 Python 中列表的实现及其反转有什么不同，这使得这种情况（Python 更快）成为可能？

感谢您抽出时间（提前）。

Answer 1

This seems to suggest that after the function has been loaded and run once, the length of the input makes no difference and the reversal times are incredibly fast - in the range of ~0.7µs.

因为你的分析功能不正确。它接受可变位置参数，但是当它把它们传递给函数时，它不会解压它们所以你只曾经使用长度为 1 的元组。 您需要执行以下操作：

def profile(func, *args):
    start = time.time()
    func(*args) # Make sure to unpack the args!
    end = time.time()
    elapsed_seconds = end - start
    print(elapsed_seconds * ms_conv_factor, flush=True)

注意区别：

>>> def foo(*args):
...    print(args)
...    print(*args)
...
>>> foo(1,2,3)
(1, 2, 3)
1 2 3

另请注意，reversed(m) 创建了一个 reversed 迭代器，因此在您对其进行迭代之前它实际上不会执行任何操作。所以 g 仍然是常数时间。

但请放心，在 Python 中反转列表需要线性时间。

在 Python vs Erlang 中分析列表反转时的问题

Issues when profiling list reversal in Python vs Erlang

python

erlang

implementation

profiling

list