Quickly rank variables in Python

Question

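What is the fastest way to rank variables? I have 4 integer variables and I need to rank them quickly. This process runs many, many times, so it needs to be fast. I tried using a Counter and its most_common() method, which works well but is slower than counting with individual variables. Here is an example of what I am running.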

A = 15
B = 10
C = 5
D = 10

def get_highest(A,B,C,D):
    count = A
    label = 'A'
    if B >= count:
        count = B
        label = 'B'
    if C >= count:
        count = C
        label = 'C'
    if D >= count:
        count = D
        label = 'D'

    return count, label

highest, label = get_highest(A,B,C,D)
if label == 'A':
    A=0
if label == 'B':
    B=0
if label == 'C':
    C=0
if label == 'D':
    D=0
second_highest, label = get_highest(A,B,C,D)

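I continue until I have the rank of every variable. Is there a faster way to do this? I also want to implement this in Cython, so answers that would speed things up when implemented in Cython would be appreciated.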

Answer 1

Here is an alternative that is faster than your function:

import operator

def get_highest(A,B,C,D):
    return max(zip((A, B, C, D), 'ABCD'), key=operator.itemgetter(0))
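One behavioral difference worth checking before swapping this in: max returns the first maximal item it encounters, whereas the question's version uses >= and therefore returns the last tied variable. A small sketch with made-up input values:

```python
import operator

def get_highest(A, B, C, D):
    # Pair each value with its label and pick the pair with the largest value.
    return max(zip((A, B, C, D), 'ABCD'), key=operator.itemgetter(0))

print(get_highest(15, 10, 5, 10))  # (15, 'A')
# B and D are tied; max keeps the first one it sees, so the label is 'B'.
# The question's >= comparisons would report 'D' for the same input.
print(get_highest(10, 15, 5, 15))  # (15, 'B')
```

If the tie order matters for your ranking, that is something to decide explicitly rather than inherit by accident.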

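However, if your goal (as it appears) is to zero out the variable holding the maximum, it may be better to have the function do more of the work: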

def max_becomes_zero(A, B, C, D):
    temp = [A, B, C, D]
    maxind, maxval = max(enumerate(temp), key=operator.itemgetter(1))
    maxname = 'ABCD'[maxind]
    temp[maxind] = 0
    return temp, maxval, maxname

Call it like this:

(A, B, C, D), highest, label = max_becomes_zero(A, B, C, D)
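To get the full ranking the question describes, the call can simply be repeated. This is only a sketch: rank_by_zeroing is a hypothetical helper name, and it assumes all four values are strictly positive, since a zeroed entry could otherwise be picked again.

```python
import operator

def max_becomes_zero(A, B, C, D):
    temp = [A, B, C, D]
    maxind, maxval = max(enumerate(temp), key=operator.itemgetter(1))
    maxname = 'ABCD'[maxind]
    temp[maxind] = 0
    return temp, maxval, maxname

def rank_by_zeroing(A, B, C, D):
    # Hypothetical helper: repeatedly extract the maximum and zero it out.
    # Assumes every input is > 0, otherwise zeroed slots can win again.
    values = (A, B, C, D)
    ranking = []
    for _ in range(4):
        values, highest, label = max_becomes_zero(*values)
        ranking.append((label, highest))
    return ranking

print(rank_by_zeroing(15, 10, 5, 10))
# [('A', 15), ('B', 10), ('D', 10), ('C', 5)]
```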

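Added: some may wonder (and did ask in a comment) about the relative speed of operator.itemgetter versus a lambda. Answer: don't wonder, measure. That is what the timeit module in Python's standard library is for: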

$ python -mtimeit -s'a="something"' 'max(enumerate(a), key=lambda x: x[1])'
1000000 loops, best of 3: 1.56 usec per loop
$ python -mtimeit -s'a="something"; import operator' 'max(enumerate(a), operator.itemgetter(1))'
1000000 loops, best of 3: 0.363 usec per loop

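As you can see, in this particular case (on my Linux workstation, with Python 2.7.9), the speedup of the whole operation is impressive: more than 4 times faster, saving over a microsecond per repetition.

More generally, avoiding lambda wherever feasible will make you happier.

Note: it is important to time the actual operation, and to put only preliminary steps such as the initialization of a and the import in the setup, i.e. in the -s flag when using timeit in its (recommended) command-line form, python -mtimeit. I suspect that getting this wrong is what stopped the commenter from reproducing these results (just a guess, of course, since the commenter did not show us the exact code being timed).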
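The same measurement can be run from inside Python with timeit.timeit, keeping the initialization in setup as described. One caveat: the second command line above passes operator.itemgetter(1) positionally rather than as key=, so max receives it as a second item to compare instead of as a key function. The sketch below assumes the intent was key=operator.itemgetter(1), so that both statements do the same work. The repetition count is arbitrary, and no timings are quoted because they depend on the machine and Python version.

```python
import timeit

# Preliminary work (creating `a`, importing operator) goes in setup,
# so only the max(...) call itself is timed.
setup = 'a = "something"; import operator'
n = 100000  # arbitrary repetition count, chosen for illustration

t_lambda = timeit.timeit(
    'max(enumerate(a), key=lambda x: x[1])', setup=setup, number=n)
t_getter = timeit.timeit(
    'max(enumerate(a), key=operator.itemgetter(1))', setup=setup, number=n)

# Report microseconds per call for each variant.
print('lambda:     %.3f usec per loop' % (t_lambda / n * 1e6))
print('itemgetter: %.3f usec per loop' % (t_getter / n * 1e6))
```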

Answer 2

It may be worth trying to sort the variables:

ordered = sorted(list(zip("ABCD", (A, B, C, D))), key=lambda x: x[1])

>>> print(ordered)
[('C', 5), ('B', 10), ('D', 10), ('A', 15)]
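The result above is in ascending order. If the ranking is wanted from highest to lowest, as in the question, one option is to pass reverse=True. This is a sketch rather than a tuned solution; it also swaps the lambda for operator.itemgetter and drops the list() call, which sorted does not need.

```python
from operator import itemgetter

A, B, C, D = 15, 10, 5, 10

# Sort (label, value) pairs by value, highest first. sorted() is stable
# even with reverse=True, so tied values keep their original order.
ranked = sorted(zip("ABCD", (A, B, C, D)), key=itemgetter(1), reverse=True)
labels = [name for name, _ in ranked]

print(ranked)  # [('A', 15), ('B', 10), ('D', 10), ('C', 5)]
print(labels)  # ['A', 'B', 'D', 'C']
```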

Answer 3

The following takes under 3 µs on my machine to complete the whole ranking:

In [43]: [name for (val, name) in sorted(zip((A, B, C, D), "ABCD"))][::-1]
Out[43]: ['A', 'D', 'B', 'C']

In [44]: %timeit [name for (val, name) in sorted(zip((A, B, C, D), "ABCD"))][::-1]
100000 loops, best of 3: 2.71 us per loop

Or how about this (I hope I got the comparisons right :-)):

def rank1(A, B, C, D):
  lA, lB, lC, lD = "A", "B", "C", "D"
  if A < B:
    A, B, lA, lB = B, A, lB, lA
  if C < D:
    C, D, lC, lD = D, C, lD, lC
  if A < C:
    A, C, lA, lC = C, A, lC, lA
  if B < D:
    B, D, lB, lD = D, B, lD, lB
  if B < C:
    B, C, lB, lC = C, B, lC, lB
  return (A, B, C, D), (lA, lB, lC, lD)

The whole ranking takes about 770 ns:

In [6]: %timeit rank1(A, B, C, D)
1000000 loops, best of 3: 765 ns per loop
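Hand-written comparisons like these are easy to get wrong, so one way to gain confidence is an exhaustive check against sorted(). A minimal sketch: check_rank1 is a hypothetical helper name, and it assumes that trying every combination of four small integers (ties included) is enough to cover all orderings.

```python
import itertools

def rank1(A, B, C, D):
    lA, lB, lC, lD = "A", "B", "C", "D"
    if A < B:
        A, B, lA, lB = B, A, lB, lA
    if C < D:
        C, D, lC, lD = D, C, lD, lC
    if A < C:
        A, C, lA, lC = C, A, lC, lA
    if B < D:
        B, D, lB, lD = D, B, lD, lB
    if B < C:
        B, C, lB, lC = C, B, lC, lB
    return (A, B, C, D), (lA, lB, lC, lD)

def check_rank1():
    # Hypothetical helper: compare rank1 with sorted() on all 4**4 inputs.
    checked = 0
    for values in itertools.product(range(4), repeat=4):
        ranked_values, ranked_labels = rank1(*values)
        by_label = dict(zip("ABCD", values))
        # Values must come back in descending order.
        assert list(ranked_values) == sorted(values, reverse=True)
        # Each label must still point at the value it started with.
        assert [by_label[l] for l in ranked_labels] == list(ranked_values)
        # Every label appears exactly once.
        assert sorted(ranked_labels) == list("ABCD")
        checked += 1
    return checked

print(check_rank1())  # 256
```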
