为什么字典在某些情况下计数比 collections.Counter 快?

Why does a dictionary count in some cases faster than collections.Counter?

我需要一个从非唯一列表中提取唯一元素并计算其重复元素的解决方案。

该解决方案的目的是将其用于从非唯一列表创建唯一组合的算法中。在这种情况下创建组合的列表大小通常非常小(少于 50 个元素),但我的目标是找到尽可能优化的整体最快代码(即使只获得非常少量的 运行 次)。

Python collections 模块提供了一个完全适合这种目的的专用 collections.Counter,但显然在某些情况下,使用简单字典而不是 collections.Counter 会导致更快的代码,例如您可以使用以下代码查看自己:

from time   import time   as t
from timeit import timeit as tt

from collections import Counter
def counter(iterable):
    dctCounter = {}
    for item in iterable:
        if item in dctCounter: 
            dctCounter[item] += 1
        else: 
            dctCounter[item]  = 1
    return dctCounter

for n, N in [(1,10), (10,1), (1,50), (50,1), (1,100), (100,1), (1,200), (200, 1), (1, 500), (500, 1), (1, 1000), (1000,1)]:
    lstItems = n*list(range(N))
    for noLoops in [10**p for p in range(5, 6)]: 
        s = t()
        for _ in range(noLoops): 
            dctCounter = counter(lstItems)
        e = t()
        timeDctFctn = e - s
        s = t()
        for _ in range(noLoops): 
            objCounter = Counter(lstItems)
        e = t()
        timeCollCtr = e - s
        timeitCollCtr = tt("objCounter=Counter(lstItems)", "from __main__ import Counter, lstItems", number=noLoops)
        timeitDctFctn = tt("dctCounter=counter(lstItems)", "from __main__ import counter, lstItems", number=noLoops)
        # print("Loops: {:7}, time/timeit CollCtr: {:7.5f}/{:7.5f} DctFctn: {:7.5f}/{:7.5f} sec. lstSize: {:3}, %uniq: {:3.0f}".format(noLoops, timeCollCtr, timeitCollCtr, timeDctFctn, timeitDctFctn, n*N, 100.0/n))
        print("collections.Counter(): {:7.5f},  def counter(): {:7.5f} sec. lstSize: {:3}, %uniq: {:3.0f}, ({} timitLoops)".format(timeitCollCtr, timeitDctFctn, n*N, 100.0/n, noLoops))    
        # print('-----------------------------------------------------------------------------------------------------------')

此处输出:

python3.6 -u "collections.Counter-vs-dictionaryAsCounter_Cg.py"
collections.Counter(): 0.36461, def counter(): 0.09592 sec. lstSize:  10, %uniq: 100, (100000 timitLoops)
collections.Counter(): 0.36444, def counter(): 0.12286 sec. lstSize:  10, %uniq:  10, (100000 timitLoops)
collections.Counter(): 0.58627, def counter(): 0.43233 sec. lstSize:  50, %uniq: 100, (100000 timitLoops)
collections.Counter(): 0.52399, def counter(): 0.54106 sec. lstSize:  50, %uniq:   2, (100000 timitLoops)
collections.Counter(): 0.82332, def counter(): 0.81436 sec. lstSize: 100, %uniq: 100, (100000 timitLoops)
collections.Counter(): 0.72513, def counter(): 1.06823 sec. lstSize: 100, %uniq:   1, (100000 timitLoops)
collections.Counter(): 1.27130, def counter(): 1.59476 sec. lstSize: 200, %uniq: 100, (100000 timitLoops)
collections.Counter(): 1.13817, def counter(): 2.14566 sec. lstSize: 200, %uniq:   0, (100000 timitLoops)
collections.Counter(): 3.16287, def counter(): 4.26738 sec. lstSize: 500, %uniq: 100, (100000 timitLoops)
collections.Counter(): 2.64247, def counter(): 5.67448 sec. lstSize: 500, %uniq:   0, (100000 timitLoops)
collections.Counter(): 4.89153, def counter(): 7.68661 sec. lstSize:1000, %uniq: 100, (100000 timitLoops)
collections.Counter(): 6.06389, def counter():13.92613 sec. lstSize:1000, %uniq:   0, (100000 timitLoops)
>Exit code: 0

P.S。似乎 collections.Counter() 在上述另一种情况下也未能达到预期。看这里:

Counter 在计算 短迭代 时有一个主要瓶颈:它检查 if isinstance(iterable, Mapping)。这个测试相当慢,因为 collections.abc.Mapping 是一个 abstract metaclass and as such the isinstance-check is a bit more complicated than normal isinstance-checks, see for example:

因此,如果其他方法对于短迭代来说更快,这并不奇怪。但是对于 long iterables 检查并不重要并且 Counter 应该更快(至少对于 python-3.x (CPython)实际计数函数_count_elements写在).

识别瓶颈的一种简单方法是分析。我将在这里使用 line_profiler

%load_ext line_profiler

from collections import Counter
x = range(50)
# Profile the function Counter.update when executing the command "Counter(x)"
%lprun -f Counter.update Counter(x)

结果:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   604         1            8      8.0      3.9          if not args:
   605                                                       raise TypeError("descriptor 'update' of 'Counter' object "
   606                                                                       "needs an argument")
   607         1           13     13.0      6.4          self, *args = args
   608         1            6      6.0      3.0          if len(args) > 1:
   609                                                       raise TypeError('expected at most 1 arguments, got %d' % len(args))
   610         1            5      5.0      2.5          iterable = args[0] if args else None
   611         1            3      3.0      1.5          if iterable is not None:
   612         1           94     94.0     46.3              if isinstance(iterable, Mapping):
   613                                                           if self:
   614                                                               self_get = self.get
   615                                                               for elem, count in iterable.items():
   616                                                                   self[elem] = count + self_get(elem, 0)
   617                                                           else:
   618                                                               super(Counter, self).update(iterable) # fast path when counter is empty
   619                                                       else:
   620         1           69     69.0     34.0                  _count_elements(self, iterable)
   621         1            5      5.0      2.5          if kwds:
   622                                                       self.update(kwds)

因此从可迭代对象初始化 Counter 所花费的时间有一个相当大的常数因子(46% 的时间花在了 isinstance 检查上,获取带有计数的字典仅占 34%)。

然而,对于长迭代来说,这无关紧要(很多),因为它只完成了一次:

%lprun -f Counter.update Counter([1]*100000)

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   604         1           12     12.0      0.0          if not args:
   605                                                       raise TypeError("descriptor 'update' of 'Counter' object "
   606                                                                       "needs an argument")
   607         1           12     12.0      0.0          self, *args = args
   608         1            6      6.0      0.0          if len(args) > 1:
   609                                                       raise TypeError('expected at most 1 arguments, got %d' % len(args))
   610         1            6      6.0      0.0          iterable = args[0] if args else None
   611         1            3      3.0      0.0          if iterable is not None:
   612         1           97     97.0      0.3              if isinstance(iterable, Mapping):
   613                                                           if self:
   614                                                               self_get = self.get
   615                                                               for elem, count in iterable.items():
   616                                                                   self[elem] = count + self_get(elem, 0)
   617                                                           else:
   618                                                               super(Counter, self).update(iterable) # fast path when counter is empty
   619                                                       else:
   620         1        28114  28114.0     99.5                  _count_elements(self, iterable)
   621         1           13     13.0      0.0          if kwds:
   622                                                       self.update(kwds)

只是为了让您了解它们如何根据元素的数量执行,为了比较,我包括了您的计数的优化版本和 Counter 使用的 _count_elements 函数。然而,我排除了 sorted 项目的部分并创建了一个计数列表以避免其他影响 - 特别是因为 sorted 具有不同的 运行 时间行为(O(n log(n)) ) 比计数 (O(n)):

# Setup

import random
from collections import Counter, _count_elements

def count(iterable):
    """Explicit iteration over items."""
    dctCounter = {}
    for item in iterable:
        if item in dctCounter: 
            dctCounter[item] += 1
        else: 
            dctCounter[item]  = 1
    return dctCounter

def count2(iterable):
    """Iterating over the indices"""
    dctCounter = {}
    lenLstItems = len(iterable)
    for idx in range(lenLstItems):
        item = iterable[idx]
        if item in dctCounter.keys(): 
            dctCounter[item] += 1
        else: 
            dctCounter[item]  = 1
    return dctCounter

def c_count(iterable):
    """Internal counting function that's used by Counter"""
    d = {}
    _count_elements(d, iterable)
    return d

# Timing

timings = {Counter: [], count: [], count2: [], c_count: []}

for i in range(1, 20):
    print(2**i)
    it = [random.randint(0, 2**i) for _ in range(2**i)]
    for func in (Counter, count, count2, c_count):
        res = %timeit -o func(it)
        timings[func].append(res)

# Plotting

%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1)
ax = plt.subplot(111)

n = 2**np.arange(1, 5)

ax.plot(n, 
        [time.average for time in timings[count]], 
        label='my custom function', c='red')
ax.plot(n, 
        [time.average for time in timings[count2]], 
        label='your custom function', c='green')
ax.plot(n, 
        [time.average for time in timings[Counter]], 
        label='Counter', c='blue')
ax.plot(n, 
        [time.average for time in timings[c_count]], 
        label='_count_elements', c='purple')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('elements')
ax.set_ylabel('time to count them [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()

结果:

个人计时:

2
30.5 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.67 µs ± 3.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
6.03 µs ± 31.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.67 µs ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
4
30.7 µs ± 75.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2.63 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.81 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.97 µs ± 5.59 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
8
34.3 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
4.3 µs ± 16.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11.3 µs ± 23.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3 µs ± 25.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
16
34.2 µs ± 599 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.46 µs ± 42 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
17.5 µs ± 83.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.24 µs ± 17.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
32
38.4 µs ± 578 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
13.7 µs ± 95.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29.8 µs ± 383 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.56 µs ± 90.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
64
43.5 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
24 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
52.8 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
11.6 µs ± 83.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
128
53.5 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
47.8 µs ± 507 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
101 µs ± 3.25 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
21.7 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
256
69.6 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
92.1 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
188 µs ± 3.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
39.5 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
512
123 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
200 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
409 µs ± 3.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
90.9 µs ± 958 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1024
230 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
428 µs ± 5.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
855 µs ± 14.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
193 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2048
436 µs ± 7.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
868 µs ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.76 ms ± 31.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
386 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4096
830 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.8 ms ± 33.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.75 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.06 ms ± 89.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
8192
2.3 ms ± 246 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.8 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.8 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.69 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
16384
4.53 ms ± 349 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.22 ms ± 359 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
15.9 ms ± 371 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.9 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
32768
9.6 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.2 ms ± 51.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
32.5 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
10.4 ms ± 687 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
65536
24.8 ms ± 490 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
40.1 ms ± 710 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
66.8 ms ± 271 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
24.5 ms ± 659 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
131072
54.6 ms ± 756 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
84.2 ms ± 728 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
142 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.1 ms ± 424 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
262144
120 ms ± 4.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
182 ms ± 4.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
296 ms ± 3.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
117 ms ± 320 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
524288
244 ms ± 3.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
368 ms ± 2.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
601 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
252 ms ± 6.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)