Speed/efficiency 循环与列表理解与其他方法的比较

Speed/efficiency comparison for loop vs list comprehension vs other methods

使用我在下面给出的示例代码,我想更好地理解 python 速度如何根据我构建给定函数的方式而变化。

我定义的示例函数工作如下:给定两个字符串,它们 return 它们不同的位数。我们假设 assert len(s1) == len(s2) 始终为真。

第一个函数使用列表理解。

def h_dist1(s1,s2):
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))

第二个函数使用经典的 for 循环。

def h_dist2(s1,s2):
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot

第二个代码的复杂度显然是O(N) where len(s1)=len(s2)=N.

示例相关问题:有没有更好的方法来定义这个特定的函数? h_dist1 的复杂度是多少?

一般问题:一般来说,什么是最好的(在可读性、速度、效率等方面)定义函数的方法 pythonic类似于上面示例中给出的那些(即需要循环 string/array/etc)?而且,最重要的是,为什么一个特定的方式最fast/efficient

注意我找过类似的问题,但没有找到任何具体的问题,例如在 here HYRY 中说,要加速代码,应该使用 1. for 循环中的局部变量和 2. 使用列表理解。 但我还是不明白为什么。当然,欢迎任何对其他 Q/A 的引用。

尽量去掉Python循环,不要在内存中创建不必要的列表,遵循这些你可以获得一个非常有效的解决方案。例如 zip 在内存中创建一个列表,所以我们可以使用 itertools.izip 来获取迭代器。因此,根据我的快速测试,sum(starmap(ne, izip(s1, s2))) 是最快的:

>>> from itertools import imap, izip, starmap
>>> from operator import ne
>>> s1 = 'a'*10**5
>>> s2 = 'b'*10**5
>>> %timeit sum(starmap(ne, izip(s1, s2)))
100 loops, best of 3: 4.25 ms per loop

其他一些解决方案:

>>> %timeit sum(imap(ne, s1, s2))
100 loops, best of 3: 5.08 ms per loop
>>> %timeit sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))
100 loops, best of 3: 11.3 ms per loop
>>> %timeit sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)
100 loops, best of 3: 10.7 ms per loop
>>> %timeit sum(dgt1 != dgt2 for dgt1, dgt2 in izip(s1, s2))
100 loops, best of 3: 7.02 ms per loop
>>> %timeit sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)
100 loops, best of 3: 6.17 ms per loop

但差异并不大,所以我个人会使用 izip 和生成器表达式,而不会滥用 Python 中的 True == 1 和 False == 0 的事实:

sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)

不要太快取消这个不起眼的 for 循环。如果您实际上不需要列表,就像在这种情况下,标准的 for 循环可能比使用列表理解更快。当然,它的内存开销也更少。

这是一个执行计时测试的程序;它可以很容易地修改以添加更多测试。

#!/usr/bin/env python

''' Time various implementations of string diff function

    From 

    Written by PM 2Ring 2015.02.18
'''

from itertools import imap, izip, starmap
from operator import ne

from timeit import Timer
from random import random, seed

def h_dist0(s1,s2):
    ''' For loop '''
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot

def h_dist1(s1,s2):
    ''' List comprehension '''
    return sum([dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2)])

def h_dist2(s1,s2):
    ''' Generator expression '''
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))

def h_dist3(s1,s2):
    ''' Generator expression with if '''
    return sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)

def h_dist3a(s1,s2):
    ''' Generator expression with izip '''
    return sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)

def h_dist4(s1,s2):
    ''' imap '''
    return sum(imap(ne, s1, s2))

def h_dist5(s1,s2):
    ''' starmap '''
    return sum(starmap(ne, izip(s1, s2)))

funcs = [
    h_dist0,
    h_dist1,
    h_dist2,
    h_dist3,
    h_dist3a,
    h_dist4,
    h_dist5,
]

# ------------------------------------

def check_full():
    print 'Testing all functions with strings of length', len(s1)
    for func in funcs:
        print '%s:%s\n%d\n' % (func.func_name, func.__doc__, func(s1, s2))

def check():
    print 'Testing all functions with strings of length', len(s1)
    print [func(s1, s2) for func in funcs], '\n'

def time_test(loops=10000, reps=3):
    ''' Print timing stats for all the functions '''
    slen = len(s1)
    print 'Length = %d, Loops = %d, Repetitions = %d' % (slen, loops, reps)

    for func in funcs:
        #Get function name and docstring
        fname = func.func_name
        fdoc = func.__doc__

        print '\n%s:%s' % (fname, fdoc)
        t = Timer('%s(s1, s2)' % fname, 'from __main__ import s1, s2, %s' % fname)
        results = t.repeat(reps, loops)
        results.sort()
        print results
    print '\n' + '- '*30 + '\n'

def make_strings(n, r=0.5):
    print 'r:', r
    s1 = 'a' * n
    s2 = ''.join(['b' if random() < r else 'a' for _ in xrange(n)])
    return s1, s2

# ------------------------------------

seed(37)

s1, s2 = make_strings(100)
#print '%s\n%s\n' % (s1, s2)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.1)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.9)
check()
time_test(10000)

s1, s2 = make_strings(10)
check()
time_test(50000)

s1, s2 = make_strings(1000)
check()
time_test(1000)

以下结果来自 Linux.

上的 32 位 2GHz Pentium 4 运行 Python 2.6.6

输出

r: 0.5
Testing all functions with strings of length 100
[45, 45, 45, 45, 45, 45, 45] 

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop 
[0.62271595001220703, 0.63597297668457031, 0.65991997718811035]

h_dist1: List comprehension 
[0.80136799812316895, 1.0849411487579346, 1.1687240600585938]

h_dist2: Generator expression 
[0.81829214096069336, 0.82315492630004883, 0.85774612426757812]

h_dist3: Generator expression with if 
[0.67409086227416992, 0.67418098449707031, 0.68189001083374023]

h_dist3a: Generator expression with izip 
[0.54596519470214844, 0.54696321487426758, 0.54910516738891602]

h_dist4: imap 
[0.4574120044708252, 0.45927596092224121, 0.46362900733947754]

h_dist5: starmap 
[0.38610100746154785, 0.38653087615966797, 0.39858913421630859]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.1
Testing all functions with strings of length 100
[13, 13, 13, 13, 13, 13, 13] 

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop 
[0.59487199783325195, 0.61918497085571289, 0.62035894393920898]

h_dist1: List comprehension 
[0.77733206748962402, 0.77883815765380859, 0.78676295280456543]

h_dist2: Generator expression 
[0.8313758373260498, 0.83669614791870117, 0.8419950008392334]

h_dist3: Generator expression with if 
[0.60900688171386719, 0.61443901062011719, 0.6202390193939209]

h_dist3a: Generator expression with izip 
[0.48425912857055664, 0.48703289031982422, 0.49215483665466309]

h_dist4: imap 
[0.45452284812927246, 0.46001195907592773, 0.4652099609375]

h_dist5: starmap 
[0.37329483032226562, 0.37666082382202148, 0.40111804008483887]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.9
Testing all functions with strings of length 100
[94, 94, 94, 94, 94, 94, 94] 

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop 
[0.69256496429443359, 0.69339799880981445, 0.70190787315368652]

h_dist1: List comprehension 
[0.80547499656677246, 0.81107187271118164, 0.81337189674377441]

h_dist2: Generator expression 
[0.82524299621582031, 0.82638883590698242, 0.82899308204650879]

h_dist3: Generator expression with if 
[0.80344915390014648, 0.8050081729888916, 0.80581092834472656]

h_dist3a: Generator expression with izip 
[0.63276004791259766, 0.63585305213928223, 0.64699077606201172]

h_dist4: imap 
[0.46122288703918457, 0.46677708625793457, 0.46921491622924805]

h_dist5: starmap 
[0.38288688659667969, 0.38731098175048828, 0.38867902755737305]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.5
Testing all functions with strings of length 10
[5, 5, 5, 5, 5, 5, 5] 

Length = 10, Loops = 50000, Repetitions = 3

h_dist0: For loop 
[0.55377697944641113, 0.55385804176330566, 0.56589198112487793]

h_dist1: List comprehension 
[0.69614696502685547, 0.71386599540710449, 0.71778011322021484]

h_dist2: Generator expression 
[0.74240994453430176, 0.77340388298034668, 0.77429509162902832]

h_dist3: Generator expression with if 
[0.66713404655456543, 0.66874384880065918, 0.67353487014770508]

h_dist3a: Generator expression with izip 
[0.59427285194396973, 0.59525203704833984, 0.60147690773010254]

h_dist4: imap 
[0.46971893310546875, 0.4749150276184082, 0.4831998348236084]

h_dist5: starmap 
[0.46615099906921387, 0.47054886817932129, 0.47225403785705566]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

r: 0.5
Testing all functions with strings of length 1000
[506, 506, 506, 506, 506, 506, 506] 

Length = 1000, Loops = 1000, Repetitions = 3

h_dist0: For loop 
[0.59869503974914551, 0.60042905807495117, 0.60753512382507324]

h_dist1: List comprehension 
[0.68359518051147461, 0.70072579383850098, 0.7146599292755127]

h_dist2: Generator expression 
[0.7492527961730957, 0.75325894355773926, 0.75805497169494629]

h_dist3: Generator expression with if 
[0.59286904335021973, 0.59505105018615723, 0.59793591499328613]

h_dist3a: Generator expression with izip 
[0.49536395072937012, 0.49821090698242188, 0.54327893257141113]

h_dist4: imap 
[0.42384982109069824, 0.43060398101806641, 0.43535709381103516]

h_dist5: starmap 
[0.34122705459594727, 0.35040402412414551, 0.35851287841796875]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -