List comprehensions and generators: avoiding computing the same value twice when using conditional expressions

Suppose you have some expensive, CPU-intensive function, such as parsing an XML string. Here, a trivial function stands in for it:

def parse(foo):
    return int(foo)

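As input you have a list of strings; you want to parse them and find the subset of parsed values that satisfy some condition. Ideally, the parse should run only once per string.

Without a list comprehension, you could write: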

olds = ["1", "2", "3", "4", "5"]
news = []
for old in olds:
    new = parse(old)      # First and only Parse
    if new > 3:
        news.append(new)

To do this as a list comprehension, it seems you have to parse twice, once to get the new value and once to perform the conditional check:

olds = ["1", "2", "3", "4", "5"]
news = [
    parse(new)         # First Parse
    for new in olds
    if parse(new) > 3  # Second Parse
]
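
To make the cost concrete, here is a small sketch that counts calls. The `counted_parse` wrapper and the `calls` counter are illustrative names, not part of the code above:

```python
calls = 0

def counted_parse(foo):
    """Stand-in for the expensive parse; counts how often it runs."""
    global calls
    calls += 1
    return int(foo)

olds = ["1", "2", "3", "4", "5"]
news = [
    counted_parse(new)         # runs only for items that pass the check
    for new in olds
    if counted_parse(new) > 3  # runs for every item
]
print(news, calls)
```

With this input the condition runs for all five strings and the output expression runs again for the two that pass, so the counter ends above the five calls the plain loop needs.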

For example, this syntax does not work:

olds = ["1", "2", "3", "4", "5"]
# Raises SyntaxError: can't assign to function call
news = [i for parse(i) in olds if i > 5]

Using a generator seems to work:

def parse(strings):
    for string in strings:
        yield int(string)

olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds) if i > 3]

But then you could just as well move the condition into the generator:

def parse(strings):
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

olds = ["1", "2", "3", "4", "5"]
news = [i for i in parse(olds)]

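What I would like to know is which is better purely in terms of optimization (not reusability and so on): parsing in the generator with the conditional check in the list comprehension, or doing both the parsing and the conditional check in the generator? And is there an alternative better than either of these?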
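
One way to approach the question empirically is to time both variants. The following is only a rough sketch: the function names, the input size, and the repeat count are arbitrary choices made for illustration, and the numbers will vary by machine and Python version.

```python
import timeit

def parse_only(strings):
    # Generator that only parses.
    for string in strings:
        yield int(string)

def parse_and_filter(strings):
    # Generator that parses and filters.
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

def filter_in_comprehension(strings):
    return [i for i in parse_only(strings) if i > 3]

def filter_in_generator(strings):
    return [i for i in parse_and_filter(strings)]

data = [str(n) for n in range(1000)]
# Both variants must produce the same result before timing means anything.
assert filter_in_comprehension(data) == filter_in_generator(data)

for fn in (filter_in_comprehension, filter_in_generator):
    seconds = timeit.timeit(lambda: fn(data), number=200)
    print(f"{fn.__name__}: {seconds:.4f}s")
```

When many items are rejected, filtering inside the generator means the rejected values never cross the generator boundary, so it plausibly does a little less work, but that is something to measure rather than assume.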


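Here is some output from dis.dis in Python 3.6.5. Note that on my version of Python, to disassemble the list comprehension we have to use f.__code__.co_consts[1] (the original post links to an explanation).

Generator does the parsing, list comprehension does the conditional check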
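
Since the index into co_consts is an implementation detail, a slightly more defensive sketch looks the nested code object up by type instead of by position. The helper name `nested_code_objects` is made up for this example. It also assumes that newer interpreters (3.12 and later) may inline the comprehension into the enclosing function, in which case there is no separate code object to find:

```python
import dis
import types

def main(strings):
    return [i for i in strings if i > 3]

def nested_code_objects(func):
    """Return the code objects stored among func's constants."""
    return [c for c in func.__code__.co_consts if isinstance(c, types.CodeType)]

nested = nested_code_objects(main)
if nested:
    # Interpreters such as 3.6 compile the comprehension as a separate code object.
    dis.dis(nested[0])
else:
    # Interpreters that inline comprehensions keep the loop inside main itself.
    dis.dis(main)
```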

import dis

def parse(strings):
    for string in strings:
        yield int(string)

def main(strings):
    return [i for i in parse(strings) if i > 3]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (3)
             12 COMPARE_OP               4 (>)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (i)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 SETUP_LOOP              22 (to 24)
              2 LOAD_FAST                0 (strings)
              4 GET_ITER
        >>    6 FOR_ITER                14 (to 22)
              8 STORE_FAST               1 (string)

  3          10 LOAD_GLOBAL              0 (int)
             12 LOAD_FAST                1 (string)
             14 CALL_FUNCTION            1
             16 YIELD_VALUE
             18 POP_TOP
             20 JUMP_ABSOLUTE            6
        >>   22 POP_BLOCK
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE
"""

Generator does both the parsing and the conditional check

def parse(strings):
    for string in strings:
        val = int(string)
        if val > 3:
            yield val

def main(strings):
    return [i for i in parse(strings)]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
"""
dis.dis(parse)
"""
  2           0 SETUP_LOOP              34 (to 36)
              2 LOAD_FAST                0 (strings)
              4 GET_ITER
        >>    6 FOR_ITER                26 (to 34)
              8 STORE_FAST               1 (string)

  3          10 LOAD_GLOBAL              0 (int)
             12 LOAD_FAST                1 (string)
             14 CALL_FUNCTION            1
             16 STORE_FAST               2 (val)

  4          18 LOAD_FAST                2 (val)
             20 LOAD_CONST               1 (3)
             22 COMPARE_OP               4 (>)
             24 POP_JUMP_IF_FALSE        6

  5          26 LOAD_FAST                2 (val)
             28 YIELD_VALUE
             30 POP_TOP
             32 JUMP_ABSOLUTE            6
        >>   34 POP_BLOCK
        >>   36 LOAD_CONST               0 (None)
             38 RETURN_VALUE
"""

Naive tight loop

def parse(string):
    return int(string)

def main(strings):
    values = []
    for string in strings:
        value = parse(string)
        if value > 3:
            values.append(value)
    return values

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main)
"""
  2           0 BUILD_LIST               0
              2 STORE_FAST               1 (values)

  3           4 SETUP_LOOP              38 (to 44)
              6 LOAD_FAST                0 (strings)
              8 GET_ITER
        >>   10 FOR_ITER                30 (to 42)
             12 STORE_FAST               2 (string)

  4          14 LOAD_GLOBAL              0 (parse)
             16 LOAD_FAST                2 (string)
             18 CALL_FUNCTION            1
             20 STORE_FAST               3 (value)

  5          22 LOAD_FAST                3 (value)
             24 LOAD_CONST               1 (3)
             26 COMPARE_OP               4 (>)
             28 POP_JUMP_IF_FALSE       10

  6          30 LOAD_FAST                1 (values)
             32 LOAD_ATTR                1 (append)
             34 LOAD_FAST                3 (value)
             36 CALL_FUNCTION            1
             38 POP_TOP
             40 JUMP_ABSOLUTE           10
        >>   42 POP_BLOCK

  7     >>   44 LOAD_FAST                1 (values)
             46 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_FAST                0 (string)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE
"""

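Notice how the first two disassemblies, which use a list comprehension and a generator, show two for loops: one in main (the list comprehension) and one in parse (the generator). That isn't as bad as it sounds, is it? The whole operation is still O(n) rather than O(n^2), right?

Edit: here is khelwood's solution: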
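
The two loops are chained rather than nested: the consumer pulls one value at a time from the generator. A small sketch that logs events can make this visible. The `events` list is an illustrative addition, and the list comprehension is written as an explicit loop here only so that the check can be logged:

```python
events = []

def parse(strings):
    for string in strings:
        events.append(("parse", string))
        yield int(string)

def main(strings):
    result = []
    for i in parse(strings):
        events.append(("check", i))
        if i > 3:
            result.append(i)
    return result

values = main(["1", "2", "3", "4", "5"])
parse_count = sum(1 for kind, _ in events if kind == "parse")
print(values, parse_count, events[:4])
```

The log alternates between one parse and one check per item, so each input string is parsed once and the total work grows linearly with the input.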

def parse(string):
    return int(string)

def main(strings):
    return [val for val in (parse(string) for string in strings) if val > 3]

assert main(["1", "2", "3", "4", "5"]) == [4, 5]

dis.dis(main.__code__.co_consts[1])
"""
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (val)
              8 LOAD_FAST                1 (val)
             10 LOAD_CONST               0 (3)
             12 COMPARE_OP               4 (>)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (val)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
"""

dis.dis(parse)
"""
  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_FAST                0 (string)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE
"""

I think you can do this more simply than you expect:

olds = ["1", "2", "3", "4", "5"]
news = [new for new in (parse(old) for old in olds) if new > 3]

Or just:

news = [new for new in map(parse, olds) if new > 3]

Either way, parse is called only once per item.
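
As a further option, assuming Python 3.8 or newer is available (the question uses 3.6.5, where this is a syntax error), an assignment expression can bind the parsed value inside the condition and reuse it as the output. This is a sketch of an alternative, not something proposed in the thread:

```python
def parse(foo):
    return int(foo)

olds = ["1", "2", "3", "4", "5"]
# The condition parses each item once and binds the result to `new`;
# the output expression then reuses that binding.
news = [new for old in olds if (new := parse(old)) > 3]
print(news)
```

Note that `new` is bound in the scope that contains the comprehension, so it remains visible there after the comprehension finishes.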