Are list comprehensions syntactic sugar for `list(generator expression)` in Python 3?
In Python 3, is a list comprehension simply syntactic sugar for a generator expression fed into the list function?
For example, is the following code:
squares = [x**2 for x in range(1000)]
actually converted in the background into the following?
squares = list(x**2 for x in range(1000))
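I know the output is identical, and Python 3 fixes the surprising side effects that list comprehensions had on the surrounding namespace, but in terms of what the CPython interpreter does under the hood, is the former converted to the latter, or is there any difference in how the code gets executed?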
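To make the premise concrete, here is a minimal sketch of the two properties taken for granted above: both spellings produce the same list, and neither leaks the loop variable. The names squares_lc and squares_ge and the sentinel value "outer" are only illustrative.

```python
x = "outer"  # sentinel: would be overwritten if the loop variable leaked

squares_lc = [x**2 for x in range(5)]       # list comprehension
squares_ge = list(x**2 for x in range(5))   # generator expression passed to list()

print(squares_lc == squares_ge)
print(x)  # on Python 3 the outer x is left alone by both forms
```

This only checks observable results; it says nothing yet about how CPython executes each form.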
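Background
I found this equivalency claim in the comments section of this question, and a quick Google search showed the same claim being made here.
There was also a mention of this in the What's New in Python 3.0 docs, but the wording is somewhat vague: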
Also note that list comprehensions have different semantics: they are closer to syntactic sugar for a generator expression inside a list() constructor, and in particular the loop control variables are no longer leaked into the surrounding scope.
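Both forms create and call an anonymous function. However, the list(...) form creates a generator function and passes the returned generator-iterator to list, while with the [...] form the anonymous function builds the list directly with LIST_APPEND opcodes.
The following code gets the decompilation output of the anonymous functions for an example comprehension and its corresponding genexp-passed-to-list: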
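import dis

def f():
    [x for x in []]

def g():
    list(x for x in [])

# co_consts[1] is the nested code object of the comprehension / genexp
# (index as in the original answer; it may vary across CPython versions)
dis.dis(f.__code__.co_consts[1])
dis.dis(g.__code__.co_consts[1])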
The output for the comprehension is
  4           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                12 (to 21)
              9 STORE_FAST               1 (x)
             12 LOAD_FAST                1 (x)
             15 LIST_APPEND              2
             18 JUMP_ABSOLUTE            6
        >>   21 RETURN_VALUE
The output for the genexp is
  7           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                11 (to 17)
              6 STORE_FAST               1 (x)
              9 LOAD_FAST                1 (x)
             12 YIELD_VALUE
             13 POP_TOP
             14 JUMP_ABSOLUTE            3
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE
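Indexing co_consts[1] ties the snippet to one interpreter version, and on CPython 3.12 and later comprehensions are inlined into the enclosing function, so there may be no separate <listcomp> code object at all. A hedged sketch that avoids the index is to collect opcode names from a function and any nested code objects. The helper name opnames is made up for this example.

```python
import dis
import types

def f():
    return [x for x in []]

def g():
    return list(x for x in [])

def opnames(code):
    """Collect opcode names from a code object and any code objects nested in it."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names

f_ops = opnames(f.__code__)
g_ops = opnames(g.__code__)

# The comprehension appends in place; the genexp yields values to its consumer.
print("f:", "LIST_APPEND" in f_ops, "YIELD_VALUE" in f_ops)
print("g:", "LIST_APPEND" in g_ops, "YIELD_VALUE" in g_ops)
```

Exact offsets and neighbouring opcodes differ between versions, so comparing opcode names is about as much as can be relied on here.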
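The two work differently. The list comprehension version takes advantage of the special bytecode LIST_APPEND, which calls PyList_Append directly for us. Hence it avoids an attribute lookup for list.append and a function call at the Python level.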
>>> def func_lc():
    [x**2 for x in y]
...
>>> dis.dis(func_lc)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>)
              3 LOAD_CONST               2 ('func_lc.<locals>.<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              0 (y)
             12 GET_ITER
             13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             16 POP_TOP
             17 LOAD_CONST               0 (None)
             20 RETURN_VALUE
>>> lc_object = list(dis.get_instructions(func_lc))[0].argval
>>> lc_object
<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>
>>> dis.dis(lc_object)
  2           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                16 (to 25)
              9 STORE_FAST               1 (x)
             12 LOAD_FAST                1 (x)
             15 LOAD_CONST               0 (2)
             18 BINARY_POWER
             19 LIST_APPEND              2
             22 JUMP_ABSOLUTE            6
        >>   25 RETURN_VALUE
On the other hand, the list() version simply passes the generator object to list's __init__ method, which then calls its extend method internally. As the object is not a list or tuple, CPython then gets its iterator first and then simply adds the items to the list until the iterator is exhausted:
>>> def func_ge():
    list(x**2 for x in y)
...
>>> dis.dis(func_ge)
  2           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               1 (<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>)
              6 LOAD_CONST               2 ('func_ge.<locals>.<genexpr>')
              9 MAKE_FUNCTION            0
             12 LOAD_GLOBAL              1 (y)
             15 GET_ITER
             16 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE
>>> ge_object = list(dis.get_instructions(func_ge))[1].argval
>>> ge_object
<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>
>>> dis.dis(ge_object)
  2           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                15 (to 21)
              6 STORE_FAST               1 (x)
              9 LOAD_FAST                1 (x)
             12 LOAD_CONST               0 (2)
             15 BINARY_POWER
             16 YIELD_VALUE
             17 POP_TOP
             18 JUMP_ABSOLUTE            3
        >>   21 LOAD_CONST               1 (None)
             24 RETURN_VALUE
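As a rough Python-level picture of what the list() call does with a generator, the loop can be written out by hand. This is only an analogy: the real work happens in C inside CPython and involves details (such as preallocation) that are ignored here, and list_from_iterable is a made-up name.

```python
def list_from_iterable(iterable):
    """Rough Python-level analogue of list(iterable) for a non-list, non-tuple argument."""
    out = []
    it = iter(iterable)          # a generator is its own iterator
    while True:
        try:
            item = next(it)      # resumes the generator up to its next yield
        except StopIteration:    # exhaustion of the iterator ends the loop
            break
        out.append(item)
    return out

y = range(5)
result = list_from_iterable(x**2 for x in y)
print(result == list(x**2 for x in y))
```

Each item makes a round trip through the generator's suspend and resume machinery, a step the comprehension's in-place LIST_APPEND does not need.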
Timing comparisons:
>>> %timeit [x**2 for x in range(10**6)]
1 loops, best of 3: 453 ms per loop
>>> %timeit list(x**2 for x in range(10**6))
1 loops, best of 3: 478 ms per loop
>>> %%timeit
out = []
for x in range(10**6):
    out.append(x**2)
...
1 loops, best of 3: 510 ms per loop
The normal loop is slightly slower because of the attribute lookup on every iteration. Cache the bound method and time it again:
>>> %%timeit
out = []; append = out.append
for x in range(10**6):
    append(x**2)
...
1 loops, best of 3: 467 ms per loop
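The numbers above come from IPython's %timeit on one machine. To repeat the comparison outside IPython, something along these lines with the standard timeit module should work. The function names and the smaller N are choices made for this sketch, and absolute timings will vary by machine and Python version.

```python
import timeit

N = 10**4  # smaller than the 10**6 used above so the sketch runs quickly

def with_listcomp():
    return [x**2 for x in range(N)]

def with_genexp():
    return list(x**2 for x in range(N))

def with_loop():
    out = []
    for x in range(N):
        out.append(x**2)      # attribute lookup on every iteration
    return out

def with_cached_append():
    out = []
    append = out.append       # look the bound method up once
    for x in range(N):
        append(x**2)
    return out

candidates = [with_listcomp, with_genexp, with_loop, with_cached_append]
# Best of 3 runs of 10 calls each, mirroring the "best of 3" reporting above.
timings = {fn.__name__: min(timeit.repeat(fn, number=10, repeat=3)) for fn in candidates}
for name, seconds in timings.items():
    print(f"{name:20s} {seconds:.4f} s")
```

Which variant wins, and by how much, is not guaranteed to match the figures quoted above.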
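Apart from the fact that list comprehensions no longer leak their variables, one more difference is that something like this is not valid anymore: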
>>> [x**2 for x in 1, 2, 3] # Python 2
[1, 4, 9]
>>> [x**2 for x in 1, 2, 3] # Python 3
  File "<ipython-input-69-bea9540dd1d6>", line 1
    [x**2 for x in 1, 2, 3]
                    ^
SyntaxError: invalid syntax
>>> [x**2 for x in (1, 2, 3)] # Add parentheses
[1, 4, 9]
>>> for x in 1, 2, 3: # Python 3: For normal loops it still works
    print(x**2)
...
1
4
9
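The same grammar difference can be checked without an interactive session by asking compile() whether each spelling parses. The helper name compiles is invented for this sketch.

```python
def compiles(source, mode="eval"):
    """Return True if source parses and compiles in the running interpreter."""
    try:
        compile(source, "<example>", mode)
        return True
    except SyntaxError:
        return False

bare_tuple = compiles("[x**2 for x in 1, 2, 3]")        # unparenthesized tuple
parenthesized = compiles("[x**2 for x in (1, 2, 3)]")   # explicit tuple
plain_loop = compiles("for x in 1, 2, 3:\n    pass\n", mode="exec")

print(bare_tuple, parenthesized, plain_loop)
```

On Python 3 only the first spelling is rejected, consistent with the session above.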
You can actually show that the two can have different outcomes, which proves they are inherently different:
>>> list(next(iter([])) if x > 3 else x for x in range(10))
[0, 1, 2, 3]
>>> [next(iter([])) if x > 3 else x for x in range(10)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
StopIteration
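The expression inside the comprehension is not treated as a generator, since the comprehension does not handle StopIteration, whereas the list constructor does. Note that on Python 3.7 and later (PEP 479), a StopIteration raised inside a generator is converted to RuntimeError, so the list(...) call above raises RuntimeError there instead of returning [0, 1, 2, 3]; the two forms still behave differently.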
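Because the outcome of this experiment depends on the interpreter version, a small check like the following can report what a given Python actually does. The names outcome, via_list and via_comp are hypothetical, and the lambdas are only there to delay evaluation.

```python
def outcome(thunk):
    """Return the result of thunk(), or the name of the exception it raises."""
    try:
        return thunk()
    except (StopIteration, RuntimeError) as exc:
        return type(exc).__name__

# Same expressions as in the session above, wrapped so they run inside outcome().
via_list = outcome(lambda: list(next(iter([])) if x > 3 else x for x in range(10)))
via_comp = outcome(lambda: [next(iter([])) if x > 3 else x for x in range(10)])

print("list(genexp): ", via_list)
print("comprehension:", via_comp)
```

Whichever interpreter runs it, the two lines should not print the same thing, which is the point being made.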
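They aren't the same: list() evaluates whatever is given to it after what is in the parentheses has finished executing, not before.
The [] in Python is a bit magical: it tells Python to wrap whatever is inside it as a list, more like a type hint for the language.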