CPython: Why does a 3-line script require far more than 3 cycles in the interpreter to execute?

I was just watching Philip Guo's CPython Internals lecture on YouTube, and one thing puzzles me.

At 25:55, he modifies CPython's C source code, inserting printf("hello\n"); at the start of the infinite loop that runs all the bytecode instructions.

He then wrote a 3-line test.py:

X = 1
Y = 2
print X + Y

What puzzles me is this: when he runs test.py with the modified interpreter, why are there so many "hello"s before we see the "3"?

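Those 3 lines of code should compile to only a handful of bytecode instructions: load the value 1, load the value 2, and an instruction that invokes print. So I would expect that when the bytecode compiled from test.py is executed, we should see only a few "hello"s.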

So does the compiler actually generate a lot of internal bytecode instructions before compiling the external Python script?

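There are two reasons you see so many hellos printed:

  • Python doesn't have a special bytecode for every possible Python statement. Instead, a statement is compiled to a combination of bytecodes.
  • The Python interpreter imports a series of Python modules just to start running. You can run the regular Python interpreter with the -v switch to see what is imported each time. Each module consists of multiple statements, so quite a lot of bytecode is executed before your small script starts running.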
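
As a rough check of the second point on a modern interpreter, the sketch below counts the instructions in the top-level code object of a few modules. It assumes Python 3 rather than the 2.7 used in this answer, and top_level_instruction_count is a hypothetical helper name. A static count like this ignores loops, branches, and function bodies, so it only hints at how much bytecode an import really executes.

```python
import dis
import importlib.util


def top_level_instruction_count(module_name):
    """Statically count instructions in a module's top-level code object."""
    spec = importlib.util.find_spec(module_name)
    # get_code returns the module-level code object, or None when the
    # module has no Python bytecode (for example, a built-in C module).
    code = spec.loader.get_code(module_name)
    if code is None:
        return 0
    return sum(1 for _ in dis.get_instructions(code))


for name in ("os", "stat", "codecs"):
    print(name, top_level_instruction_count(name))
```

The exact numbers depend on the Python version, but each module should come out far larger than a 3-line script.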

If I put those 3 lines into test.py and run it with my unmodified Python 2.7 binary, passing the -v switch, I see:

$ python2.7 -v test.py
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /..../lib/python2.7/site.pyc matches /..../lib/python2.7/site.py
import site # precompiled from /..../lib/python2.7/site.pyc
# /..../lib/python2.7/os.pyc matches /..../lib/python2.7/os.py
import os # precompiled from /..../lib/python2.7/os.pyc
import errno # builtin
import posix # builtin
# /..../lib/python2.7/posixpath.pyc matches /..../lib/python2.7/posixpath.py
import posixpath # precompiled from /..../lib/python2.7/posixpath.pyc
# /..../lib/python2.7/stat.pyc matches /..../lib/python2.7/stat.py
import stat # precompiled from /..../lib/python2.7/stat.pyc
# /..../lib/python2.7/genericpath.pyc matches /..../lib/python2.7/genericpath.py
import genericpath # precompiled from /..../lib/python2.7/genericpath.pyc
# /..../lib/python2.7/warnings.pyc matches /..../lib/python2.7/warnings.py
import warnings # precompiled from /..../lib/python2.7/warnings.pyc
# /..../lib/python2.7/linecache.pyc matches /..../lib/python2.7/linecache.py
import linecache # precompiled from /..../lib/python2.7/linecache.pyc
# /..../lib/python2.7/types.pyc matches /..../lib/python2.7/types.py
import types # precompiled from /..../lib/python2.7/types.pyc
# /..../lib/python2.7/UserDict.pyc matches /..../lib/python2.7/UserDict.py
import UserDict # precompiled from /..../lib/python2.7/UserDict.pyc
# /..../lib/python2.7/_abcoll.pyc matches /..../lib/python2.7/_abcoll.py
import _abcoll # precompiled from /..../lib/python2.7/_abcoll.pyc
# /..../lib/python2.7/abc.pyc matches /..../lib/python2.7/abc.py
import abc # precompiled from /..../lib/python2.7/abc.pyc
# /..../lib/python2.7/_weakrefset.pyc matches /..../lib/python2.7/_weakrefset.py
import _weakrefset # precompiled from /..../lib/python2.7/_weakrefset.pyc
import _weakref # builtin
# /..../lib/python2.7/copy_reg.pyc matches /..../lib/python2.7/copy_reg.py
import copy_reg # precompiled from /..../lib/python2.7/copy_reg.pyc
import encodings # directory /..../lib/python2.7/encodings
# /..../lib/python2.7/encodings/__init__.pyc matches /..../lib/python2.7/encodings/__init__.py
import encodings # precompiled from /..../lib/python2.7/encodings/__init__.pyc
# /..../lib/python2.7/codecs.pyc matches /..../lib/python2.7/codecs.py
import codecs # precompiled from /..../lib/python2.7/codecs.pyc
import _codecs # builtin
# /..../lib/python2.7/encodings/aliases.pyc matches /..../lib/python2.7/encodings/aliases.py
import encodings.aliases # precompiled from /..../lib/python2.7/encodings/aliases.pyc
# /..../lib/python2.7/encodings/utf_8.pyc matches /..../lib/python2.7/encodings/utf_8.py
import encodings.utf_8 # precompiled from /..../lib/python2.7/encodings/utf_8.pyc
Python 2.7.15 (default, May  7 2018, 17:08:03)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
3
# -- clean-up output omitted --

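Each import ... line there references either a built-in module (part of the Python binary, implemented in C) or a .pyc bytecode cache file. There are 17 such .pyc files imported before your script code runs.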
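
To tally such a listing without counting by hand, here is a minimal sketch that classifies the import lines of Python 2.7-style -v output. classify_imports is a hypothetical helper, and the sample holds only a few lines copied from the listing above, with the shortened paths left as they are.

```python
def classify_imports(verbose_output):
    """Count built-in and .pyc-backed imports in Python 2.7 `-v` output."""
    counts = {"builtin": 0, "precompiled": 0}
    for line in verbose_output.splitlines():
        if not line.startswith("import "):
            continue  # skip "# ..." commentary lines and the script's output
        if line.endswith("# builtin"):
            counts["builtin"] += 1
        elif "# precompiled from" in line:
            counts["precompiled"] += 1
        # "import ... # directory ..." lines are package markers; not counted
    return counts


sample = """\
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /..../lib/python2.7/site.pyc matches /..../lib/python2.7/site.py
import site # precompiled from /..../lib/python2.7/site.pyc
import errno # builtin
import encodings # directory /..../lib/python2.7/encodings
"""

print(classify_imports(sample))
```

Fed the full listing instead of the sample, this should report 17 precompiled imports and 5 built-in ones.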

The 3 lines in the main script then translate into another 9 bytecode instructions:

>>> import dis
>>> dis.dis(compile(r'''\
... X = 1
... Y = 2
... print X + Y
... ''', '', 'exec'))
  2           0 LOAD_CONST               0 (1)
              3 STORE_NAME               0 (X)

  3           6 LOAD_CONST               1 (2)
              9 STORE_NAME               1 (Y)

  4          12 LOAD_NAME                0 (X)
             15 LOAD_NAME                1 (Y)
             18 BINARY_ADD
             19 PRINT_ITEM
             20 PRINT_NEWLINE
             21 LOAD_CONST               2 (None)
             24 RETURN_VALUE

(I'm ignoring the last 2 bytecodes, which encode an extra return None that doesn't really apply to modules.)
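
On Python 3, where print is a function and the opcode set differs, the same check can be sketched with dis.get_instructions. This is an assumption-laden variant, not what the session above ran: the script is rewritten with print(X + Y), and the exact instruction names and count vary between Python versions.

```python
import dis

# Python 3 spelling of the 3-line script (print is a function here).
SOURCE = "X = 1\nY = 2\nprint(X + Y)\n"

code = compile(SOURCE, "<test>", "exec")
instructions = list(dis.get_instructions(code))

# List every instruction the compiler produced for the three lines.
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

print(len(instructions), "instructions for 3 lines")
```

Whatever the version, the count comes out above 3, which matches the first point: one statement maps to several bytecodes.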