Hook into the built-in Python f-string format machinery


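I really like f-strings. They're a wonderful piece of syntax.

For a while now I've had an idea for a little library, described below*, that takes them further. A quick example of what I'd like it to do: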

>>> import simpleformatter as sf
>>> def format_camel_case(string):
...     """camel cases a sentence"""
...     return ''.join(s.capitalize() for s in string.split())
>>> @sf.formattable(camcase=format_camel_case)
... class MyStr(str): ...
>>> f'{MyStr("lime cordial delicious"):camcase}'

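To simplify the API, and to extend its use to instances of built-in classes, it would be very useful to find a way to hook into the built-in Python formatting machinery, which would allow custom format specs for built-ins: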

>>> f'{"lime cordial delicious":camcase}'

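In other words, I'd like to override the built-in format function (which the f-string syntax uses) or, alternatively, extend the built-in __format__ methods of existing standard library classes, so that I can write things like this: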

for x, y, z in complicated_generator:
    eat_string(f"x: {x:custom_spec1}, y: {y:custom_spec2}, z: {z:custom_spec3}")

I have accomplished this by creating subclasses with their own __format__ methods, but of course that doesn't work for built-in classes.
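
For reference, the subclass approach can be sketched as follows. CamelStr is a made-up name for illustration, not part of any library; the class handles one custom spec and defers everything else to str.

```python
class CamelStr(str):
    """A str subclass that understands one extra format spec, 'camcase'."""

    def __format__(self, format_spec):
        if format_spec == "camcase":
            # Custom spec: capitalize each word and join them together.
            return "".join(word.capitalize() for word in self.split())
        # Anything else is handled by the normal str formatting rules.
        return super().__format__(format_spec)


camel = f'{CamelStr("lime cordial delicious"):camcase}'
padded = f'{CamelStr("abc"):>5}'
```

This works because an f-string calls __format__ on the type of the value, which is also why a plain str literal never reaches the custom code.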

I could use the string.Formatter API:

my_formatter=MyFormatter()  # custom string.Formatter instance

format_str = "x: {x:custom_spec1}, y: {y:custom_spec2}, z: {z:custom_spec3}"

for x, y, z in complicated_generator:
    eat_string(my_formatter.format(format_str, **locals()))

I find this a bit clunky, and definitely less readable than the f-string API.
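
A minimal sketch of what such a formatter might look like, assuming custom specs are registered by name. SpecFormatter and camel_case are hypothetical names; only string.Formatter and its format_field method come from the standard library.

```python
import string


def camel_case(value):
    """Camel-case a sentence (illustrative helper)."""
    return "".join(word.capitalize() for word in str(value).split())


class SpecFormatter(string.Formatter):
    """Formatter that maps custom spec names to handler functions."""

    def __init__(self, **spec_handlers):
        super().__init__()
        self._spec_handlers = spec_handlers

    def format_field(self, value, format_spec):
        handler = self._spec_handlers.get(format_spec)
        if handler is not None:
            # A registered custom spec: let the handler produce the text.
            return handler(value)
        # Unknown specs fall back to the standard format() behavior.
        return super().format_field(value, format_spec)


formatter = SpecFormatter(camcase=camel_case)
rendered = formatter.format("{s:camcase} {n:03d}", s="lime cordial", n=7)
```

Unlike the f-string version, this works for built-in types, but the template and the values have to be passed around separately.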

Another thing that could be done is to override builtins.format:

>>> import builtins
>>> builtins.format = lambda *args, **kwargs: 'womp womp'
>>> format(1,"foo")
'womp womp'

...but this doesn't work for f-strings:

>>> f"{1:foo}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format specifier
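
The behavior above can be checked with a small experiment. An f-string does not look up the name format at run time; it formats the value through the __format__ method of its type, and built-in types refuse to have that method replaced. Probe is a made-up class used only for this demonstration.

```python
import builtins

calls = []


class Probe:
    """Records every format spec it is asked to handle."""

    def __format__(self, format_spec):
        calls.append(format_spec)
        return "<" + format_spec + ">"


original_format = builtins.format
builtins.format = lambda *args, **kwargs: "womp womp"
try:
    # The f-string ignores the replaced builtin and calls Probe.__format__.
    result = f"{Probe():foo}"
finally:
    builtins.format = original_format

patch_error = None
try:
    # Built-in types do not allow their methods to be reassigned.
    str.__format__ = lambda self, format_spec: "patched"
except TypeError as exc:
    patch_error = exc
```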


Currently my API looks something like this (somewhat simplified):

import simpleformatter as sf
def this_formatting_function(some_obj):
    return "this formatted someobj!"

def that_formatting_function(some_obj):
    return "that formatted someobj!"

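# Spec names below are illustrative; the decorator maps spec names to functions
@sf.formattable(this_spec=this_formatting_function, that_spec=that_formatting_function)
class SomeClass: ...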


some_obj = SomeClass()

I'd like the API to look more like the following:

def this_formatting_function(some_obj):
    return "this formatted someobj!"

def that_formatting_function(some_obj):
    return "that formatted someobj!"

class SomeClass: ...  # no class decorator needed

...and to allow use on built-in classes:

x=1  # built-in type instance

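But in order to do these things, we have to burrow into the built-in format() function. How can I hook into that juicy f-string goodness?

* NOTE: I'll probably never actually get around to implementing this library! But I do think it's a neat idea, and I invite anyone who wants to steal it from me :).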



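I'm not going to integrate it into your library, but I will show you how to hook into the behavior of f-strings. This is roughly how it works:

  1. Write a function that manipulates the bytecode instructions of code objects, replacing FORMAT_VALUE instructions with calls to a hook function;
  2. Customize the import mechanism so that the bytecode of every module and package (except standard library modules and site-packages) is modified with that function.

You can get the full source at https://github.com/mivdnber/formathack, but everything is explained below.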



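Caveats:

  1. There is absolutely no guarantee that this won't break completely unrelated code;
  2. There is no guarantee that the bytecode manipulation described here will keep working in newer Python versions. Indeed, the ROT_TWO, ROT_THREE and CALL_FUNCTION opcodes used below were removed in CPython 3.11, so the code as shown targets CPython 3.10 and earlier. It definitely won't work in alternative Python implementations that don't compile to CPython-compatible bytecode. PyPy could work in theory, but the solution described here doesn't, because the bytecode package isn't 100% compatible with it.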
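
Because opcodes come and go between CPython releases, one cheap safeguard is to check for the required opcodes before attempting any rewrite. The sketch below is an assumption about how such a guard could look; the helper names are made up, and only dis.opmap comes from the standard library.

```python
import dis

# Opcodes the rewrite described in this answer relies on.
REQUIRED_OPCODES = ("FORMAT_VALUE", "ROT_TWO", "ROT_THREE", "CALL_FUNCTION")


def missing_opcodes(required=REQUIRED_OPCODES):
    """Return the required opcode names that this interpreter lacks."""
    return [name for name in required if name not in dis.opmap]


def bytecode_hack_supported():
    """True when every opcode needed by the rewrite is available."""
    return not missing_opcodes()
```

An install function could call bytecode_hack_supported() first and fail with a clear error instead of producing broken code objects.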


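Part 1: Bytecode manipulation

Python code is not executed directly. It is first compiled to a simpler, intermediate, non-human-readable stack-based language called Python bytecode (that's what is stored in *.pyc files). To get an idea of what that bytecode looks like, you can use the standard library dis module to inspect the bytecode of a simple function: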

def invalid_format(x):
    return f"{x:foo}"


>>> invalid_format("bar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in invalid_format
ValueError: Invalid format specifier

To inspect the bytecode, fire up a Python console and call dis.dis:

>>> import dis
>>> dis.dis(invalid_format)
  2           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 ('foo')
              4 FORMAT_VALUE             4 (with format)
              6 RETURN_VALUE


The same output, annotated:

# line 2      # Put the value of function parameter x on the stack
  2           0 LOAD_FAST                0 (x)
              # Put the format spec on the stack as a string
              2 LOAD_CONST               1 ('foo')
              # Pop both values from the stack and perform the actual formatting
              # This puts the formatted string on the stack
              4 FORMAT_VALUE             4 (with format)
              # pop the result from the stack and return it
              6 RETURN_VALUE
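
The disassembly can also be inspected programmatically, which is handy because the exact opcode names differ between CPython versions. The following sketch only assumes that the formatting opcode's name starts with "FORMAT", which holds for FORMAT_VALUE as shown above.

```python
import dis


def invalid_format(x):
    return f"{x:foo}"


# Collect the opcode names of the compiled function.
opnames = [instruction.opname for instruction in dis.get_instructions(invalid_format)]

# The instruction(s) that perform the actual formatting.
format_ops = [name for name in opnames if name.startswith("FORMAT")]

# The format spec itself is stored as a constant on the code object.
spec_is_constant = "foo" in invalid_format.__code__.co_consts
```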

The idea here is to replace the FORMAT_VALUE instruction with a call to a hook function, which lets us implement whatever behavior we want. For now, let's implement it like this:

def formathack_hook__(value, format_spec=None):
    """
    Gets called whenever a value is formatted. Right now it's a silly implementation,
    but it can be expanded with all sorts of nasty hacks.
    """
    return f"{value} formatted with {format_spec}"
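
As one possible way to expand the hook, the sketch below dispatches registered custom specs and falls back to the normal format() behavior for everything else. The registry, register_spec and dispatching_hook are hypothetical names, not part of formathack.

```python
_custom_specs = {}


def register_spec(name, func):
    """Associate a custom format spec name with a one-argument function."""
    _custom_specs[name] = func


def dispatching_hook(value, format_spec=None):
    """Hook body that honors custom specs and otherwise defers to format()."""
    handler = _custom_specs.get(format_spec)
    if handler is not None:
        return handler(value)
    # No custom handler: behave like a regular f-string would.
    return format(value, format_spec or "")


register_spec(
    "camcase",
    lambda text: "".join(word.capitalize() for word in text.split()),
)
```

With a body like this in place of the silly implementation, existing specs such as ".2f" keep working while new ones can be added per name.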

To replace the instruction, I used the bytecode package, which provides very nice abstractions for doing horrible things.

import types

from bytecode import Bytecode, Instr


def formathack_rewrite_bytecode__(code):
    """
    Modifies a code object to override the behavior of the FORMAT_VALUE
    instructions used by f-strings.
    """
    decompiled = Bytecode.from_code(code)
    modified_instructions = []
    for instruction in decompiled:
        name = getattr(instruction, 'name', None)
        if name == 'FORMAT_VALUE':
            # 0x04 means that a format spec is present
            if instruction.arg & 0x04 == 0x04:
                callback_arg_count = 2
            else:
                callback_arg_count = 1
            modified_instructions.extend([
                # Load in the callback
                Instr("LOAD_GLOBAL", "formathack_hook__"),
                # Shuffle around the top of the stack to put the arguments on top
                # of the function global
                Instr("ROT_THREE" if callback_arg_count == 2 else "ROT_TWO"),
                # Call the callback function instead of executing FORMAT_VALUE
                Instr("CALL_FUNCTION", callback_arg_count)
            ])
        # Kind of nasty: we want to recursively alter the code of functions.
        elif name == 'LOAD_CONST' and isinstance(instruction.arg, types.CodeType):
            modified_instructions.extend([
                Instr("LOAD_CONST", formathack_rewrite_bytecode__(instruction.arg), lineno=instruction.lineno)
            ])
        else:
            # Every other instruction is copied over unchanged
            modified_instructions.append(instruction)
    modified_bytecode = Bytecode(modified_instructions)
    # For functions, copy over argument definitions
    modified_bytecode.argnames = decompiled.argnames
    modified_bytecode.argcount = decompiled.argcount
    modified_bytecode.name = decompiled.name
    return modified_bytecode.to_code()

We can now make the invalid_format function we defined earlier work:

>>> invalid_format.__code__ = formathack_rewrite_bytecode__(invalid_format.__code__)
>>> invalid_format("bar")
'bar formatted with foo'
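
To see why the rewrite needs ROT_THREE (or ROT_TWO when there is no format spec), it helps to model the value stack as a plain Python list. This is a toy simulation for illustration only, not CPython's implementation: the callable has to end up below its arguments before the call instruction runs.

```python
def rot_two(stack):
    """Swap the two topmost stack items."""
    stack[-1], stack[-2] = stack[-2], stack[-1]


def rot_three(stack):
    """Move the top item down to third position, lifting the next two."""
    top = stack.pop()
    stack.insert(len(stack) - 2, top)


def call_function(stack, argc):
    """Pop argc arguments and the callable beneath them, push the result."""
    args = stack[-argc:]
    del stack[-argc:]
    func = stack.pop()
    stack.append(func(*args))


def hook(value, format_spec=None):
    return f"{value} formatted with {format_spec}"


# With a format spec: LOAD_FAST x, LOAD_CONST 'foo', then the injected code.
stack = ["bar", "foo"]
stack.append(hook)        # LOAD_GLOBAL formathack_hook__
rot_three(stack)          # stack is now [hook, "bar", "foo"]
call_function(stack, 2)   # CALL_FUNCTION 2

# Without a format spec only one argument is on the stack.
stack_no_spec = ["bar"]
stack_no_spec.append(hook)
rot_two(stack_no_spec)    # stack is now [hook, "bar"]
call_function(stack_no_spec, 1)
```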


Part 2: Hooking into the import process

To make the new f-string behavior available everywhere, and not just in manually patched functions, we can customize the Python module import process with a custom module finder and loader, using the functionality provided by the standard library importlib module:

import importlib.machinery


class _FormatHackLoader(importlib.machinery.SourceFileLoader):
    """
    A module loader that modifies the code of the modules it imports to override
    the behavior of f-strings. Nasty stuff.
    """
    @classmethod
    def find_spec(cls, name, path, target=None):
        # Start out with a spec from a default finder
        spec = importlib.machinery.PathFinder.find_spec(
            fullname=name,
            # Only apply to modules and packages in the current directory
            # This prevents standard library modules or site-packages
            # from being patched.
            path=[""],
            target=target
        )
        if spec is None:
            return None
        # Modify the loader in the spec to this loader
        spec.loader = cls(name, spec.origin)
        return spec

    def get_code(self, fullname):
        # This is called by exec_module to get the code of the module
        # to execute it.
        code = super().get_code(fullname)
        # Rewrite the code to modify the f-string formatting opcodes
        rewritten_code = formathack_rewrite_bytecode__(code)
        return rewritten_code

    def exec_module(self, module):
        # We introduce the callback that hooks into the f-string formatting
        # process in every imported module
        module.__dict__["formathack_hook__"] = formathack_hook__
        return super().exec_module(module)
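
The exec_module trick of seeding a module's namespace before its code runs can be tried in isolation, without any bytecode rewriting. The sketch below writes a throwaway module to a temporary directory; InjectingLoader, injected_hook and demo_mod are made-up names for this demonstration.

```python
import importlib.machinery
import importlib.util
import os
import tempfile


class InjectingLoader(importlib.machinery.SourceFileLoader):
    """Loader that places a helper in the module namespace before execution."""

    def exec_module(self, module):
        module.__dict__["injected_hook"] = lambda value: f"hooked {value}"
        super().exec_module(module)


def load_with_injection(source):
    """Write source to a temporary file and load it with InjectingLoader."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_mod.py")
        with open(path, "w") as handle:
            handle.write(source)
        loader = InjectingLoader("demo_mod", path)
        spec = importlib.util.spec_from_file_location("demo_mod", path, loader=loader)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module


# The module body uses a name it never defined or imported itself.
module = load_with_injection("RESULT = injected_hook('x')\n")
```

The module can call injected_hook at import time because the name is already in its globals when the body executes, which is the same mechanism that makes formathack_hook__ visible to the rewritten LOAD_GLOBAL instruction.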

To make sure the Python interpreter uses this loader to import all files, we have to add it to sys.meta_path:

import inspect
import sys


def install():
    # If the _FormatHackLoader is not registered as a finder,
    # do it now!
    if sys.meta_path[0] is not _FormatHackLoader:
        sys.meta_path.insert(0, _FormatHackLoader)
        # Tricky part: we want to be able to use our custom f-string behavior
        # in the main module where install was called. That module was loaded
        # with a standard loader though, so that's impossible without additional
        # dirty hacks.
        # Here, we execute the module _again_, this time with _FormatHackLoader
        module_globals = inspect.currentframe().f_back.f_globals
        module_name = module_globals["__name__"]
        module_file = module_globals["__file__"]
        loader = _FormatHackLoader(module_name, module_file)
        loader.exec_module(sys.modules[module_name])
        # This is actually pretty important. If we don't exit here, the main module
        # will continue from the formathack.install method, causing it to run twice!
        sys.exit(0)
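
To see how sys.meta_path is consulted, here is a minimal finder that only records the names it is asked about and then declines by returning None, so the regular finders take over. RecordingFinder and the missing module name are made up for this demonstration.

```python
import importlib
import sys


class RecordingFinder:
    """Meta path finder that logs lookups and never handles them itself."""

    seen = []

    @classmethod
    def find_spec(cls, name, path=None, target=None):
        cls.seen.append(name)
        # Returning None tells the import system to try the next finder.
        return None


sys.meta_path.insert(0, RecordingFinder)
try:
    try:
        importlib.import_module("formathack_demo_missing_module")
    except ModuleNotFoundError:
        # Expected: no finder on the meta path knows this module.
        pass
finally:
    sys.meta_path.remove(RecordingFinder)
```

Finders at the front of sys.meta_path are asked first for modules that are not already in sys.modules, which is why install() inserts the loader class at index 0.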

If we put it all together in a formathack module (see https://github.com/mivdnber/formathack for an integrated, working example), we can now use it like this:

# In your main Python module, install formathack ASAP
import formathack
formathack.install()

# From now on, f-string behavior will be overridden!

print(f"{'foo':bar}")
# -> "foo formatted with bar"
