How to add functions from modules to a class at runtime while preserving the package hierarchy?

Suppose I have a Python 3 package laid out as follows:

.
└── MyFunPackage/
    ├── __init__.py
    ├── helloworld.py
    └── worlds/
        ├── __init__.py
        ├── world1.py
        └── world2.py

helloworld.py defines the following class:

class World(object):
    def __init__(self, name):
        self.name = name

Each module in the worlds subpackage defines different functions. For example, world1.py might contain:

def frobulate(self):
    return f'{self.name} has been frobulated'

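My end goal is to add, at runtime, every function from every module in the worlds subpackage to the World class, so that I don't have to change anything by hand when I add another module to worlds/ (e.g. world3.py). However, I also want to preserve the package hierarchy, so that a script outside the package can do the following: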

from MyFunPackage.helloworld import World
aWorld = World('a')
print(aWorld.world1.frobulate()) # 'a has been frobulated'

Later, if I add a world3.py to the worlds subpackage, I should be able to add the following to the external script without modifying the World class:

print(aWorld.world3.wormhole(2)) # 'a has transited wormhole #2 to world3'

I think I've found some of the pieces I need in these Stack Overflow questions:

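However, I'm having a lot of trouble fitting these pieces together, especially the "preserving package hierarchy" bit. Is what I'm trying to accomplish here possible? If so, how would I go about implementing it?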

So, this probably isn't a problem Python was designed to solve, but we can make it work.

There are two separate parts to this puzzle: first, "how do I import all these packages without knowing them in advance?", and second, "how do I bind those packages to a World object in a way that allows me to call methods on them with self as the first parameter?" I'll tackle these in order.


How do I import all the packages in a directory?

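__init__.py is the file containing the code that runs when you try to load the package. Usually it's responsible for collecting all the important resources in the package and building a local namespace that others can use. We're going to abuse this behavior slightly: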

worlds/__init__.py

import os, pkgutil

# Collect the names of all modules in this directory and save them to __all__.
# This lets us later do `from worlds import *` and get every module, if we want
# (though our helloworld doesn't actually do that).
__all__ = [name for _, name, _ in pkgutil.iter_modules([os.path.dirname(__file__)])]

# Make an attribute called `worlds`: a dict mapping the name of each module
# in this folder to the module itself.
worlds = {}
for _world_name in __all__:
    # __import__ takes globals before locals; level=1 makes the import
    # relative to this package.
    worlds[_world_name] = __import__(_world_name, globals(), locals(), level=1)

# Mind the argument order. I originally passed (locals(), globals()), which
# happens to work in this loop because both are the same dict at module level.
# The same call inside a dict comprehension failed with
#
#   KeyError: "'__name__' not in globals"
#
# because there locals() was the comprehension's own scope, which has no
# __name__, and it was being passed in the slot meant for globals.

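This does two things. First, it lets us selectively do the usual from worlds import world1, world2, ... if we want to, and from worlds import * picks up every module. That's what assigning to __all__ accomplishes. The method for finding all importable modules is taken directly from this answer.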

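However, that leaves __all__ as a list of strings, which isn't really useful to helloworld. So instead, I then create a dict worlds that maps each world's name directly to the module that name refers to (importing the module dynamically via __import__()). So now we can also get at world1 via worlds.worlds['world1']. This is more useful to us.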
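The same name-to-module mapping can also be built with importlib.import_module, which avoids the globals/locals arguments of __import__ altogether. The following is only a sketch under assumptions: it writes a throwaway package to a temporary directory so that it runs on its own, and the names demo_worlds and greet are made up for the illustration.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk so the sketch is self-contained.
# `demo_worlds` and `greet` are hypothetical names, not part of the question.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_worlds"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "world1.py").write_text(
    "def frobulate(self):\n    return f'{self.name} has been frobulated'\n"
)
(pkg_dir / "world2.py").write_text(
    "def greet(self):\n    return f'hello from {self.name}'\n"
)
sys.path.insert(0, str(root))

package = importlib.import_module("demo_worlds")

# Map each module name found in the package directory to the imported module.
worlds = {
    info.name: importlib.import_module(f"{package.__name__}.{info.name}")
    for info in pkgutil.iter_modules(package.__path__)
}

print(sorted(worlds))  # ['world1', 'world2']
```

Because import_module is given a fully qualified name, this version also works inside a dict comprehension.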


How do I bind these packages/functions to World?

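This question has another two parts: "how do I bind these packages" and "how do I get the function calls to still pass my World instance as a parameter". The first answer is simple: just import worlds, then iterate through worlds.worlds.items() and use setattr() to assign the key-value pairs as attributes.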

But if we do this:

for module_name, module in worlds.worlds.items():
    setattr(self, module_name, module)

then we get the wrong behavior:

>>> x = helloworld.World('hello')
>>> x.world1.frobulate()
TypeError: frobulate() missing 1 required positional argument: 'self'

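The solution is to put in some kind of intermediate wrapper that adds the World() instance as the first argument when you call the function. I do this by creating a new inner class, SubWorld, which on initialization effectively rebinds every function in the module.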
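The wrapper idea can be seen in isolation with functools.partial, which pre-fills the first argument of a function. This is a minimal sketch, not the SubWorld class itself: SimpleNamespace objects stand in for both the World instance and the module.

```python
from functools import partial
from types import SimpleNamespace


def frobulate(self):
    return f"{self.name} has been frobulated"


world = SimpleNamespace(name="hello")  # stand-in for a World instance

# Storing the bare function: nothing supplies `self` when it is called.
unbound = SimpleNamespace(frobulate=frobulate)
# Storing a wrapper that pre-fills `self` with the World instance.
bound = SimpleNamespace(frobulate=partial(frobulate, world))

try:
    unbound.frobulate()
except TypeError as exc:
    print(type(exc).__name__)  # TypeError

print(bound.frobulate())  # hello has been frobulated
```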

So, the complete code:

helloworld.py

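# This works when helloworld is imported from inside the MyFunPackage directory.
# When importing it as MyFunPackage.helloworld, use `from . import worlds`.
import worlds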

# here's your generic World object
class World(object):
    def __init__(self, name):
        self.name = name
        # We take the dict that we created in worlds/__init__.py, and
        # iterate through it
        for world_name, module in worlds.worlds.items():
            # for each name/module pair, we assign that name as an attribute
            # to this object, paired to an object that holds all of its methods.
            # We could just pass the module itself as the third argument here,
            # but then `self` doesn't get passed as the first parameter. So,
            # we use an instance of a wrapper class which takes care of that.
            # See below.
            setattr(self, world_name, self.SubWorld(self, module))

    # Instead of importing the module wholesale, we make an inner class
    # and have that subclass essentially delegate functionality, by
    # essentially prepending the `self` parameter to the call.
    class SubWorld:
        def __init__(self, world, module):
            # scan all the attributes of the module
            for name in dir(module):
                obj = getattr(module, name)
                # if the object is callable, then add the World instance
                # as a `self`. We do this using a lambda.
                if callable(obj):
                    # We have the lambda take *args and **kwargs - that is,
                    # an arbitrary, catch-all list of args and kwargs to pass on.
                    # Then, we forward the function call with the same args and kwargs,
                    # except that we add `world` as a first argument (to take the place
                    # of `self`).
                    # `_func=obj` binds the current function as a default argument;
                    # a plain closure over `obj` would make every lambda call whichever
                    # callable the loop visited last.
                    # We then set this lambda as an attribute with the same name as it
                    # had in the module we took the function from.
                    setattr(self, name, lambda *a, _func=obj, **k: _func(world, *a, **k))

This gives us the intended behavior:

>>> import helloworld
>>> x = helloworld.World('Tim')
>>> print(x.world1.frobulate())
Tim has been frobulated
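Because SubWorld creates its wrappers inside a loop, it matters when each wrapper looks up the function it forwards to. A lambda that merely refers to the loop variable reads it at call time, after the loop has finished. The sketch below uses two made-up functions, alpha and beta, to show the difference between that and capturing the function through a default argument.

```python
def alpha():
    return "alpha"


def beta():
    return "beta"


late, early = {}, {}
for func in (alpha, beta):
    # Closure: `func` is looked up when the lambda runs, i.e. after the loop.
    late[func.__name__] = lambda: func()
    # Default argument: `func` is evaluated now and stored on the lambda.
    early[func.__name__] = lambda _f=func: _f()

print(late["alpha"]())   # beta
print(early["alpha"]())  # alpha
```

A module with a single function hides the problem, since the last callable visited is then the only one.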

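Depending on how each worldn module works, you can modify SubWorld accordingly (for example, if you need to keep references to variables as well as to functions). A good way to handle this dynamically might be to use property()s and specify the getter of any particular variable v as a lambda like lambda v:getattr(module, v).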
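As a hedged alternative to generating one property per variable, the sketch below swaps in __getattr__ delegation: every attribute lookup on the wrapper is forwarded to the module, functions are bound to the World instance, and plain variables are returned with their current value. The in-memory module and its gravity variable are made up for the illustration.

```python
from types import FunctionType, MethodType, ModuleType, SimpleNamespace

# A made-up in-memory module with one function and one plain variable.
world1 = ModuleType("world1")
world1.gravity = 9.81


def frobulate(self):
    return f"{self.name} has been frobulated"


world1.frobulate = frobulate


class SubWorld:
    def __init__(self, world, module):
        self._world = world
        self._module = module

    def __getattr__(self, name):
        # Called only when normal lookup fails, so every access to a module
        # attribute lands here and reads the module's current value.
        value = getattr(self._module, name)
        if isinstance(value, FunctionType):
            return MethodType(value, self._world)  # functions get `self`
        return value                               # variables pass through


world = SimpleNamespace(name="a")  # stand-in for a World instance
sub = SubWorld(world, world1)

print(sub.frobulate())  # a has been frobulated
print(sub.gravity)      # 9.81
world1.gravity = 1.62
print(sub.gravity)      # 1.62
```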

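This kind of hierarchy definition is a bit unusual in Python projects, which is why you're having a hard time implementing it with everyday syntax. You should take a step back and think about how invested you really are in this architecture. If it's not too late to rewrite it in a way that follows common Python idioms more closely, maybe you should do that instead ("explicit is better than implicit" in particular comes to mind).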

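That being said, if everyday Python doesn't cut it, you can use strange Python to write what you want without too much hassle. If you want to understand in detail how functions are turned into methods, consider reading up on the descriptor protocol.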
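As a small illustration of that protocol (a sketch, with World and frobulate borrowed from the question): plain functions implement __get__, and calling it with an instance returns the same kind of bound method that types.MethodType builds.

```python
import types


class World:
    def __init__(self, name):
        self.name = name


def frobulate(self):
    return f"{self.name} has been frobulated"


w = World("a")

# Functions are descriptors: __get__ returns a method bound to the instance.
via_get = frobulate.__get__(w, World)
via_type = types.MethodType(frobulate, w)

print(via_get())            # a has been frobulated
print(via_get == via_type)  # True
print(via_get.__self__ is w, via_get.__func__ is frobulate)  # True True
```

This lookup is what happens implicitly when a function is found on a class; a function stored on an instance or in a module is returned as-is, so the binding has to be done explicitly.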


MyFunPackage/worlds/__init__.py

from . import world1, world2

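Any new world_n.py file you create needs this line updated. While the import could be automated dynamically, that would break any IDE's member hinting and require more shifty code. You did write that you don't want to change anything else when adding modules, but hopefully adding the file name to this line is OK.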

This file should not contain any other code.

MyFunPackage/worlds/world*.py

def frobulate(self):
    return f'{self.name} has been frobulated' 

No special code needs to be added to world1.py, world2.py, or any new file in the worlds folder. Just write your functions in them as you see fit.

MyFunPackage/helloworld.py

from types import MethodType, FunctionType, SimpleNamespace

from . import worlds

_BASE_ATTRIBUTES = {
    '__builtins__', '__cached__', '__doc__', '__file__',
    '__loader__', '__name__', '__package__', '__path__', '__spec__'
}


class Worlds:
    def __init__(self, name):
        self.name = name

        # for all modules in the "worlds" package
        for world_name in dir(worlds):
            if world_name in _BASE_ATTRIBUTES:
                continue  # skip non-packages and
            world = getattr(worlds, world_name)
            function_map = {}

            # collect all functions in them, by
            for func in dir(world):
                if not isinstance(getattr(world, func), FunctionType):
                    continue  # ignoring non-functions, and
                if getattr(world, func).__module__ != world.__name__:
                    continue  # ignoring names that were only imported

                # turn them into methods of the current worlds instance
                function_map[func] = MethodType(getattr(world, func), self)

            # and add them to a new namespace that is named after the module
            setattr(self, world_name, SimpleNamespace(**function_map))

The module-adding logic is completely dynamic and does not need to be updated in any way when you add new files to worlds.
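The filtering and binding steps can be tried without installing a package. The sketch below is an illustration under assumptions: it builds an in-memory module with exec as a stand-in for a file in worlds/, and the helper names make_module and bind_functions, as well as the Thing class, are hypothetical. It shows that a name the module merely imported is skipped, while a function defined in the module is bound.

```python
from types import FunctionType, MethodType, ModuleType, SimpleNamespace


def make_module(name, source):
    """Create an in-memory module (a stand-in for a file in worlds/)."""
    module = ModuleType(name)
    exec(source, module.__dict__)
    return module


world1 = make_module("world1", """
from os.path import join   # imported name, should be ignored

def frobulate(self):
    return f'{self.name} has been frobulated'
""")


def bind_functions(instance, module):
    """Bind the functions defined in `module` itself to `instance`."""
    bound = {
        name: MethodType(obj, instance)
        for name, obj in vars(module).items()
        if isinstance(obj, FunctionType) and obj.__module__ == module.__name__
    }
    return SimpleNamespace(**bound)


class Thing:
    def __init__(self, name):
        self.name = name


thing = Thing("foo")
thing.world1 = bind_functions(thing, world1)

print(thing.world1.frobulate())       # foo has been frobulated
print(hasattr(thing.world1, "join"))  # False
```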


After setting this up as a package and installing it, your example code should work:

>>> from MyFunPackage.helloworld import Worlds
>>> x = Worlds('foo')
>>> x.world1.frobulate()
'foo has been frobulated'

Thanks, Python, for exposing your inner workings so deliberately.


Tangent: dynamically adding functions to objects, patching vs. describing

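Using types.MethodType to turn a function into a method configures said descriptor protocol on it and passes ownership of the function to the owning instance. This is preferable to patching the instance into the signature, for a number of reasons.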

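I'll give a quick example, because I think it's good to know. I'll skip the namespaces here, since they don't change the behavior and would only make it harder to read: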

class Foo:
    """An example class that does nothing yet."""
    pass

def bar(self, text: str) -> str:
    """An example function, we will add this to an instance."""
    return f"I am {self} and say {text}."

import inspect
import timeit  
import types
# now the gang's all here!

Patching with a lambda

>>> foo = Foo()
>>> foo.bar = lambda *args, **kwargs: bar(foo, *args, **kwargs)
>>> foo.bar('baz')
'I am <__main__.Foo object at 0x000001FB890594E0> and say baz.'
# the behavior is as expected, but ...

>>> print(foo.bar.__doc__)
None
# the doc string is gone
>>> foo.bar.__annotations__
{}
# the type annotations are gone
>>> inspect.signature(foo.bar)
<Signature (*args, **kwargs)>
# the parameters and their names are gone
>>> min(timeit.repeat(
...     "foo.bar('baz')",
...     "from __main__ import foo",
...     number=100000)
... )
0.1211023000000182
# this is the best total time, in seconds, for 100000 calls
>>> foo.bar
<function <lambda> at 0x000001FB890594E0>
# as far as it is concerned, it's just some lambda function

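In short, while the base functionality is reproduced, a lot of information is lost along the way. That is likely to become a problem, whether because you want to document your work properly, want to use your IDE's type hints, or have to go through stack traces during debugging and want to know exactly which function caused the problem.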

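While it's perfectly fine to do something like this to patch out dependencies in a test suite, it's not something you should do in the core of your codebase.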

Changing the descriptor

>>> foo = Foo()
>>> foo.bar = types.MethodType(bar, foo)
>>> foo.bar('baz')
'I am <__main__.Foo object at 0x00000292AE287D68> and say baz.'
# same so far, but ...

>>> foo.bar.__doc__
'An example function, we will add this to an instance.'
# the doc string is still there
>>> foo.bar.__annotations__
{'text': <class 'str'>, 'return': <class 'str'>}
# same for the type annotations
>>> inspect.signature(foo.bar)
<Signature (text: str) -> str>
# and the signature is correct, without us needing to do anything
>>> min(timeit.repeat(
...     "foo.bar('baz')",
...     "from __main__ import foo",
...     number=100000)
... )
0.08953189999999722
# execution time is 25% lower due to less overhead, no delegation necessary here
>>> foo.bar
<bound method bar of <__main__.Foo object at 0x00000292AE287D68>>
# and it knows that it's a method and belongs to an instance of Foo

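Binding the function as a method this way preserves all of the information. As far as Python is concerned, it is now the same as any other method, as if it had been bound statically rather than dynamically.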