Distinguishing between Pydantic models with the same fields

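I'm using Pydantic to define hierarchical data in which some models have identical attributes.

However, when I save and load these models, Pydantic can no longer tell which model was used and picks the first one listed in the field's type annotation.

I understand that this is the expected behavior according to the documentation. However, the class type information is important to my application.

What is the recommended way to distinguish between different classes in Pydantic? One hack is to simply add an extraneous field to one of the models, but I'd like to find a more elegant solution.

See the simplified example below: the container is initialized with data of type DataB, but after exporting and loading, the new container has data of type DataA, because DataA is the first element in the type declaration of container.data.

Thanks for your help!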

from abc import ABC
from pydantic import BaseModel #pydantic 1.8.2
from typing import Union

class Data(BaseModel, ABC):
    """ base class for a Member """
    number: float

class DataA(Data):
    """ A type of Data"""
    pass

class DataB(Data):
    """ Another type of Data """
    pass

class Container(BaseModel):
    """ container holds a subclass of Data """
    data: Union[DataA, DataB]

# initialize container with DataB
data = DataB(number=1.0)
container = Container(data=data)

# export container to string and load new container from string
string = container.json()
new_container = Container.parse_raw(string)

# look at type of container.data
print(type(new_container.data).__name__)
# >>> DataA

As correctly noted in the comments, the models cannot be distinguished when parsing unless additional information is stored.
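
To make the ambiguity concrete, here is a minimal sketch that uses only the standard library. Plain dataclasses stand in for the pydantic models (an assumption, so that the snippet runs without pydantic installed); the point is only that both classes serialize to exactly the same JSON, so nothing in the string says which class produced it.

```python
import json
from dataclasses import asdict, dataclass


# Plain dataclasses stand in for the pydantic models (illustration only).
@dataclass
class DataA:
    number: float


@dataclass
class DataB:
    number: float


# Both classes have the same fields, so their serialized forms are identical.
json_a = json.dumps(asdict(DataA(number=1.0)))
json_b = json.dumps(asdict(DataB(number=1.0)))

print(json_a)            # {"number": 1.0}
print(json_a == json_b)  # True
```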

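As of today (pydantic v1.8.2), the most canonical way to distinguish models when parsing a Union (in case of ambiguity) is to explicitly add a type specifier using Literal. It looks like this: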

from abc import ABC
from pydantic import BaseModel
from typing import Union, Literal

class Data(BaseModel, ABC):
    """ base class for a Member """
    number: float


class DataA(Data):
    """ A type of Data"""
    tag: Literal['A'] = 'A'


class DataB(Data):
    """ Another type of Data """
    tag: Literal['B'] = 'B'


class Container(BaseModel):
    """ container holds a subclass of Data """
    data: Union[DataA, DataB]


# initialize container with DataB
data = DataB(number=1.0)
container = Container(data=data)

# export container to string and load new container from string
string = container.json()
new_container = Container.parse_raw(string)


# look at type of container.data
print(type(new_container.data).__name__)
# >>> DataB

This approach can be automated, but use it at your own risk, since it breaks static typing and relies on objects that may change in future versions:

from abc import ABC
from typing import Literal

from pydantic import BaseModel
from pydantic.fields import ModelField

class Data(BaseModel, ABC):
    """ base class for a Member """
    number: float

    def __init_subclass__(cls, **kwargs):
        name = 'tag'
        value = cls.__name__
        annotation = Literal[value]

        tag_field = ModelField.infer(name=name, value=value, annotation=annotation, class_validators=None, config=cls.__config__)
        cls.__fields__[name] = tag_field
        cls.__annotations__[name] = annotation


class DataA(Data):
    """ A type of Data"""
    pass


class DataB(Data):
    """ Another type of Data """
    pass

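In the meantime, I'm trying to hack something together with a custom validator. Basically, a class decorator adds a class_name: str field, which ends up in the JSON string. The validator then looks up the correct subclass based on its value.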

import json


def register_distinct_subclasses(fields: tuple):
    """ fields is tuple of subclasses that we want to be registered as distinct """

    field_map = {field.__name__: field for field in fields}

    def _register_distinct_subclasses(cls):
        """ cls is the superclass of fields, which we add a new validator to """

        orig_init = cls.__init__

        class _class:
            class_name: str

            def __init__(self, **kwargs):
                class_name = type(self).__name__
                kwargs["class_name"] = class_name
                orig_init(self, **kwargs)

            @classmethod
            def __get_validators__(cls):
                yield cls.validate

            @classmethod
            def validate(cls, v):
                if isinstance(v, dict):
                    class_name = v.get("class_name")
                    json_string = json.dumps(v)
                else:
                    class_name = v.class_name
                    json_string = v.json()
                cls_type = field_map[class_name]
                return cls_type.parse_raw(json_string)

        return _class
    return _register_distinct_subclasses

It is called as follows:

Data = register_distinct_subclasses((DataA, DataB))(Data)
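
Stripped of the pydantic-specific parts, the lookup idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: plain dataclasses stand in for the models, and REGISTRY, dump and load are hypothetical names, not part of pydantic.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict

# Hypothetical registry mapping a class name to the class itself.
REGISTRY: Dict[str, type] = {}


@dataclass
class Data:
    number: float

    def __init_subclass__(cls, **kwargs):
        # Register every subclass under its own name as it is defined.
        super().__init_subclass__(**kwargs)
        REGISTRY[cls.__name__] = cls


@dataclass
class DataA(Data):
    pass


@dataclass
class DataB(Data):
    pass


def dump(obj: Data) -> str:
    """Serialize the fields plus a class_name entry identifying the type."""
    payload = asdict(obj)
    payload["class_name"] = type(obj).__name__
    return json.dumps(payload)


def load(string: str) -> Data:
    """Look up the class by its stored name and rebuild the instance."""
    payload = json.loads(string)
    cls = REGISTRY[payload.pop("class_name")]
    return cls(**payload)


restored = load(dump(DataB(number=1.0)))
print(type(restored).__name__)  # DataB
```

The trade-off is the same as with the Literal tag: the serialized data carries one extra key, and renaming a class changes the stored format.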

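I just wanted to take the opportunity to list another possible alternative to pydantic here. Pydantic itself already supports this use case very well, as the other answer shows.

I am the creator and maintainer of a relatively new and lesser-known JSON serialization library, Dataclass Wizard, which relies on the Python dataclasses module to work its magic. As of the latest version, 0.14.0, dataclass-wizard supports dataclasses within Union types. Previously it did not support dataclasses within Union types at all, which was a glaring omission and something that had been on my to-do list to (eventually) add support for.

The reason this generally did not work before is that the data being de-serialized is usually a JSON object, which only knows simple types such as arrays and dictionaries. A dict would not match any of the types in Union[Data1, Data2], even if the object had all the correct dataclass fields as keys. This is simply because the library does not compare the dict object against the fields of each dataclass in the Union, though that might change in a future release.

So, in any case, here is a simple example that demonstrates the use of dataclasses in Union types, using a class inheritance model with the JSONWizard mixin class: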

With Class Inheritance
from abc import ABC
from dataclasses import dataclass
from typing import Union

from dataclass_wizard import JSONWizard


@dataclass
class Data(ABC):
    """ base class for a Member """
    number: float


class DataA(Data, JSONWizard):
    """ A type of Data"""

    class _(JSONWizard.Meta):
        """
        This defines a custom tag that uniquely identifies the dataclass.
        """
        tag = 'A'


class DataB(Data, JSONWizard):
    """ Another type of Data """

    class _(JSONWizard.Meta):
        """
        This defines a custom tag that uniquely identifies the dataclass.
        """
        tag = 'B'


@dataclass
class Container(JSONWizard):
    """ container holds a subclass of Data """
    data: Union[DataA, DataB]

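Usage is shown below and is again pretty straightforward. It relies on a special __tag__ key set in the dictionary or JSON object to marshal it into the correct dataclass, based on the Meta.tag value for that class, which we set up above.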

print('== Load with DataA ==')

input_dict = {
    'data': {
        'number': '1.0',
        '__tag__': 'A'
    }
}

# De-serialize the `dict` object to a `Container` instance.
container = Container.from_dict(input_dict)

print(repr(container))
# prints:
#   Container(data=DataA(number=1.0))

# Show the prettified JSON representation of the instance.
print(container)

# Assert we load the correct dataclass from the annotated `Union` types
assert type(container.data) == DataA

print()

print('== Load with DataB ==')

# initialize container with DataB
data_b = DataB(number=2.0)
container = Container(data=data_b)

print(repr(container))
# prints:
#   Container(data=DataB(number=2.0))

# Show the prettified JSON representation of the instance.
print(container)

# Assert we load the correct dataclass from the annotated `Union` types
assert type(container.data) == DataB

# Assert we end up with the same instance when serializing and de-serializing
# our data.
string = container.to_json()
assert container == Container.from_json(string)

Without Class Inheritance

Here is the same example as above, but relying solely on dataclasses, without using any special class inheritance model:

from abc import ABC
from dataclasses import dataclass
from typing import Union

from dataclass_wizard import asdict, fromdict, LoadMeta


@dataclass
class Data(ABC):
    """ base class for a Member """
    number: float


class DataA(Data):
    """ A type of Data"""


class DataB(Data):
    """ Another type of Data """


@dataclass
class Container:
    """ container holds a subclass of Data """
    data: Union[DataA, DataB]


# Setup tags for the dataclasses. This can be passed into either
# `LoadMeta` or `DumpMeta`.
#
# Note that I'm not a fan of this syntax either, so it might change. I was
# thinking of something more explicit, like `LoadMeta(...).bind_to(class)`
LoadMeta(DataA, tag='A')
LoadMeta(DataB, tag='B')

# The rest is the same as before.

# initialize container with DataB
data = DataB(number=2.0)
container = Container(data=data)

print(repr(container))
# prints:
#   Container(data=DataB(number=2.0))

# Assert we load the correct dataclass from the annotated `Union` types
assert type(container.data) == DataB

# Assert we end up with the same data when serializing and de-serializing.
out_dict = asdict(container)
assert container == fromdict(Container, out_dict)