Distinguishing between Pydantic Models with same fields
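I'm using Pydantic to define hierarchical data in which some models have identical fields.
However, when I save and load these models, Pydantic can no longer tell which model was used and instead picks the first one listed in the field's type annotation.
I understand this is the expected behavior according to the documentation. However, the class type information matters for my application.
What is the recommended way to distinguish between different classes in Pydantic? One hack is simply to add an extraneous field to one of the models, but I'd like to find a more elegant solution.
See the simplified example below: `container` is initialized with data of type `DataB`, but after exporting and loading, the new `container` holds data of type `DataA`, because `DataA` is the first element in the type declaration of `container.data`.
Thanks for your help!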
```python
from abc import ABC
from pydantic import BaseModel  # pydantic 1.8.2
from typing import Union


class Data(BaseModel, ABC):
    """ base class for a Member """
    number: float


class DataA(Data):
    """ A type of Data"""
    pass


class DataB(Data):
    """ Another type of Data """
    pass


class Container(BaseModel):
    """ container holds a subclass of Data """
    data: Union[DataA, DataB]


# initialize container with DataB
data = DataB(number=1.0)
container = Container(data=data)

# export container to string and load new container from string
string = container.json()
new_container = Container.parse_raw(string)

# look at type of container.data
print(type(new_container.data).__name__)
# >>> DataA
```
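The sketch below shows why the round trip loses the type. It is a minimal pure-Python model of left-to-right union resolution and makes one simplifying assumption about pydantic v1's behavior: each union member is tried in order, and the first member whose fields accept the input wins. `resolve_union` is a hypothetical helper, not a pydantic API.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Sequence, Type


@dataclass
class DataA:
    number: float


@dataclass
class DataB:
    number: float


def resolve_union(members: Sequence[Type[Any]], payload: Dict[str, Any]) -> Any:
    """Hypothetical helper: try each union member in order, return the first that fits."""
    for member in members:
        names = {f.name for f in fields(member)}
        # Simplification: a member "fits" if all of its fields are present in the payload
        if names.issubset(payload):
            return member(**{k: payload[k] for k in names})
    raise ValueError("no union member matched")


original = DataB(number=1.0)
# The class name is not part of the serialized data, only the field values are
payload = json.loads(json.dumps(asdict(original)))
restored = resolve_union([DataA, DataB], payload)
print(type(restored).__name__)  # DataA
```

Both classes accept `{"number": 1.0}`, so whichever is listed first wins. The JSON alone carries nothing that could select `DataB`.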
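As correctly pointed out in the comments, the models cannot be told apart during parsing unless some additional information is stored.
As of today (pydantic v1.8.2), the most canonical way to distinguish models when parsing a `Union` (in case of ambiguity) is to explicitly add a `Literal` type specifier. It looks like this: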
```python
from abc import ABC
from pydantic import BaseModel
from typing import Union, Literal


class Data(BaseModel, ABC):
    """ base class for a Member """
    number: float


class DataA(Data):
    """ A type of Data"""
    tag: Literal['A'] = 'A'


class DataB(Data):
    """ Another type of Data """
    tag: Literal['B'] = 'B'


class Container(BaseModel):
    """ container holds a subclass of Data """
    data: Union[DataA, DataB]


# initialize container with DataB
data = DataB(number=1.0)
container = Container(data=data)

# export container to string and load new container from string
string = container.json()
new_container = Container.parse_raw(string)

# look at type of container.data
print(type(new_container.data).__name__)
# >>> DataB
```
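The same left-to-right resolution becomes unambiguous once each class carries a fixed tag, which is the role the `Literal` field plays above. The pure-Python sketch below shows this. `resolve_tagged` is a hypothetical helper: it mimics a `Literal` check by comparing the payload's `tag` against each class's default. It is not how pydantic validates internally.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Sequence, Type


@dataclass
class DataA:
    number: float
    tag: str = "A"  # plays the role of Literal['A'] = 'A'


@dataclass
class DataB:
    number: float
    tag: str = "B"  # plays the role of Literal['B'] = 'B'


def resolve_tagged(members: Sequence[Type[Any]], payload: Dict[str, Any]) -> Any:
    """Hypothetical helper: the first member whose fixed tag matches the payload wins."""
    for member in members:
        default_tag = next(f.default for f in fields(member) if f.name == "tag")
        if payload.get("tag") == default_tag:
            return member(**payload)
    raise ValueError(f"unknown tag: {payload.get('tag')!r}")


# The tag is serialized along with the other fields, so it survives the round trip
payload = json.loads(json.dumps(asdict(DataB(number=1.0))))
restored = resolve_tagged([DataA, DataB], payload)
print(type(restored).__name__)  # DataB
```

Order still matters for trying the members. However, `DataA` now rejects `tag == "B"`, so the first *successful* member is the right one.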
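This approach can be automated, but use it at your own risk, since it breaks static typing and relies on internal objects that may change in future versions: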
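```python
from abc import ABC
from typing import Literal

from pydantic import BaseModel
from pydantic.fields import ModelField  # pydantic v1 internal API


class Data(BaseModel, ABC):
    """ base class for a Member """
    number: float

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        name = 'tag'
        value = cls.__name__
        # Literal built at runtime from the class name; type checkers cannot follow this
        annotation = Literal[value]
        tag_field = ModelField.infer(name=name, value=value, annotation=annotation,
                                     class_validators=None, config=cls.__config__)
        cls.__fields__[name] = tag_field
        # build a new dict so the parent's __annotations__ is not mutated
        cls.__annotations__ = {**cls.__dict__.get('__annotations__', {}), name: annotation}


class DataA(Data):
    """ A type of Data"""
    pass


class DataB(Data):
    """ Another type of Data """
    pass
```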
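The core of the automation above is deriving each subclass's tag from its class name in `__init_subclass__`. Outside pydantic, that idea can be sketched with a plain registry. `TaggedBase`, `to_dict` and `from_dict` are hypothetical names used for illustration only; this is not how pydantic stores its fields.

```python
from typing import Any, ClassVar, Dict, Type


class TaggedBase:
    """Hypothetical base: every subclass registers itself under its class name."""
    registry: ClassVar[Dict[str, Type["TaggedBase"]]] = {}
    tag: ClassVar[str] = ""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        cls.tag = cls.__name__  # same idea as value = cls.__name__ above
        TaggedBase.registry[cls.tag] = cls

    def __init__(self, number: float) -> None:
        self.number = number

    def to_dict(self) -> Dict[str, Any]:
        return {"number": self.number, "tag": self.tag}

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "TaggedBase":
        # dispatch on the stored tag instead of trying classes in order
        subclass = TaggedBase.registry[payload["tag"]]
        return subclass(number=payload["number"])


class DataA(TaggedBase):
    pass


class DataB(TaggedBase):
    pass


restored = TaggedBase.from_dict(DataB(number=1.0).to_dict())
print(type(restored).__name__)  # DataB
```

Subclasses need no extra code. Defining the class is enough to register it and give it a unique tag.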
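In the meantime, I'm trying to hack something together with a custom validator.
Basically, the class decorator adds a `class_name: str` field that gets added to the JSON string. The validator then looks up the correct subclass based on its value.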
```python
def register_distinct_subclasses(fields: tuple):
    """ fields is tuple of subclasses that we want to be registered as distinct """
    field_map = {field.__name__: field for field in fields}

    def _register_distinct_subclasses(cls):
        """ cls is the superclass of fields, which we add a new validator to """

        class _class(cls):
            # extra field that records the concrete class name
            class_name: str

            def __init__(self, **kwargs):
                kwargs["class_name"] = type(self).__name__
                super().__init__(**kwargs)

            @classmethod
            def __get_validators__(cls):
                yield cls.validate

            @classmethod
            def validate(cls, v):
                if isinstance(v, dict):
                    # parsed JSON must carry a "class_name" key
                    class_name = v.get("class_name")
                    data = v
                else:
                    # fall back to the instance's own type name if it has no class_name
                    class_name = getattr(v, "class_name", type(v).__name__)
                    data = v.dict()
                # look up the registered subclass and validate the data with it
                cls_type = field_map[class_name]
                return cls_type.parse_obj(data)

        return _class

    return _register_distinct_subclasses
```
It is called like this:
```python
Data = register_distinct_subclasses((DataA, DataB))(Data)
```
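Stripped of the pydantic machinery, the idea in this answer has two halves. On the way out, write a `class_name` key; on the way in, look the class up in a name-to-class map. The stdlib sketch below assumes hypothetical `dump_container` / `load_container` helpers and a `Container` holding a single `data` item. It illustrates the technique only and does not reproduce the decorator.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Tuple, Type


@dataclass
class Data:
    number: float


@dataclass
class DataA(Data):
    pass


@dataclass
class DataB(Data):
    pass


# explicit registration, mirroring register_distinct_subclasses((DataA, DataB))
REGISTERED: Tuple[Type[Data], ...] = (DataA, DataB)
FIELD_MAP: Dict[str, Type[Data]] = {c.__name__: c for c in REGISTERED}


@dataclass
class Container:
    data: Data


def dump_container(container: Container) -> str:
    # store the class name next to the fields, like the class_name field above
    item = {**asdict(container.data), "class_name": type(container.data).__name__}
    return json.dumps({"data": item})


def load_container(raw: str) -> Container:
    item = json.loads(raw)["data"]
    cls = FIELD_MAP[item.pop("class_name")]  # pick the subclass by name
    return Container(data=cls(**item))


string = dump_container(Container(data=DataB(number=1.0)))
new_container = load_container(string)
print(type(new_container.data).__name__)  # DataB
```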
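I just wanted to take this opportunity to list another possible alternative to pydantic here, one that already supports this use case well, as shown below.
I am the creator and maintainer of a relatively new and little-known JSON serialization library, the Dataclass Wizard, which relies on the Python `dataclasses` module to perform its magic. As of the latest version, 0.14.0, dataclass-wizard supports dataclasses within `Union` types. Previously it did not support dataclasses in `Union` types at all. That was a glaring omission, and adding support for it was (eventually) on my "to-do" list.
It did not generally work before because the data being deserialized is often a JSON object, which only knows simple types such as arrays and dictionaries. A plain `dict` would not match any of the types in `Union[Data1, Data2]`, even if the object had all the correct dataclass fields as keys. This is simply because the library does not compare the `dict` object against the fields of each dataclass in the `Union`, though that might change in a future release.
So anyway, here is a simple example demonstrating dataclasses in `Union` types, using a class inheritance model with the `JSONWizard` mixin class: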
With Class Inheritance
```python
from abc import ABC
from dataclasses import dataclass
from typing import Union

from dataclass_wizard import JSONWizard


@dataclass
class Data(ABC):
    """ base class for a Member """
    number: float


class DataA(Data, JSONWizard):
    """ A type of Data"""

    class _(JSONWizard.Meta):
        """
        This defines a custom tag that uniquely identifies the dataclass.
        """
        tag = 'A'


class DataB(Data, JSONWizard):
    """ Another type of Data """

    class _(JSONWizard.Meta):
        """
        This defines a custom tag that uniquely identifies the dataclass.
        """
        tag = 'B'


@dataclass
class Container(JSONWizard):
    """ container holds a subclass of Data """
    data: Union[DataA, DataB]
```
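Usage is shown below and is again pretty simple. It relies on a special `__tag__` key set in the dict or JSON object. Based on the class's `Meta.tag` value, which we set up above, the library uses that key to marshal the object into the correct dataclass.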
```python
print('== Load with DataA ==')

input_dict = {
    'data': {
        'number': '1.0',
        '__tag__': 'A'
    }
}

# De-serialize the `dict` object to a `Container` instance.
container = Container.from_dict(input_dict)

print(repr(container))
# prints:
#   Container(data=DataA(number=1.0))

# Show the prettified JSON representation of the instance.
print(container)

# Assert we load the correct dataclass from the annotated `Union` types
assert type(container.data) is DataA

print()

print('== Load with DataB ==')

# initialize container with DataB
data_b = DataB(number=2.0)
container = Container(data=data_b)

print(repr(container))
# prints:
#   Container(data=DataB(number=2.0))

# Show the prettified JSON representation of the instance.
print(container)

# Assert we load the correct dataclass from the annotated `Union` types
assert type(container.data) is DataB

# Assert we end up with the same instance when serializing and de-serializing
# our data.
string = container.to_json()
assert container == Container.from_json(string)
```
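For readers curious how a `__tag__` key can drive `Union` resolution, here is a rough stdlib-only sketch of the idea. It is not dataclass-wizard's implementation. `to_tagged_dict` and `from_tagged_dict` are hypothetical names, and the class-level `tag` attribute merely stands in for `Meta.tag`. The loader reads the union members from the field's type hint and picks the one whose tag matches.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, ClassVar, Dict, Union, get_args, get_type_hints


@dataclass
class Data:
    number: float


@dataclass
class DataA(Data):
    tag: ClassVar[str] = "A"  # stands in for Meta.tag = 'A'


@dataclass
class DataB(Data):
    tag: ClassVar[str] = "B"  # stands in for Meta.tag = 'B'


@dataclass
class Container:
    data: Union[DataA, DataB]


def to_tagged_dict(container: Container) -> Dict[str, Any]:
    """Hypothetical dumper: add a '__tag__' key next to the dataclass fields."""
    return {"data": {**asdict(container.data), "__tag__": type(container.data).tag}}


def from_tagged_dict(cls: type, payload: Dict[str, Any]) -> Any:
    """Hypothetical loader: resolve Union-typed fields via the '__tag__' key."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        value = payload[f.name]
        members = get_args(hints[f.name])  # (DataA, DataB) for the Union field
        if members and isinstance(value, dict):
            value = dict(value)
            tag = value.pop("__tag__")
            member = next(m for m in members if getattr(m, "tag", None) == tag)
            value = member(**value)
        kwargs[f.name] = value
    return cls(**kwargs)


container = from_tagged_dict(Container, {"data": {"number": 1.0, "__tag__": "A"}})
print(repr(container))  # Container(data=DataA(number=1.0))

original = Container(data=DataB(number=2.0))
assert from_tagged_dict(Container, to_tagged_dict(original)) == original
```

Unlike the real library, this sketch does not coerce values such as the string `'1.0'` to `float`. It only shows the tag-based dispatch.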
Without Class Inheritance
Here is the same example as above, but relying solely on `dataclasses`, without any special class inheritance model:
```python
from abc import ABC
from dataclasses import dataclass
from typing import Union

from dataclass_wizard import asdict, fromdict, LoadMeta


@dataclass
class Data(ABC):
    """ base class for a Member """
    number: float


class DataA(Data):
    """ A type of Data"""


class DataB(Data):
    """ Another type of Data """


@dataclass
class Container:
    """ container holds a subclass of Data """
    data: Union[DataA, DataB]


# Set up tags for the dataclasses. This can be passed into either
# `LoadMeta` or `DumpMeta`.
#
# Note that I'm not a fan of this syntax either, so it might change. I was
# thinking of something more explicit, like `LoadMeta(...).bind_to(class)`
LoadMeta(DataA, tag='A')
LoadMeta(DataB, tag='B')

# The rest is the same as before.

# initialize container with DataB
data = DataB(number=2.0)
container = Container(data=data)

print(repr(container))
# prints:
#   Container(data=DataB(number=2.0))

# Assert we load the correct dataclass from the annotated `Union` types
assert type(container.data) is DataB

# Assert we end up with the same data when serializing and de-serializing.
out_dict = asdict(container)
assert container == fromdict(Container, out_dict)
```
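The point of this variant is that the classes themselves stay untouched, and the tags are registered from the outside. The stdlib sketch below illustrates that design. `register_tag`, `dump` and `load` are hypothetical names that loosely mirror `LoadMeta(DataA, tag='A')`; they are not dataclass-wizard APIs. The sketch also asserts the root problem: without a tag, both classes serialize to the same dict.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class Data:
    number: float


class DataA(Data):
    """A type of Data (no tag attribute of its own)"""


class DataB(Data):
    """Another type of Data (no tag attribute of its own)"""


# Hypothetical external registry, loosely mirroring LoadMeta(DataA, tag='A')
_TAG_BY_CLASS: Dict[type, str] = {}
_CLASS_BY_TAG: Dict[str, type] = {}


def register_tag(cls: type, tag: str) -> None:
    _TAG_BY_CLASS[cls] = tag
    _CLASS_BY_TAG[tag] = cls


register_tag(DataA, "A")
register_tag(DataB, "B")


def dump(obj: Data) -> str:
    # the tag comes from the registry, not from the class definition
    return json.dumps({**asdict(obj), "__tag__": _TAG_BY_CLASS[type(obj)]})


def load(raw: str) -> Data:
    payload = json.loads(raw)
    cls = _CLASS_BY_TAG[payload.pop("__tag__")]
    return cls(**payload)


# Without the tag, both classes serialize to the same dict
assert asdict(DataA(number=2.0)) == asdict(DataB(number=2.0))
restored = load(dump(DataB(number=2.0)))
print(type(restored).__name__)  # DataB
```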