Python - Dataclass: load attribute value from a dictionary containing an invalid name
Unfortunately, I have to load a dictionary containing an invalid name (which I can't change):
dict = {..., "invalid-name": 0, ...}
I would like to convert this dictionary into a dataclass object, but I can't define an attribute with this name.
from dataclasses import dataclass

@dataclass
class Dict:
    ...
    invalid-name: int  # can't do this
    ...
The only solution I could find is to rename the dictionary key to a valid one before converting the dictionary to a dataclass object:
dict["valid_name"] = dict.pop("invalid-name")
But I would like to avoid using string literals...
Is there a better solution?
The following code filters out the keys that do not exist as fields:
import dataclasses

@dataclasses.dataclass
class ClassDict:
    valid_name0: str
    valid_name1: int
    ...

dict = {..., "invalid-name": 0, ...}
# dataclasses.fields() returns a tuple of Field objects, so collect their names directly
field_names = {f.name for f in dataclasses.fields(ClassDict)}
dict = {k: v for k, v in dict.items() if k in field_names}
However, I believe there should be a better way to do this, since it is a bit hacky.
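For reference, a runnable version of the filtering idea might look like the sketch below. The helper name filter_known_keys and the sample values are made up for illustration. Note that filtering only drops the unknown "invalid-name" key; it does not map it onto a field.

```python
from dataclasses import dataclass, fields


@dataclass
class ClassDict:
    valid_name0: str
    valid_name1: int


def filter_known_keys(cls, raw):
    """Return a new dict holding only the keys that are field names of cls."""
    names = {f.name for f in fields(cls)}
    return {k: v for k, v in raw.items() if k in names}


# Sample input (hypothetical values); "invalid-name" is not a field of ClassDict
raw = {"valid_name0": "a", "valid_name1": 1, "invalid-name": 0}
obj = ClassDict(**filter_known_keys(ClassDict, raw))
print(obj)
```

The original dictionary is left untouched because the helper builds a new one.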
Regardless, I would define a from_dict class method, which is the natural place to make the change.
@dataclass
class MyDict:
    ...
    valid_name: int
    ...

    @classmethod
    def from_dict(cls, d):
        d['valid_name'] = d.pop('invalid-name')
        return cls(**d)

md = MyDict.from_dict({'invalid-name': 3, ...})
Whether you should modify d in place, or take steps to avoid unnecessary copies, is another matter.
One solution is to use dict-to-dataclass. As described in its documentation, it offers two options:
1. Passing dictionary keys
It's probably quite common that your dataclass fields have the same names as the dictionary keys they map to but in case they don't, you can pass the dictionary key as the first argument (or the dict_key keyword argument) to field_from_dict:
@dataclass
class MyDataclass(DataclassFromDict):
    name_in_dataclass: str = field_from_dict("nameInDictionary")

origin_dict = {
    "nameInDictionary": "field value"
}

dataclass_instance = MyDataclass.from_dict(origin_dict)

>>> dataclass_instance.name_in_dataclass
"field value"
2. Custom converters
If you need to convert a dictionary value that isn't covered by the defaults, you can pass in a converter function using field_from_dict's converter parameter:
def yes_no_to_bool(yes_no: str) -> bool:
    return yes_no == "yes"

@dataclass
class MyDataclass(DataclassFromDict):
    is_yes: bool = field_from_dict(converter=yes_no_to_bool)

dataclass_instance = MyDataclass.from_dict({"is_yes": "yes"})

>>> dataclass_instance.is_yes
True
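Another option is to use the dataclass-wizard library, which is likewise a de/serialization library built on top of dataclasses. It should similarly support custom key mappings.
I also timed it with the built-in timeit module and found it to be (on average) about 5x faster than the dict-to-dataclass solution. I've added the code I used for the comparison below.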
from dataclasses import dataclass
from timeit import timeit

from typing_extensions import Annotated  # Note: in Python 3.9+, can import this from `typing` instead

from dataclass_wizard import JSONWizard, json_key
from dict_to_dataclass import DataclassFromDict, field_from_dict


@dataclass
class ClassDictWiz(JSONWizard):
    valid_name: Annotated[int, json_key('invalid-name')]


@dataclass
class ClassDict(DataclassFromDict):
    valid_name: int = field_from_dict('invalid-name')


my_dict = {"invalid-name": 0}

n = 100_000

print('dict-to-dataclass: ', round(timeit('ClassDict.from_dict(my_dict)', globals=globals(), number=n), 3))
print('dataclass-wizard: ', round(timeit('ClassDictWiz.from_dict(my_dict)', globals=globals(), number=n), 3))

i1, i2 = ClassDict.from_dict(my_dict), ClassDictWiz.from_dict(my_dict)
# assert we get the same result with both approaches
assert i1.__dict__ == i2.__dict__
Results, on my Mac OS X laptop:
dict-to-dataclass: 0.594
dataclass-wizard: 0.098