Correctly call __set__ in the MongoEngine Document constructor
I'm attempting to use the Python MongoEngine framework to store Pandas DataFrames in MongoDB, coercing each DataFrame to a Python dict (via df.to_dict()) and storing it as a nested Document attribute. To minimize the amount of code I have to write for the round trip from Pandas DataFrame to BSON and back, I'm using a custom field type called DataFrameField, which is defined in this gist. It handles coercing the DataFrame to a Python dict and back in its __set__ and __get__ methods.
This works great when setting the DataFrameField using dot notation, as in:
import pandas as pd
import numpy as np
from mongoengine import *
a_pandas_data_frame = pd.DataFrame({
    'goods': ['a', 'a', 'b', 'b', 'b'],
    'stock': [5, 10, 30, 40, 10],
    'category': ['c1', 'c2', 'c1', 'c2', 'c1'],
    'date': pd.to_datetime(['2014-01-01', '2014-02-01', '2014-01-06', '2014-02-09', '2014-03-09'])
})
class my_data(Document):
    data_frame = DataFrameField() # defined in the referenced gist
foo = my_data()
foo.data_frame = a_pandas_data_frame
But if I pass a_pandas_data_frame to the constructor, I get:
>>> bar = my_data(data_frame = a_pandas_data_frame)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\MPGWRK-006\Anaconda2\lib\site-packages\mongoengine\base\document.py", line 116, in __init__
    setattr(self, key, value)
  File "C:\Users\MPGWRK-006\Anaconda2\lib\site-packages\mongoengine\base\document.py", line 186, in __setattr__
    super(BaseDocument, self).__setattr__(name, value)
  File "<stdin>", line 18, in __set__
ValueError: value is not a pandas.DataFrame instance
If I add a print statement such as print value to the __set__ method and call the constructor, it prints:
['category', 'date', 'goods', 'stock']
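which is the list of the DataFrame's column names (i.e. list(a_pandas_data_frame.columns)). Is there any way to prevent the MongoEngine Document constructor from passing something other than the object I gave it to the __set__ method?
Thanks!
PS: I've also asked this question on the [MongoEngine Repo](https://github.com/MongoEngine/mongoengine/issues/1597), but there are about 300 open issues there, so I'm not sure I can expect a response on that forum any time soon...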
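Digging through the source code, it looks like you need to define a to_python method on the DataFrameField field; otherwise it falls back to the to_python method of mongoengine.fields.DictField.
The to_python method of mongoengine.fields.DictField is basically the to_python method of ComplexBaseField. On receiving a DataFrame, this method decides that the object is sort of a list and returns the values obtained by enumerating the DataFrame instance. Iterating over a DataFrame yields its column labels, which is why __set__ ends up receiving the list of column names.
Here is the part of the constructor where to_python is called on the field object: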
if key in self._fields or key in ('id', 'pk', '_cls'):
    if __auto_convert and value is not None:
        field = self._fields.get(key)
        if field and not isinstance(field, FileField):
            value = field.to_python(value)
So, for your case, you can simply define it as:
def to_python(self, value):
    return value
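
To make the mechanism concrete, here is a minimal sketch that reproduces the behaviour with plain Python only. It is not MongoEngine's actual code: FakeFrame, ToyField, StrictFrameField, PassThroughFrameField and ToyDocument are all hypothetical stand-ins invented for illustration, and the default to_python below only imitates the "treat it as a list" behaviour described above. It assumes Python 3.6+ (for __set_name__).

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame: iterating it yields column names."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __iter__(self):
        return iter(self.columns)


class ToyField:
    """Toy field descriptor; NOT MongoEngine's real implementation."""

    def __set_name__(self, owner, name):
        self.name = name

    def to_python(self, value):
        # Default conversion: treat any iterable as a list of its items.
        return [item for _, item in enumerate(value)]

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class StrictFrameField(ToyField):
    """Rejects anything but a FakeFrame, like the __set__ in the question."""

    def __set__(self, instance, value):
        if not isinstance(value, FakeFrame):
            raise ValueError("value is not a FakeFrame instance")
        super().__set__(instance, value)


class PassThroughFrameField(StrictFrameField):
    """Same field, plus the identity to_python override suggested above."""

    def to_python(self, value):
        return value


class ToyDocument:
    """The constructor converts each value with to_python before setattr."""

    def __init__(self, **values):
        for key, value in values.items():
            field = type(self).__dict__.get(key)
            if isinstance(field, ToyField) and value is not None:
                value = field.to_python(value)
            setattr(self, key, value)


class Broken(ToyDocument):
    data_frame = StrictFrameField()


class Fixed(ToyDocument):
    data_frame = PassThroughFrameField()


frame = FakeFrame(["category", "date", "goods", "stock"])

# Without the override, the constructor hands __set__ a list of column names.
try:
    Broken(data_frame=frame)
    broken_error = None
except ValueError as exc:
    broken_error = str(exc)

# With the override, the original object reaches __set__ untouched.
fixed = Fixed(data_frame=frame)

print(broken_error)
print(fixed.data_frame is frame)
```

Dot-notation assignment never goes through to_python in this sketch, which mirrors why foo.data_frame = a_pandas_data_frame worked in the question while the constructor call did not.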