Correctly Call __set__ in MongoEngine Document constructor

I'm using the Python MongoEngine framework, coercing pandas DataFrames to a Python dict via df.to_dict() and storing them as a nested Document attribute. I'm attempting to minimize the amount of code I have to write to make the round trip from pandas DataFrame to BSON and back by using a custom field type called DataFrameField, defined in this gist on storing pandas DataFrames in MongoDB, which coerces the DataFrame to a dict and back in its __set__ and __get__ methods.

This works great when setting the DataFrameField using dot notation, as in:

import pandas as pd
import numpy as np
from mongoengine import *

a_pandas_data_frame = pd.DataFrame({
    'goods': ['a', 'a', 'b', 'b', 'b'],
    'stock': [5, 10, 30, 40, 10],
    'category': ['c1', 'c2', 'c1', 'c2', 'c1'],
    'date': pd.to_datetime(['2014-01-01', '2014-02-01', '2014-01-06', '2014-02-09', '2014-03-09'])
})

class my_data(Document):
    data_frame = DataFrameField()  # defined in the referenced gist

foo = my_data()
foo.data_frame = a_pandas_data_frame

But if I pass a_pandas_data_frame to the constructor, I get:

>>> bar = my_data(data_frame = a_pandas_data_frame)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\MPGWRK-006\Anaconda2\lib\site-packages\mongoengine\base\document.py", line 116, in __init__
    setattr(self, key, value)
  File "C:\Users\MPGWRK-006\Anaconda2\lib\site-packages\mongoengine\base\document.py", line 186, in __setattr__
    super(BaseDocument, self).__setattr__(name, value)
  File "<stdin>", line 18, in __set__
ValueError: value is not a pandas.DataFrame instance

If I add a print statement such as print value to the __set__ method and call the constructor, it prints:

['category', 'date', 'goods', 'stock']

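which is the list of the DataFrame's column names (i.e. list(a_pandas_data_frame.columns)). Is there any way to prevent the MongoEngine Document constructor from passing something other than the object I gave it to the __set__ method?

Thanks!

PS, I've also asked this question on the MongoEngine repo (https://github.com/MongoEngine/mongoengine/issues/1597), but there are about 300 open issues, so I'm not sure I can expect a response on that forum any time soon...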

Digging through the source code, it seems you need to define a to_python method on your DataFrameField field; otherwise it falls back to the to_python method of mongoengine.fields.DictField.

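The to_python method of mongoengine.fields.DictField is essentially the to_python method of ComplexBaseField. On receiving a DataFrame, this method decides that the object is sort of a list and returns the values obtained by enumerating the DataFrame instance. Iterating over a DataFrame yields its column labels, which is why your __set__ method ends up receiving the list of column names.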
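You can reproduce the effect outside MongoEngine. The snippet below is only a sketch: it assumes the conversion does nothing more than walk the value with enumerate, which is a simplification of what the library actually does, and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'goods': ['a', 'b'], 'stock': [5, 10]})

# Iterating over a DataFrame yields its column labels, not its rows,
# so enumerating it pairs each position with a column name.
enumerated = dict(enumerate(df))

# Rebuild a list from the enumerated items, ordered by position.
values = [enumerated[k] for k in sorted(enumerated)]
print(values)
```

The resulting list holds the column names rather than the data, which matches what the print statement inside __set__ showed.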

Here is the part of the constructor where to_python is called on the field object:

if key in self._fields or key in ('id', 'pk', '_cls'):
    if __auto_convert and value is not None:
        field = self._fields.get(key)
        if field and not isinstance(field, FileField):
            value = field.to_python(value)

So for your case, you can simply define it on DataFrameField as:

def to_python(self, value):
    return value
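To see why that override is enough, here is a toy model of the interaction in plain Python. None of this is MongoEngine's code: ToyFrame, ToyListLikeField, ToyFrameField and ToyDocument are hypothetical names invented for the sketch, and the constructor only imitates the "call to_python, then setattr" sequence shown above.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame: iterating yields column names."""
    def __init__(self, columns):
        self.columns = columns  # dict of column name -> list of values

    def __iter__(self):
        return iter(self.columns)


class ToyListLikeField:
    """Hypothetical field whose to_python treats unknown values as list-like."""
    def __set_name__(self, owner, name):
        self.name = name

    def to_python(self, value):
        # Imitates the fallback described above: enumerate and keep the items.
        return [item for _, item in enumerate(value)]

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if not isinstance(value, ToyFrame):
            raise ValueError('value is not a ToyFrame instance')
        instance.__dict__[self.name] = value


class ToyFrameField(ToyListLikeField):
    def to_python(self, value):
        # The fix: pass the value through untouched so __set__ sees the frame.
        return value


class ToyDocument:
    """Hypothetical constructor that converts each value before setattr."""
    def __init__(self, **values):
        for key, value in values.items():
            field = type(self).__dict__.get(key)
            if field is not None:
                value = field.to_python(value)
            setattr(self, key, value)


class Broken(ToyDocument):
    data_frame = ToyListLikeField()


class Fixed(ToyDocument):
    data_frame = ToyFrameField()


frame = ToyFrame({'goods': ['a', 'b'], 'stock': [5, 10]})

try:
    Broken(data_frame=frame)
    broken_error = None
except ValueError as exc:
    broken_error = str(exc)

fixed = Fixed(data_frame=frame)
print(broken_error)
print(fixed.data_frame is frame)
```

In the toy model, Broken fails in __set__ because the constructor has already turned the frame into a list of column names, while Fixed stores the original object unchanged.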