"ERROR - 'NoneType' object has no attribute 'axes'" when trying to read pickle file from s3 bytes object

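I'm running the following code in an Apache Airflow environment to fetch a pickle file from S3 and load it into memory. As soon as I try to read or print the file contents, I get this error message: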

ERROR - 'NoneType' object has no attribute 'axes'


Code

import boto3
import pickle


# [...Omitted code...]  

s3_session = boto3.Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key
)

s3 = s3_session.resource('s3')
obj = s3.Object(bucket_name, KEY)
pickle_contents = obj.get()['Body'].read()
body = pickle.loads(pickle_contents)

print(body)

# ^-- This is where the error happens, as soon as I try to read it. 

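The same code runs fine on a separate Jupyter notebook instance, which makes me suspect a version incompatibility. The pickle file contains a dictionary like the one below, which I obtained by running print(body) in that notebook:

Pickle file contents: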

{75: 
  'recommendation_diversity_metrics': 
    {'largest_subcategory_group_proportion': 
      {'mean': 0.3369472,
       'sd': 0.1741708739837092,
       'min': 0.05333333333333334,
       'max': 1.0},
     'catalogue_entropy': 3.4412171579585533,
     'subcategory_overweight_frequency': 
        School & Office Supplies    0.73020
        Pants                       0.70656
        Bedding                     0.64138
        Sweaters                    0.62616
        Tops                        0.57044
                                     ...   
        Cleanup & Odor Control      0.00144
        UNKNOWN                     0.00036
        Body Piercings              0.00034
        Misc Books                  0.00012
        Home Books                  0.00012
        Length: 94, dtype: float64},
  'recommendation_novelty_metrics': {
    'previously_interacted': {'mean': 0.052456533333333326,
      'sd': 0.06291214458333363,
      'min': 0.0,
      'max': 0.6},
    'new_product_frequency': {'mean': 0.016672799999999998,
      'sd': 0.01423356021834222,
      'min': 0.0,
      'max': 0.12}
      }}

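I think the error occurs because the dictionary contains a pandas Series object (see subcategory_overweight_frequency in the dictionary above): if I read every element of the dictionary except that one, the code runs fine. Am I missing a dependency that I'm not aware of?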


Full traceback

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 655, in __repr__
    show_dimensions=show_dimensions,
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 774, in to_string
    line_width=line_width,
  File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/format.py", line 484, in __init__
    self.max_rows_displayed = min(max_rows or len(self.frame), len(self.frame))
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 996, in __len__
    return len(self.index)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 5175, in __getattr__
    return object.__getattribute__(self, name)
  File "pandas/_libs/properties.pyx", line 63, in pandas._libs.properties.AxisProperty.__get__
AttributeError: 'NoneType' object has no attribute 'axes'

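You probably pickled the DataFrame with a newer version of pandas and are now trying to read the pickle with an older version. Pickled pandas objects depend on pandas' internal class layout, so a file written by one version is not guaranteed to load correctly under another.

Check which pandas version you used to pickle the DataFrame and which pandas version your Airflow environment uses.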
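
One way to compare the two environments is to run the same small script in the Jupyter notebook and inside an Airflow task, then diff the output. The sketch below is only an illustration: the helper names package_version and environment_report are made up for this example, and it assumes Python 3.8+ (for importlib.metadata), which is newer than the Python 3.7 shown in the traceback. On 3.7, printing pandas.__version__ directly gives the same information.

```python
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("pandas", "numpy")):
    """Collect the Python version and the versions of the given packages."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        report[name] = package_version(name)
    return report


if __name__ == "__main__":
    # Run this in both environments and compare the printed versions.
    for name, version in environment_report().items():
        print(f"{name}: {version}")
```

If the pandas versions differ, aligning them (or re-creating the pickle with the version that Airflow uses) is the first thing to try.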