Python builtin max function throwing 'dict' object has no attribute 'max' for list type object

Question

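I have the following line of code as part of my PySpark pipeline (the hardcoded list actually comes from a config file), and I run the pipeline on EMR. The EMR bootstrap script is below. The builtin function seems to treat the list of ints as a dict and throws the error below. Any idea why I'm seeing this strange behavior?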

    max_n_days = __builtins__.max([10, 20])

EMR Bootstrap:

    #!/bin/bash

    sudo easy_install pip
    sudo yum install -y tkinter tk-devel
    sudo yum install -y python3-pip
    sudo pip install boto3
    sudo pip install configparser
    sudo pip install paramiko
    sudo pip install nltk
    sudo pip install scipy
    sudo pip install scikit-learn
    sudo pip install pandas==0.24.2
    sudo pip install -U keras
    sudo pip install pyddq
    sudo pip install torch
    sudo pip install numpy
    sudo pip install future
    sudo pip install keras==2.2.4
    sudo pip install PyArrow==0.15.1
    sudo pip install --upgrade awscli

Error:

    max_n_days = __builtins__.max([10, 20])  # use buildins reference
    AttributeError: 'dict' object has no attribute 'max'
    None

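Note: I'm using "__builtins__.max()" because 'max' conflicts with the SQL max function. I'm on Python 2.7 on EMR. I also tried 'import builtins', but the builtins module wasn't found. Following a suggestion in another Stack Overflow post, I installed "future" on my cluster, but no luck: builtins still can't be found.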

Answer 1

Don't do from pyspark.sql.functions import *. It shadows Python builtins such as sum, max, min and round, and you will regret it.

Always use import pyspark.sql.functions as F (or your favorite alias), and call the Spark functions as F.sum, F.max, F.min, etc.
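To see the difference without a Spark installation, here is a minimal sketch that registers a hypothetical stand-in module, fake_functions, which defines its own max the way pyspark.sql.functions does. It is an illustration of Python's import rules, not Spark's actual implementation: the aliased import leaves the builtin max alone, while the wildcard import rebinds max in the importing module.

```python
import sys
import types

# Hypothetical stand-in for pyspark.sql.functions, so no Spark is needed.
# Like the real module, it exposes a public function named `max`.
fake_functions = types.ModuleType("fake_functions")

def _column_max(col):
    return "max(" + str(col) + ")"

fake_functions.max = _column_max
sys.modules["fake_functions"] = fake_functions

# Recommended style: import under an alias; the builtin max is untouched.
import fake_functions as F

python_max = max([10, 20])     # builtin max
spark_style = F.max("n_days")  # the module's max, reached through the alias

# Anti-pattern: a wildcard import rebinds `max` in this module's globals.
from fake_functions import *

shadowed = max([10, 20])       # now calls the module's max instead

print(python_max, spark_style, shadowed)
```

The wildcard import must sit at module level (Python 3 rejects import * inside a function), which is exactly where it tends to end up in pipeline scripts.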

Answer 2

From the docs (emphasis mine):

By default, when in the __main__ module, __builtins__ is the built-in module builtins; when in any other module, __builtins__ is an alias for the dictionary of the builtins module itself.

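That explains why __builtins__ shows up as a dict in your PySpark job: the pipeline code runs in a module other than __main__, so __builtins__ there is the builtins module's dictionary, and a dict has no max attribute.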
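The behavior the docs describe can be reproduced without Spark. The sketch below is an illustration, not the poster's code, and assumes CPython 3: it runs statements inside a throwaway module (the name pipeline_step is hypothetical) via exec, which, like a normal import, fills in the module's __builtins__ with the builtins dictionary.

```python
import types

# Throwaway module standing in for the non-__main__ module where the
# pipeline code lives (the name "pipeline_step" is hypothetical).
mod = types.ModuleType("pipeline_step")

# exec() inserts __builtins__ into the module's globals as the builtins
# dictionary, which is what a non-__main__ module sees in CPython.
exec("kind = type(__builtins__).__name__", mod.__dict__)
kind = mod.kind

# Attribute access on that dict reproduces the error from the question.
try:
    exec("max_n_days = __builtins__.max([10, 20])", mod.__dict__)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# A key lookup happens to work, but the docs advise against touching __builtins__.
exec("max_n_days = __builtins__['max']([10, 20])", mod.__dict__)
via_dict = mod.max_n_days

print(kind, error_message, via_dict)
```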

The same docs also say:

CPython implementation detail: Users should not touch __builtins__; it is strictly an implementation detail. Users wanting to override values in the builtins namespace should import the builtins module and modify its attributes appropriately.

Solution: use import builtins, then call builtins.max().
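Since the question mentions Python 2.7, note that the same module is named __builtin__ there; a module called builtins exists on Python 2 only if the future package is importable by the interpreter that actually runs the job. A minimal sketch that works on both versions, using only the standard library; the local max function is hypothetical and only simulates the shadowing a wildcard import would cause.

```python
try:
    import builtins  # Python 3, or Python 2 with the `future` package installed
except ImportError:
    import __builtin__ as builtins  # the Python 2.7 name of the same module

# Simulate the shadowing caused by `from pyspark.sql.functions import *`
# (hypothetical stand-in; it fails loudly if it is ever called).
def max(*args, **kwargs):
    raise RuntimeError("shadowed max was called")

# builtins.max is still Python's own max, regardless of the shadowing above.
max_n_days = builtins.max([10, 20])
```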


Tags: python, built-in, apache-spark, pyspark