Persisting a data frame in pyspark2 does not work when a storage level is specified. What am I doing wrong?
I am trying to persist two very large data frames before performing a join, to work around the "java.util.concurrent.TimeoutException: Futures timed out..." issue (reference: ).
Persist() on its own works, but as soon as I try to specify a storage level, I get a name error.
I have tried the following:
df.persist(pyspark.StorageLevel.MEMORY_ONLY)
NameError: name 'MEMORY_ONLY' is not defined
df.persist(StorageLevel.MEMORY_ONLY)
NameError: name 'StorageLevel' is not defined
import org.apache.spark.storage.StorageLevel
ImportError: No module named org.apache.spark.storage.StorageLevel
Any help would be much appreciated.
You have to import the proper package:
from pyspark import StorageLevel
and import the pyspark package itself:
import pyspark
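For reference, here is a minimal, self-contained sketch showing both spellings that the imports above enable (the session and the DataFrame are hypothetical, created only for illustration):

import pyspark
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("persist-example").getOrCreate()
df = spark.range(10)  # hypothetical small DataFrame, just for illustration

# Either spelling resolves to the same class; pick one:
df.persist(pyspark.StorageLevel.MEMORY_ONLY)    # via the module attribute
# df.persist(StorageLevel.MEMORY_ONLY)          # via the directly imported name

Without one of these imports in scope, Python cannot resolve the name, which is exactly the NameError shown in the question.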
The following worked for me:
from pyspark.storagelevel import StorageLevel
df.persist(StorageLevel.MEMORY_ONLY)
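Putting it together for the scenario in the question, a sketch of persisting two frames before a join (the DataFrames here are small stand-ins; names and sizes are made up for illustration):

from pyspark.sql import SparkSession
from pyspark.storagelevel import StorageLevel

spark = SparkSession.builder.appName("persist-before-join").getOrCreate()

# Stand-ins for the two large data frames being joined.
df1 = spark.range(1000000).withColumnRenamed("id", "key")
df2 = spark.range(1000000).withColumnRenamed("id", "key")

# persist() is lazy: it only marks the plan. An action such as count()
# materializes the cached data before the join runs.
df1.persist(StorageLevel.MEMORY_ONLY)
df2.persist(StorageLevel.MEMORY_ONLY)
df1.count()
df2.count()

joined = df1.join(df2, on="key")
joined.show(5)

For frames that are genuinely too big for executor memory, MEMORY_AND_DISK is usually a safer level than MEMORY_ONLY, since MEMORY_ONLY silently recomputes any partitions that do not fit.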