PySpark's addPyFile method makes SparkContext None
I've been struggling with this. In the PySpark shell, I get the SparkContext as sc. But when I use the addPyFile method, it makes the resulting SparkContext None:
>>> sc2 = sc.addPyFile("/home/ec2-user/redis.zip")
>>> sc2 is None
True
What went wrong?
Below is the source code to pyspark's (v1.1.1) addPyFile. (The source link for 1.4.1 in the official pyspark documentation was broken at the time of writing.)
It returns None, because there is no return statement. See also: in Python, if a function doesn't have a return statement, what does it return?
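As a quick illustration of that rule, here is a minimal sketch with a made-up function (nothing Spark-specific; the name is hypothetical):
>>> def add_py_file_lookalike(path):
...     print("registered", path)   # does some work, but has no return statement
...
>>> result = add_py_file_lookalike("mymodule.py")
registered mymodule.py
>>> result is None                  # a function without return evaluates to None
True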
So if you write sc2 = sc.addPyFile("mymodule.py"), of course sc2 will be None, because .addPyFile() doesn't return anything!
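To make the failure mode concrete, reusing the call from the question: the moment you treat sc2 as a context, Python raises the usual NoneType error (a sketch of what the shell would show):
>>> sc2 = sc.addPyFile("/home/ec2-user/redis.zip")
>>> sc2.parallelize([1, 2, 3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'parallelize'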
Instead, just call sc.addPyFile("mymodule.py") and keep using sc as your SparkContext.
def addPyFile(self, path):
    """
    Add a .py or .zip dependency for all tasks to be executed on this
    SparkContext in the future. The C{path} passed can be either a local
    file, a file in HDFS (or other Hadoop-supported filesystems), or an
    HTTP, HTTPS or FTP URI.
    """
    self.addFile(path)
    (dirname, filename) = os.path.split(path)  # dirname may be directory or HDFS/S3 prefix

    if filename.endswith('.zip') or filename.endswith('.ZIP') or filename.endswith('.egg'):
        self._python_includes.append(filename)
        # for tests in local mode
        sys.path.append(os.path.join(SparkFiles.getRootDirectory(), filename))
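Putting it together with the zip from the question (a sketch; it assumes /home/ec2-user/redis.zip actually contains an importable redis package, which is what the .zip branch above adds to the executors' Python path):
>>> sc.addPyFile("/home/ec2-user/redis.zip")   # returns None; sc itself is unchanged
>>> sc is None
False
>>> def ping(x):
...     import redis                           # importable on executors once the zip is shipped
...     return x
...
>>> sc.parallelize([1, 2, 3]).map(ping).collect()
[1, 2, 3]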