How to get SparkContext from JavaSparkContext in PySpark?
When I run PySpark, executing
sc._gateway.help(sc._jsc)
succeeds and gives me some nice output like
JavaSparkContext extends org.apache.spark.api.java.JavaSparkContextVarargsWorkaround implements java.io.Closeable {
|
| Methods defined here:
|
| accumulable(Object, String, AccumulableParam) : Accumulable
|
| accumulable(Object, AccumulableParam) : Accumulable
|
| accumulator(double, String) : Accumulator
|
| accumulator(Object, AccumulatorParam) : Accumulator
|
| accumulator(Object, String, AccumulatorParam) : Accumulator
|
| accumulator(double) : Accumulator
...
while running
sc._gateway.help(sc._jsc.sc())
gives me a Py4J error with a Java NullPointerException:
Py4JError: An error occurred while calling None.None. Trace:
java.lang.NullPointerException
at py4j.model.Py4JMember.compareTo(Py4JMember.java:54)
at py4j.model.Py4JMember.compareTo(Py4JMember.java:39)
at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:290)
at java.util.ComparableTimSort.sort(ComparableTimSort.java:157)
at java.util.ComparableTimSort.sort(ComparableTimSort.java:146)
at java.util.Arrays.sort(Arrays.java:472)
at java.util.Collections.sort(Collections.java:155)
at py4j.model.Py4JClass.buildClass(Py4JClass.java:88)
at py4j.commands.HelpPageCommand.getHelpObject(HelpPageCommand.java:118)
at py4j.commands.HelpPageCommand.execute(HelpPageCommand.java:74)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:744)
Why can't I access the SparkContext contained in the JavaSparkContext via Py4J, when I can access the JavaSparkContext itself?
sc._jsc.sc()
is the correct way to access the underlying SparkContext. To illustrate:
>>> sc._jsc.sc()
JavaObject id=o27
>>> sc._jsc.sc().version()
u'1.1.0'
>>> sc._jsc.sc().defaultMinSplits()
2
The problem that you're seeing here is that Py4J's help command has trouble displaying the help for this class (possibly a Py4J bug).
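As a workaround, you can inspect the underlying SparkContext's methods through Java reflection instead of Py4J's help command. This is just a minimal sketch, assuming an active PySpark shell where sc is your SparkContext; getClass(), getMethods(), and getName() are standard java.lang.Object / java.lang.Class methods that Py4J forwards like any other Java call:

>>> # List the method names of the underlying Scala SparkContext via
>>> # Java reflection, since Py4J's help() raises an NPE on this class.
>>> for m in sc._jsc.sc().getClass().getMethods():
...     print(m.getName())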