Why do I get a py4j error in PySpark when using the 'count' function?
I am trying to run a simple piece of code in PySpark, but I get a py4j error.
from pyspark import SparkContext
logFile = "file:///home/hadoop/spark-2.1.0-bin-hadoop2.7/README.md"
sc = SparkContext("local", "word count")
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
The error is:
An error occurred while calling o75.printStackTrace. Trace:
py4j.Py4JException: Method printStackTrace([class org.apache.spark.api.java.JavaSparkContext]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:835)
I have configured the environment variables, but it still does not work. I even tried findspark.init(), but that did not help either. What am I doing wrong?
I suspect your environment variables are not set correctly. Could you post all of your environment variables? Mine are listed below and they work fine.
Check SCALA_HOME and SPARK_HOME in particular: they should not end with "bin".
My Windows environment:
- HADOOP_HOME = C:\spark\hadoop
- JAVA_HOME = C:\Program Files\Java\jdk1.8.0_151
- SCALA_HOME = *C:\spark\scala*
- SPARK_HOME = *C:\spark\spark*
- PYSPARK_PYTHON = C:\Users\user\Anaconda3\envs\python.exe
- PYSPARK_DRIVER_PYTHON = C:\Users\user\Anaconda3\envs\Scripts\jupyter.exe
- PYSPARK_DRIVER_PYTHON_OPTS = notebook
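If you prefer to set these from inside the script rather than in the OS, a minimal sketch is shown below. All paths here are placeholders based on my layout above (and the README path is an assumption); substitute your own install locations before creating the SparkContext.

import os

# Assumed install locations - adjust to your machine; note no trailing "\bin".
os.environ["SPARK_HOME"] = r"C:\spark\spark"
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_151"
os.environ["PYSPARK_PYTHON"] = r"C:\Users\user\Anaconda3\python.exe"

import findspark
findspark.init()  # reads SPARK_HOME and puts pyspark on sys.path

from pyspark import SparkContext

sc = SparkContext("local", "word count")
logData = sc.textFile(r"file:///C:/spark/spark/README.md").cache()
print(logData.filter(lambda s: 'a' in s).count())
sc.stop()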