Spark on Linux Error: Exception in thread "main" java.io.IOException: Cannot run program "python": error=2, No such file or directory
I am working through Chapter 2 of Learning Spark, 2nd Edition. When I run the example mnmcount.py script, I get the following error:
21/02/08 11:40:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: Cannot run program "python": error=2, No such file or directory
The command I use to run the script is:
$SPARK_HOME/bin/spark-submit mnmcount.py data/mnm_dataset.csv
I am in the LearningSparkV2-master/chapter2/py/src directory.
In my .bashrc file I added the following lines and then sourced the file:
SPARK_HOME="/usr/local/spark"
alias python="python3"
export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
The full code of the mnmcount.py script is below.
from __future__ import print_function

import sys

from pyspark.sql import SparkSession
from pyspark.sql.functions import count

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: mnmcount <file>", file=sys.stderr)
        sys.exit(-1)

    spark = (SparkSession
             .builder
             .appName("PythonMnMCount")
             .getOrCreate())
    # get the M&M data set file name
    mnm_file = sys.argv[1]
    # read the file into a Spark DataFrame
    mnm_df = (spark.read.format("csv")
              .option("header", "true")
              .option("inferSchema", "true")
              .load(mnm_file))
    mnm_df.show(n=5, truncate=False)
    # aggregate count of all colors and groupBy state and color
    # orderBy descending order
    count_mnm_df = (mnm_df.select("State", "Color", "Count")
                    .groupBy("State", "Color")
                    .sum("Count")
                    .orderBy("sum(Count)", ascending=False))
    # show all the resulting aggregation for all the dates and colors
    count_mnm_df.show(n=60, truncate=False)
    print("Total Rows = %d" % (count_mnm_df.count()))
    # find the aggregate count for California by filtering
    ca_count_mnm_df = (mnm_df.select("*")
                       .where(mnm_df.State == 'CA')
                       .groupBy("State", "Color")
                       .sum("Count")
                       .orderBy("sum(Count)", ascending=False))
    # show the resulting aggregation for California
    ca_count_mnm_df.show(n=10, truncate=False)
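As an aside, the aggregation the script performs (group by State and Color, sum Count, sort descending) can be sketched in plain Python for readers without a Spark installation; the sample rows here are made up, not taken from mnm_dataset.csv:

```python
from collections import defaultdict

# Hypothetical rows in the same shape as mnm_dataset.csv: (State, Color, Count)
rows = [
    ("CA", "Red", 20), ("CA", "Red", 5),
    ("CA", "Blue", 10), ("TX", "Red", 7),
]

# equivalent of groupBy("State", "Color").sum("Count")
totals = defaultdict(int)
for state, color, n in rows:
    totals[(state, color)] += n

# equivalent of orderBy("sum(Count)", ascending=False)
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # highest total first
```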
After adding

export PYSPARK_PYTHON=python3

to my .bashrc, the problem was solved.
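My reading of why this works: a shell alias such as `alias python="python3"` only applies to your interactive shell and is not inherited by the processes spark-submit spawns, whereas an exported environment variable is. A minimal sketch of the relevant .bashrc lines, using the same paths as the question:

```shell
# ~/.bashrc — environment for spark-submit (paths as given in the question)
export SPARK_HOME="/usr/local/spark"          # export it, not a plain assignment
export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
# Tell PySpark which interpreter to launch; an alias is NOT seen by child processes.
export PYSPARK_PYTHON=python3
```

Note that the original .bashrc set SPARK_HOME without `export`, so it may not have been visible to subprocesses either.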
Try setting your master to "local[*]".
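If you want to try this without editing the script, the master can be passed on the spark-submit command line; a sketch using the same files as the question:

```shell
# Run the example on all local cores; --master overrides any default master URL.
$SPARK_HOME/bin/spark-submit --master "local[*]" mnmcount.py data/mnm_dataset.csv
```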