Running the Spark machine learning example on YARN fails

After starting DFS, YARN, and Spark, I ran the following command from the Spark root directory on the master host:

MASTER=yarn ./bin/run-example ml.LogisticRegressionExample data/mllib/sample_libsvm_data.txt

I took this command from the Spark README. Here is the source code of LogisticRegressionExample on GitHub: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionExample.scala

Then this error occurred:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://master:9000/user/root/data/mllib/sample_libsvm_data.txt;

First, I don't understand why the path is hdfs://master:9000/user/root. I did set the NameNode address to hdfs://master:9000, but why did Spark choose /user/root?

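Second, I created the directory /user/root/data/mllib/sample_libsvm_data.txt on every host in the cluster, hoping Spark would find the file there. But the same error appeared again. Please tell me how to fix it.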

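Spark is looking for the file on HDFS, not on the regular Linux file system, so creating the path on each host's local disk has no effect. The path you gave for the data (data/mllib/sample_libsvm_data.txt) is a relative path. In HDFS, a relative path is resolved against your home directory, which is /user/<username>. You ran the job as root, hence /user/root.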
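
To make the resolution concrete, here is a minimal sketch that rebuilds the path from the error message. The variable names are made up for illustration, and it assumes the hdfs://master:9000 value from the question is what fs.defaultFS is set to in core-site.xml.

```shell
# Values taken from the question; adjust them for your own cluster.
fs_default="hdfs://master:9000"   # assumed to be fs.defaultFS in core-site.xml
hdfs_user="root"                  # the user that submitted the job
rel_path="data/mllib/sample_libsvm_data.txt"

# HDFS resolves a relative path against the home directory /user/<username>.
resolved="${fs_default}/user/${hdfs_user}/${rel_path}"
echo "$resolved"
```

The printed path should match the one in the AnalysisException above.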

The LogisticRegressionExample.scala on GitHub assumes local execution, not execution on YARN. If you want to run it on YARN, you need to upload the data file to HDFS first.
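
A rough sketch of the upload, assuming the hdfs command is on your PATH and you run it from the Spark root directory as the same user that submits the job. The function name upload_sample_data is hypothetical; the hdfs dfs subcommands (-mkdir -p, -put -f, -ls) are the standard file system shell ones.

```shell
# Hypothetical helper: copy the sample data into your HDFS home directory.
upload_sample_data() {
  local_file="data/mllib/sample_libsvm_data.txt"  # relative to the Spark root on local disk
  hdfs_dir="data/mllib"                           # relative, so it lands under /user/<username>

  # Create the target directory, upload (overwriting any old copy), then list it.
  hdfs dfs -mkdir -p "$hdfs_dir" &&
  hdfs dfs -put -f "$local_file" "$hdfs_dir/" &&
  hdfs dfs -ls "$hdfs_dir"
}
```

After defining the function, call upload_sample_data once and then rerun the original run-example command. The same relative path should now resolve to an existing file on HDFS.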

hadoop

hadoop-yarn

apache-spark
