Spark-submit fails with yarn master, error- requirement failed at scala.Predef

My Spark job fails with the exception below, and I can't figure out which missing requirement is causing it to fail:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed
        at scala.Predef$.require(Predef.scala:221)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$$anonfun$apply.apply(Client.scala:472)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$$anonfun$apply.apply(Client.scala:470)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources.apply(Client.scala:470)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources.apply(Client.scala:468)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:468)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:727)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:742)

Spark-submit command:

spark-submit --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/xyz/conf/log4j.xml \
-DHOME=/xyz/transformation -DENV=e1 \
-DJOB=xformation" --conf spark.local.dir=/warehouse/tmp/spark1489619325 \
--queue dev --master yarn --deploy-mode cluster \
--properties-file /xyz/conf/job.conf \
--files /xyz/conf/e1.properties --class TransformationJob /xyz/job.jar

The same program runs fine with the standalone master and in local mode.

Any suggestions would be of great help. Thanks in advance.

I had a huge list of jars on the classpath via the "--jars" option, and one of those jars turned out to be the culprit. When I removed it from "--jars", the problem was resolved. I'm still not sure why spark-submit failed because of that jar.
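One plausible way the same jar slips into "--jars" twice is when the list is assembled from several directories. A minimal sketch of de-duplicating the list before handing it to spark-submit (the jar paths below are made up for illustration):

```shell
# Hypothetical jar list with a duplicate entry (paths are invented for this example)
JARS="/opt/libs/a.jar,/opt/libs/b.jar,/opt/libs/a.jar"

# Split on commas, drop duplicates, and re-join into the comma-separated
# form that spark-submit's --jars option expects
DEDUPED=$(echo "$JARS" | tr ',' '\n' | sort -u | paste -sd ',' -)

echo "$DEDUPED"   # → /opt/libs/a.jar,/opt/libs/b.jar
```

The de-duplicated value would then be passed as --jars "$DEDUPED". Note that sort -u also reorders the list, which is usually harmless for a classpath but worth knowing.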

I got a similar error:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed
        at scala.Predef$.require(Predef.scala:221)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$$anonfun$apply.apply(Client.scala:501)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$$anonfun$apply.apply(Client.scala:499)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources.apply(Client.scala:499)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources.apply(Client.scala:497)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:497)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:763)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:143)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1109)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1169)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The way to fix it:

In the terminal output or in the logs, you will see a WARN line like the following:

WARN Client: Resource file:... added multiple times to distributed cache.

Just remove that duplicated jar from your spark-submit script. Hope this helps everyone.
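To locate the offending jar without scrolling through the whole log, you can grep the warning for the resource URI. A small sketch, using a simulated log line (the exact wording of the real warning may differ slightly between Spark versions, and the jar path here is invented):

```shell
# Simulated log line (assumption: approximate wording of the YARN Client warning)
printf 'WARN Client: Resource file:/xyz/dup.jar added multiple times to distributed cache.\n' > /tmp/submit.log

# Extract the resource URI so you know exactly which jar to drop from --jars
grep -o 'file:[^ ]*' /tmp/submit.log   # → file:/xyz/dup.jar
```

Running this against the real driver log (or the output captured from spark-submit) should print every resource that was localized more than once.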

I ran into this error because of a data file: I was pointing at the wrong path for the training data.

The training data and the test data did not match.

Training data: 0,1 0 0 1 0 0 1 0 0 1

Test data: 1, 2, 0, 0, 10

I corrected the path of the training data source, and the problem was solved.
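A quick sanity check along these lines: the two sample rows above are not even in the same format (the training row is "label,space-separated features", while the test row is entirely comma-separated), so counting the space-separated tokens after the first comma exposes the mismatch immediately. A sketch using toy copies of the two rows (file paths are invented):

```shell
# Toy copies of the two sample rows from the answer
printf '0,1 0 0 1 0 0 1 0 0 1\n' > /tmp/train.txt
printf '1, 2, 0, 0, 10\n' > /tmp/test.txt

# Treat the first comma-separated field as the label, then count the
# space-separated features in the second field; mismatched counts
# between the files reveal the format problem
awk -F',' '{n = split($2, f, " "); print FILENAME ": " n}' /tmp/train.txt /tmp/test.txt
# → /tmp/train.txt: 10
# → /tmp/test.txt: 1
```

Ten features in the training row versus one in the test row makes it obvious the two files were not produced in the same format.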