Why is Spark on YARN giving me a 'User not found' error?
When I submit spark-shell on a Kerberized YARN cluster, the YARN container exits immediately, and the Diagnostics field in the YARN application history shows:
Application application_1515782018863_0007 failed 2 times due to AM Container for appattempt_1515782018863_0007_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://ip-172-31-11-83.us-west-2.compute.internal:8088/cluster/app/application_1515782018863_0007Then, click on links to logs of each attempt.
Diagnostics: Application application_1515782018863_0007 initialization failed (exitCode=255) with output: **User datapass not found**
Failing this attempt. Failing the application.
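The thing is, the user 'datapass' does exist and is properly configured in the cluster, both as a Kerberos principal and as an HDFS home directory: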
kadmin.local: listprincs
...
datapass/COMPUTE.INTERNAL@DATAPASSPORT.INTERNAL
datapass/DATAPASSPORT.INTERNAL@COMPUTE.INTERNAL
datapass@COMPUTE.INTERNAL
datapass@DATAPASSPORT.INTERNAL
...
$ hadoop fs -ls /user
Found 9 items
drwxr-xr-x - datapass datapass 0 2018-01-13 02:11 /user/datapass
drwxrwxrwx - hadoop hadoop 0 2018-01-12 18:50 /user/hadoop
drwxr-xr-x - mapred mapred 0 2018-01-12 18:33 /user/history
drwxrwxrwx - hdfs hadoop 0 2018-01-12 18:33 /user/hive
drwxrwxrwx - hue hue 0 2018-01-12 18:33 /user/hue
drwxrwxrwx - livy livy 0 2018-01-12 18:33 /user/livy
drwxrwxrwx - oozie oozie 0 2018-01-12 18:38 /user/oozie
drwxrwxrwx - root hadoop 0 2018-01-12 18:33 /user/root
drwxrwxrwx - spark spark 0 2018-01-12 18:33 /user/spark
So why is YARN giving me this bogus message?
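Found the cause:
The user 'datapass' must also exist as an operating-system account on every Hadoop node. A Kerberos principal and an HDFS home directory are not enough; without the OS account, this odd error is triggered.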
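A likely explanation, stated as an assumption since the post does not show the cluster configuration: on a Kerberized cluster, YARN usually launches containers through the LinuxContainerExecutor, which runs each container as the submitting user's local account and fails if that account cannot be resolved. The sketch below is one way to check a node for the account. The function names `user_exists` and `check_user` are made up for illustration, and the `useradd` hint is only a suggestion (clusters that resolve users through LDAP or SSSD would fix this differently).

```shell
#!/usr/bin/env bash
# Sketch: report whether an OS account exists on this node.
# Run it on every NodeManager host; the default user name 'datapass'
# comes from the question above.

# Returns 0 if the account can be resolved by the OS
# (local /etc/passwd, or any other configured name service).
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

# Prints a one-line status for the given user on this host.
check_user() {
  if user_exists "$1"; then
    echo "OK: user '$1' exists on $(uname -n)"
  else
    echo "MISSING: user '$1' not found on $(uname -n); create it, e.g. 'sudo useradd $1'"
  fi
}

check_user "${1:-datapass}"
```

The script only reads account information and changes nothing. If it prints MISSING on any node, creating the account there (or fixing the name-service lookup) should make the 'User datapass not found' diagnostic go away.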